[jira] [Updated] (PIG-3204) Optimize the number of FS calls to get schema to cut down time before job launch

Rohini Palaniswamy (JIRA) Tue, 11 Jun 2013 17:03:34 -0700

     [ https://issues.apache.org/jira/browse/PIG-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Rohini Palaniswamy updated PIG-3204:
------------------------------------

    Attachment: PIG-3204-1.patch

 The problem was that when a script was parsed, GruntParser called
registerQuery for each line, and each call re-parsed that line together with
all the previous lines from the script cache. So if a LOAD statement was
followed by 10 other statements, the LOAD was parsed 11 times in registerQuery,
and each of those parses made the calls to getSchema. Added an option to skip
that parsing in batch mode; executeBatch() takes care of the parsing instead.

 This will speed up Pig script parsing a lot.
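 To illustrate why the per-line re-parse is so costly, here is a minimal
sketch that counts simulated schema lookups. It is a hypothetical model, not
Pig's actual code: ScriptSession, parse(), countLookups() and the "one lookup
per LOAD per parse" rule are assumptions made for illustration, and the
registerQuery/executeBatch names only mirror the methods mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the parsing behavior described in PIG-3204.
// These classes are illustrative only and are NOT Pig's GruntParser/PigServer API.
public class ParseCostDemo {

    static class ScriptSession {
        private final List<String> scriptCache = new ArrayList<>();
        private final boolean batchMode;
        // Each increment stands in for a getSchema() call (i.e. NN round trips).
        int schemaLookups = 0;

        ScriptSession(boolean batchMode) {
            this.batchMode = batchMode;
        }

        // Assumption: parsing a LOAD statement triggers exactly one schema lookup.
        private void parse(List<String> lines) {
            for (String line : lines) {
                if (line.toUpperCase().contains(" LOAD ")) {
                    schemaLookups++;
                }
            }
        }

        void registerQuery(String line) {
            scriptCache.add(line);
            if (!batchMode) {
                // Old behavior: re-parse this line plus every earlier cached line.
                parse(scriptCache);
            }
        }

        void executeBatch() {
            // Single parse of the whole script.
            parse(scriptCache);
        }
    }

    // One LOAD followed by `statementsAfterLoad` other statements.
    static int countLookups(boolean batchMode, int statementsAfterLoad) {
        ScriptSession s = new ScriptSession(batchMode);
        s.registerQuery("a = LOAD 'input' AS (x:int);");
        for (int i = 0; i < statementsAfterLoad; i++) {
            s.registerQuery("b" + i + " = FILTER a BY x > " + i + ";");
        }
        s.executeBatch();
        return s.schemaLookups;
    }

    public static void main(String[] args) {
        // 11 lookups from registerQuery plus 1 from executeBatch.
        System.out.println("per-line parsing: " + countLookups(false, 10)); // 12
        // registerQuery skips parsing; only executeBatch parses.
        System.out.println("batch mode:       " + countLookups(true, 10)); // 1
    }
}
```

 In this model the lookup count for a LOAD grows linearly with the number of
statements that follow it (n + 1 from registerQuery, plus 1 from
executeBatch), while skipping the per-line parse in batch mode keeps it at
one regardless of script length.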
                
> Optimize the number of FS calls to get schema to cut down time before job launch
> --------------------------------------------------------------------------------
>
>                 Key: PIG-3204
>                 URL: https://issues.apache.org/jira/browse/PIG-3204
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.10.1
>            Reporter: Rohini Palaniswamy
>         Attachments: PIG-3204-1.patch
>
>
>   Currently a lot of NN (NameNode) calls are made to determine whether there
> is a schema file for a path in a LOAD statement. When the NN is slow (caused
> by a whole bunch of other issues), this takes a lot of time: we found scripts
> spending anywhere from 5 to 40 minutes on it, depending on the script. It
> seems to be a good place for optimization.

--
This message is automatically generated by JIRA.
