[ https://issues.apache.org/jira/browse/PIG-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohini Palaniswamy updated PIG-3204:
------------------------------------
Attachment: PIG-3204-1.patch
The problem was that when a script was parsed, GruntParser called registerQuery
for each line, and each call parsed that line together with all the previous
lines from the script cache. So if there was a LOAD statement followed by 10
other statements, the LOAD was parsed 11 times in registerQuery (and each of
those parses made the calls to getSchema). Added an option to skip that parsing
in batch mode; executeBatch() takes care of the parsing instead.
This should speed up Pig script parsing a lot.
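To make the parse counts concrete, here is a small, self-contained model of the behaviour described above. It is a sketch under assumptions: ScriptParseModel, its registerQuery/executeBatch methods, and the counters are hypothetical stand-ins, not Pig's real GruntParser or PigServer API, and the model counts only the per-line re-parses the comment describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the re-parsing described above. ScriptParseModel,
// registerQuery, executeBatch and the counters are illustrative stand-ins,
// not Pig's real GruntParser/PigServer API.
class ScriptParseModel {
    private final List<String> scriptCache = new ArrayList<>();
    private final boolean skipParseInBatch; // models the option added by the patch
    private int statementParses = 0;        // every statement parse
    private int loadParses = 0;             // LOAD parses (each one triggered getSchema calls)

    ScriptParseModel(boolean skipParseInBatch) {
        this.skipParseInBatch = skipParseInBatch;
    }

    // Parse the whole cached script from the top.
    private void parseAll() {
        for (String stmt : scriptCache) {
            statementParses++;
            if (stmt.contains("LOAD")) {
                loadParses++; // in Pig, this is where schema lookups would hit the NN
            }
        }
    }

    // Called once per script line.
    void registerQuery(String stmt) {
        scriptCache.add(stmt);
        if (!skipParseInBatch) {
            parseAll(); // old behaviour: the new line plus all previous lines
        }
    }

    // Called once at the end of the script.
    void executeBatch() {
        if (skipParseInBatch) {
            parseAll(); // new behaviour: a single parse of the full script
        }
    }

    int getStatementParses() { return statementParses; }
    int getLoadParses() { return loadParses; }

    // Runs a LOAD followed by `following` other statements through the model.
    static ScriptParseModel run(boolean skipParseInBatch, int following) {
        ScriptParseModel m = new ScriptParseModel(skipParseInBatch);
        m.registerQuery("A = LOAD 'input' AS (f1, f2);");
        for (int i = 1; i <= following; i++) {
            m.registerQuery("B" + i + " = FOREACH A GENERATE f1;");
        }
        m.executeBatch();
        return m;
    }

    public static void main(String[] args) {
        ScriptParseModel perLine = run(false, 10);
        ScriptParseModel batch = run(true, 10);
        // prints "per-line parse: LOAD parsed 11 times, 66 statement parses"
        System.out.println("per-line parse: LOAD parsed " + perLine.getLoadParses()
                + " times, " + perLine.getStatementParses() + " statement parses");
        // prints "batch parse: LOAD parsed 1 times, 11 statement parses"
        System.out.println("batch parse: LOAD parsed " + batch.getLoadParses()
                + " times, " + batch.getStatementParses() + " statement parses");
    }
}
```

With n statements, re-parsing on every line costs 1 + 2 + ... + n = n(n+1)/2 statement parses (66 for n = 11), while a single batch parse costs n, so the savings grow quadratically with script length.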
> Optimize the number of FS calls to get schema to cut down time before job
> launch
> --------------------------------------------------------------------------------
>
> Key: PIG-3204
> URL: https://issues.apache.org/jira/browse/PIG-3204
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.10.1
> Reporter: Rohini Palaniswamy
> Attachments: PIG-3204-1.patch
>
>
> Currently, a lot of NN calls are made to determine whether there is a
> schema file for a path in a LOAD statement. When the NN is slow (caused by a
> whole bunch of other issues), this takes a lot of time, and we found
> scripts spending anywhere from 5 to 40 minutes on it, depending on the
> script. This seems like a good place for optimization.
--
This message is automatically generated by JIRA.