Rohini Palaniswamy created PIG-3366:
---------------------------------------

             Summary: Do intelligent combination of input splits for compressed files.
                 Key: PIG-3366
                 URL: https://issues.apache.org/jira/browse/PIG-3366
             Project: Pig
          Issue Type: Improvement
            Reporter: Rohini Palaniswamy


pig.maxCombinedSplitSize defaults to the block size. When there are a lot of 
small bz files that uncompress to a large amount of data, they are combined 
until the block size is reached, which was 128 MB in our case. The load took 
20 minutes, but setting pig.noSplitCombination=true cut the time down to a 
little over 2 minutes. 

We need smarter logic that takes into account the factor by which an input 
split will expand when uncompressed, and uses the expanded size as an estimate 
while combining splits. The factor will differ between compression formats 
such as bz and gz, and could be made configurable by the user.
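
A rough sketch of the idea, to make the proposal concrete. This is not Pig's 
actual split-combination code: the class name ExpandedSizeCombiner, the method 
names, the extension-keyed factor map and the factor values used with it are 
all hypothetical, chosen only for illustration. Splits are packed greedily in 
order, and a group is closed once the estimated uncompressed size would exceed 
the limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only; none of these names exist in Pig.
class ExpandedSizeCombiner {

    // Looks up a user-supplied expansion factor by file extension.
    // Files with no matching extension are treated as uncompressed (1.0).
    static double expansionFactor(String fileName, Map<String, Double> factors) {
        for (Map.Entry<String, Double> e : factors.entrySet()) {
            if (fileName.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return 1.0;
    }

    // Estimated size of a split once it has been uncompressed.
    static long estimatedSize(String fileName, long compressedBytes,
                              Map<String, Double> factors) {
        return (long) (compressedBytes * expansionFactor(fileName, factors));
    }

    // Greedily groups splits in input order. A group is closed when adding
    // the next split would push its estimated uncompressed size past
    // maxCombinedSize. A single oversized split still gets its own group.
    static List<List<String>> combine(List<String> names, List<Long> sizes,
                                      long maxCombinedSize,
                                      Map<String, Double> factors) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentSize = 0;
        for (int i = 0; i < names.size(); i++) {
            long est = estimatedSize(names.get(i), sizes.get(i), factors);
            if (!current.isEmpty() && currentSize + est > maxCombinedSize) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(names.get(i));
            currentSize += est;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

With an assumed factor of 10 for .bz2, four 32 MB .bz2 files and a 128 MB 
limit end up in four separate groups, whereas four uncompressed 32 MB files 
are still combined into one. The real factor for a given data set would have 
to be measured or configured.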
