CombineHiveInputSplit should initialize the inputFileFormat once for a single 
split
-----------------------------------------------------------------------------------

                 Key: HIVE-1357
                 URL: https://issues.apache.org/jira/browse/HIVE-1357
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Ning Zhang
            Assignee: Ning Zhang


If a split consists of multiple files, the FileFormat should always be the 
same for all of them, whether RCFile or SequenceFile. Currently the 
CombineHiveInputSplit tries to get the inputFileFormat for each new file in 
the split. Each lookup is O(n), where n is the number of files in the split, 
so the total cost per split is O(n^2). This degrades performance badly when 
combining a large number of small files. 
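
The difference between the two approaches can be illustrated with a small 
sketch. This is not Hive's actual code: the class SplitFormatSketch, the 
methods resolveFormat, formatPerFile and formatOncePerSplit, and the ".rc" 
suffix check are all hypothetical stand-ins, assuming only that every file in 
a split shares one format. It counts how often the lookup runs in each case.

```java
import java.util.List;

// Hypothetical sketch; these names are illustrative and are not Hive's classes.
class SplitFormatSketch {
    // Counts how many times the format lookup was invoked.
    static int lookups = 0;

    // Stand-in for the costly per-path format resolution described in the issue.
    // The suffix check is an assumption made only to keep the example runnable.
    static String resolveFormat(String path) {
        lookups++;
        return path.endsWith(".rc") ? "RCFile" : "SequenceFile";
    }

    // Behaviour as described in the issue: resolve the format for every file.
    static String formatPerFile(List<String> files) {
        String format = null;
        for (String file : files) {
            format = resolveFormat(file);
        }
        return format;
    }

    // Proposed behaviour: resolve once from the first file, reuse for the split.
    static String formatOncePerSplit(List<String> files) {
        if (files.isEmpty()) {
            throw new IllegalArgumentException("split has no files");
        }
        return resolveFormat(files.get(0));
    }

    public static void main(String[] args) {
        List<String> split = List.of("a.rc", "b.rc", "c.rc");

        lookups = 0;
        formatPerFile(split);
        System.out.println("per-file lookups: " + lookups);

        lookups = 0;
        formatOncePerSplit(split);
        System.out.println("once-per-split lookups: " + lookups);
    }
}
```

With n files the per-file variant performs n lookups and the once-per-split 
variant performs one, so if each lookup is itself O(n) the totals would be 
O(n^2) and O(n) respectively.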

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
