[ 
https://issues.apache.org/jira/browse/HIVE-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-1357:
---------------------------------

    Fix Version/s: 0.6.0
      Component/s: Query Processor
                   Serializers/Deserializers

> CombineHiveInputSplit should initialize the inputFileFormat once for a single 
> split
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-1357
>                 URL: https://issues.apache.org/jira/browse/HIVE-1357
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor, Serializers/Deserializers
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.6.0
>
>         Attachments: HIVE-1357.patch
>
>
> If a split consists of multiple files, the FileFormat should always be the 
> same, whether RCFile or SequenceFile. Currently the CombineHiveInputSplit 
> looks up the inputFileFormat for each new file in the split, and each lookup 
> is O(n), where n is the number of files in the split. Over the whole split 
> this is an O(n^2) operation, which badly degrades performance when combining 
> a large number of small files. 
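
The cost described in the issue can be illustrated with a minimal sketch. This is not the code from HIVE-1357.patch or from Hive itself: the class SplitFormatSketch, the Entry type, and the methods lookupFormat, formatPerFile, and formatOnce are hypothetical names, and the linear scan is only an assumed stand-in for the O(n) lookup the issue mentions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical names, not Hive's actual classes.
class SplitFormatSketch {
    /** Hypothetical stand-in for one path -> input format mapping entry. */
    static final class Entry {
        final String path;
        final String format;
        Entry(String path, String format) {
            this.path = path;
            this.format = format;
        }
    }

    /** Counts entries examined, to make the lookup cost visible. */
    static long scans = 0;

    /** Assumed linear lookup: O(n) in the number of entries. */
    static String lookupFormat(List<Entry> entries, String path) {
        for (Entry e : entries) {
            scans++;
            if (e.path.equals(path)) {
                return e.format;
            }
        }
        throw new IllegalArgumentException("unknown path: " + path);
    }

    /** One lookup per file: n lookups of O(n) each, O(n^2) overall. */
    static String formatPerFile(List<Entry> entries, List<String> files) {
        String format = null;
        for (String f : files) {
            format = lookupFormat(entries, f);
        }
        return format;
    }

    /** One lookup per split, relying on all files sharing one format. */
    static String formatOnce(List<Entry> entries, List<String> files) {
        return lookupFormat(entries, files.get(0));
    }

    /** Builds n hypothetical entries that all use the same format. */
    static List<Entry> sampleEntries(int n, String format) {
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            entries.add(new Entry("/warehouse/t/part-" + i, format));
        }
        return entries;
    }
}
```

Under these assumptions, formatPerFile examines on the order of n^2 entries for a split of n files, while formatOnce examines only as many as a single lookup needs. That difference is the saving the issue title asks for.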

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
