[ https://issues.apache.org/jira/browse/HIVE-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-1357:
---------------------------------

    Fix Version/s: 0.6.0
      Component/s: Query Processor
                   Serializers/Deserializers

> CombineHiveInputSplit should initialize the inputFileFormat once for a single split
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-1357
>                 URL: https://issues.apache.org/jira/browse/HIVE-1357
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor, Serializers/Deserializers
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.6.0
>
>         Attachments: HIVE-1357.patch
>
>
> If a split consists of multiple files, the FileFormat should always be the
> same, whether RCFile or SequenceFile. Currently CombineHiveInputSplit
> gets the inputFileFormat again for each new file in the split, and each
> lookup is O(n), where n is the number of files in the split. The total cost
> is therefore O(n^2), which badly degrades performance when combining a
> large number of small files.

-- 
This message is automatically generated by JIRA.
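A minimal Java sketch of the cost difference described in the issue. It is not Hive's code and not the contents of HIVE-1357.patch: `SplitFormatSketch`, `lookupFormat`, `resolvePerFile`, `resolveOnce`, and the ".rc"-suffix rule are all hypothetical, standing in for a per-file format lookup that scans the split's path list. Resolving the format for every file performs n(n+1)/2 path comparisons, while resolving it once per split performs one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names are not from Hive's API.
public class SplitFormatSketch {

    // Counts path comparisons made by the hypothetical lookup.
    static long comparisons = 0;

    // Hypothetical per-file lookup: a linear scan over the split's paths,
    // so each call costs O(n) in the number of files n.
    static String lookupFormat(List<String> paths, String target) {
        for (String p : paths) {
            comparisons++;
            if (p.equals(target)) {
                // Assumption: the file suffix identifies the format.
                return p.endsWith(".rc") ? "RCFile" : "SequenceFile";
            }
        }
        throw new IllegalArgumentException("path not in split: " + target);
    }

    // Per-file resolution: n lookups of O(n) each, O(n^2) overall.
    static String resolvePerFile(List<String> paths) {
        String format = null;
        for (String p : paths) {
            format = lookupFormat(paths, p);
        }
        return format;
    }

    // Once per split: all files in a split share one format, so the
    // format resolved for the first file is reused for the rest.
    static String resolveOnce(List<String> paths) {
        return lookupFormat(paths, paths.get(0));
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            paths.add("/warehouse/t/part-" + i + ".rc");
        }

        comparisons = 0;
        resolvePerFile(paths);
        System.out.println("per-file comparisons: " + comparisons);

        comparisons = 0;
        resolveOnce(paths);
        System.out.println("once-per-split comparisons: " + comparisons);
    }
}
```

With 1,000 small files in one split, `main` prints 500500 comparisons for the per-file approach and 1 for the once-per-split approach; the gap grows quadratically as more files are combined.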