[ https://issues.apache.org/jira/browse/ORC-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671918#comment-16671918 ]
Xiening Dai commented on ORC-430:
---------------------------------
What do you mean by "combine"? A single mapper can always read multiple input
files sequentially; whether it does depends on the splitting logic your job
uses. That said, having many very small ORC files is inefficient in any case.
We also have a file merger that can merge multiple small ORC files into a
larger one.
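
To make the "splitting logic" point concrete, here is a rough sketch of how a job could assign several small files to one mapper. It is similar in spirit to what Hadoop's CombineFileInputFormat does, but it is not ORC or Hadoop code: the function name group_files, the max_split_bytes parameter, and the file names and sizes are all made up for illustration.

```python
from typing import Iterable, List, Tuple


def group_files(files: Iterable[Tuple[str, int]],
                max_split_bytes: int) -> List[List[str]]:
    """Greedily pack (path, size_in_bytes) pairs into groups.

    Each group stands for the input of one mapper. A group is closed as soon
    as adding the next file would push it past max_split_bytes. A single file
    larger than the limit still gets a group of its own.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for path, size in files:
        # Close the current group if this file would not fit into it.
        if current and current_bytes + size > max_split_bytes:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(path)
        current_bytes += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Hypothetical file names and sizes, for illustration only.
    small_files = [
        ("part-0.orc", 40),
        ("part-1.orc", 30),
        ("part-2.orc", 50),
        ("part-3.orc", 200),
        ("part-4.orc", 10),
    ]
    for i, group in enumerate(group_files(small_files, max_split_bytes=100)):
        print(f"mapper {i}: {group}")
```

With this kind of grouping the number of mappers follows the total data size rather than the number of files. It does not remove the per-file overhead of opening many small ORC files, which is why merging them is still worth considering.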
> Support combining orc files in Hadoop streaming
> -----------------------------------------------
>
> Key: ORC-430
> URL: https://issues.apache.org/jira/browse/ORC-430
> Project: ORC
> Issue Type: Improvement
> Reporter: Yuanbo Liu
> Priority: Major
>
> When there is a huge number of ORC files, there seems to be no CombineOrcFile
> class to reduce the number of mappers. When we use Hadoop streaming in such a
> case, the Hadoop cluster launches a lot of mappers. It would be great if we
> could combine a batch of ORC files into one mapper.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)