Background: I'm trying to trace exactly how Hive creates multi-file splits. My understanding is that MapReduce's CombineFileInputFormat does the main work of combining files and, specifically, that if no overrides are set, the target split size is set to dfs.block.size.
However, I cannot see how the value of dfs.block.size finds its way into CombineFileInputFormat. I'm probably missing something obvious, but I'd appreciate someone pointing it out. Thanks, Mike.
