Is there a way to tell Hive to take multiple input files as input for a single map task?
Task setup time in Hive/Hadoop is high enough that it seriously degrades performance when the input consists of many small files (in the 10 MB range). There is no reason why 10 small files shouldn't be sent to the same map task. Does Hive support this scenario? If so, how is it set up?

-- 
Andraz Tori, CTO
Zemanta Ltd, New York, London, Ljubljana
www.zemanta.com
mail: [email protected]
tel: +386 41 515 767
twitter: andraz, skype: minmax_test
