Is there a way to tell Hive to take multiple input files as input for a
single map task?

Task setup time is so high in Hive/Hadoop that it really degrades
performance when there are many small files (in the 10 MB range). There's
no reason why 10 small files shouldn't be sent to the same map task, so
the question is: does Hive support this scenario?

If so, how do I set it up?


-- 
Andraz Tori, CTO
Zemanta Ltd, New York, London, Ljubljana
www.zemanta.com
mail: [email protected]
tel: +386 41 515 767
twitter: andraz, skype: minmax_test


