Ted Dunning wrote:
>
>
> The simple solution is to package files together in such a way that you
> can start anywhere in the package and process a number of files. Using
> this technique, you can pretty easily have a much smaller number of very
> large files. Moreover, because of the start-anywhere design, these large
> files can be processed efficiently in Hadoop because they will be near
> the programs processing them and because they will be read from disk in
> large sequential swathes.
>
>
When you talk about packaging lots of small files together before putting
them into HDFS, what exactly do you mean? Something as simple as cat?
Thanks
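
One common way to do this kind of packaging (a sketch, not necessarily what
Ted had in mind) is Hadoop's SequenceFile format rather than plain cat. A
SequenceFile stores (key, value) records and writes periodic sync markers,
so a reader handed an arbitrary byte range can skip ahead to the next marker
and process whole records from there, which is the "start anywhere" property
described above. The paths below and the choice of Text keys /
BytesWritable values are illustrative:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem local = FileSystem.getLocal(conf); // read small files locally
        FileSystem hdfs = FileSystem.get(conf);       // write one big file to HDFS

        Path inputDir = new Path("file:///data/small-files"); // illustrative
        Path packed = new Path("/packed/small-files.seq");    // illustrative

        // Each small file becomes one record: key = file name, value = bytes.
        SequenceFile.Writer writer = SequenceFile.createWriter(
                hdfs, conf, packed, Text.class, BytesWritable.class);
        try {
            for (FileStatus stat : local.listStatus(inputDir)) {
                if (stat.isDir()) {
                    continue;
                }
                byte[] buf = new byte[(int) stat.getLen()];
                FSDataInputStream in = local.open(stat.getPath());
                try {
                    in.readFully(0, buf);
                } finally {
                    in.close();
                }
                writer.append(new Text(stat.getPath().getName()),
                              new BytesWritable(buf));
            }
        } finally {
            writer.close();
        }
    }
}

A plain cat would lose the individual file names and boundaries; a
record-oriented container like this keeps them, and the resulting single
large file can be split across map tasks.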