Evert Lammerts at Sara.nl did something similar to your problem, splitting
a big 2.7 TB file into chunks of 10 GB.
This work was presented at the BioAssist Programmers' Day in January of
this year, under the title
"Large-Scale Data Storage and Processing for Scientist in The Netherlands"
http://www.slideshare.net/evertlammerts
P.S.: I sent the message with a copy to him.
On 6/20/2011 10:38 AM, Niels Basjes wrote:
Hi,
On Mon, Jun 20, 2011 at 16:13, Mapred Learn<mapred.le...@gmail.com> wrote:
But this file is a gzipped text file. In this case it will only go to 1 mapper,
unlike the case where it was split into 60 1 GB files, which would make the
map-reduce job finish earlier than one 60 GB file, as it would have 60 mappers
running in parallel. Isn't it so?
Yes, that is very true.
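As a minimal sketch of the workaround discussed above (filenames and sizes here are illustrative, not from the thread): split the large file into fixed-size parts and gzip each part separately. Since gzip is not a splittable format, Hadoop assigns one whole .gz file to a single mapper, so N compressed parts can be processed by N mappers instead of 1.

```shell
# Create a small demo stand-in for the "large" file (1 MiB of zeros).
head -c 1048576 /dev/zero > bigfile.txt

# Split into 4 parts of 256 KiB each: part_00, part_01, part_02, part_03.
# For a real 60 GB file you would use e.g. "split -b 1G".
split -b 262144 -d bigfile.txt part_

# Compress each part on its own; each part_NN.gz becomes one mapper's input.
for f in part_??; do gzip "$f"; done

ls part_*.gz
```

With a splittable container (or uncompressed text) this manual step is unnecessary, since Hadoop can then generate multiple input splits from a single large file on its own.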
--
Marcos Luís Ortíz Valmaseda
Software Engineer (UCI)
http://marcosluis2186.posterous.com
http://twitter.com/marcosluis2186