Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> Hmm ... the idea was to avoid the cost of additional I/O, and read the parts directly as they are. If I understand it correctly, the Sorter.merge() needs to rewrite the files in order to merge them, which means a lot of I/O.
>
> It only rewrites things if there are more parts than the mergefactor. So if you increase the mergefactor to the number of parts, then no data will be written.

This is non-intuitive. I'll run some tests, and if it works as advertised ;) then I think it would be nice to add a wrapper API to MapFileOutputFormat that uses this method.
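
To make the merge-factor point concrete, here is a rough model of a multi-pass merge. It is only a sketch and does not call the real Sorter: the class MergePlan and the method intermediateFiles are hypothetical names, and the assumption that each intermediate pass collapses exactly "factor" segments into one rewritten file is a simplification that may not match how Sorter.merge() actually groups its inputs.

```java
// Hypothetical model of a multi-pass merge; not part of the real Sorter API.
class MergePlan {

    // Returns how many intermediate files a merge would have to write
    // before the remaining segments fit into a single final pass.
    // Assumption: each intermediate pass merges exactly `factor` segments
    // into one new segment that is written back to disk.
    static int intermediateFiles(int parts, int factor) {
        if (factor < 2) {
            throw new IllegalArgumentException("merge factor must be at least 2");
        }
        int segments = parts;
        int written = 0;
        while (segments > factor) {
            // `factor` segments are replaced by one rewritten segment.
            segments = segments - factor + 1;
            written++;
        }
        // The final pass over at most `factor` segments can be streamed
        // directly to the consumer, so it is not counted as a rewrite.
        return written;
    }

    public static void main(String[] args) {
        // Factor equal to the number of parts: nothing is rewritten.
        System.out.println(intermediateFiles(10, 10));
        // Factor smaller than the number of parts: intermediate rewrites occur.
        System.out.println(intermediateFiles(10, 3));
    }
}
```

In this model, a merge factor at least as large as the number of parts gives zero intermediate files, which is the case described above where the parts are read as they are.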


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
