Doug Cutting wrote:
> Andrzej Bialecki wrote:
>> Hmm ... the idea was to avoid the cost of additional I/O, and read the
>> parts directly as they are. If I understand it correctly, the
>> Sorter.merge() needs to rewrite the files in order to merge them,
>> which means a lot of I/O.
>
> It only rewrites things if there are more parts than the mergefactor. So
> if you increase the mergefactor to the number of parts, then no data
> will be written.

This is non-intuitive. I'll run some tests, and if it works as
advertised ;) then I think it would be nice to add a wrapper API to
MapFileOutputFormat that uses this method.
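
A minimal sketch of why the merge factor matters, using in-memory sorted lists as stand-ins for the on-disk parts. This is not the actual Sorter code: the class MergeFactorSketch, its methods, and the intermediateRecordsWritten counter are all hypothetical names made up for illustration, and the grouping strategy (merge the first 'factor' runs, repeat) is an assumption, not necessarily what Sorter.merge() does internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class MergeFactorSketch {
    /** Records copied into intermediate runs; a stand-in for extra disk I/O. */
    static long intermediateRecordsWritten = 0;

    /** One k-way merge pass over already-sorted parts, using a heap of cursors. */
    static List<Integer> mergeOnce(List<List<Integer>> parts) {
        // Heap entries are {value, partIndex, offsetInPart}, ordered by value.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int p = 0; p < parts.size(); p++) {
            if (!parts.get(p).isEmpty()) {
                heap.add(new int[] {parts.get(p).get(0), p, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            List<Integer> src = parts.get(top[1]);
            int next = top[2] + 1;
            if (next < src.size()) {
                heap.add(new int[] {src.get(next), top[1], next});
            }
        }
        return out;
    }

    /** Merges sorted parts, opening at most 'factor' inputs per pass. */
    static List<Integer> merge(List<List<Integer>> parts, int factor) {
        if (factor < 2) {
            throw new IllegalArgumentException("factor must be >= 2");
        }
        List<List<Integer>> runs = new ArrayList<>(parts);
        while (runs.size() > factor) {
            // More runs than the factor allows: merge a group into an
            // intermediate run. This is the "rewrite" step that costs I/O.
            List<Integer> intermediate =
                mergeOnce(new ArrayList<>(runs.subList(0, factor)));
            intermediateRecordsWritten += intermediate.size();
            List<List<Integer>> remaining =
                new ArrayList<>(runs.subList(factor, runs.size()));
            remaining.add(intermediate);
            runs = remaining;
        }
        // Final pass reads straight from the remaining runs; nothing is rewritten.
        return mergeOnce(runs);
    }
}
```

With four two-record parts, merge(parts, 4) leaves intermediateRecordsWritten at 0, while merge(parts, 2) raises it to 8, because every record passes through one intermediate run before the final pass.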
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com