Nutch should delegate compression to Hadoop
-------------------------------------------
Key: NUTCH-506
URL: https://issues.apache.org/jira/browse/NUTCH-506
Project: Nutch
Issue Type: Improvement
Reporter: Doğacan Güney
Fix For: 1.0.0
Some data structures within Nutch (such as Content and ParseText) handle their
own compression. We should delegate all compression to Hadoop.
Also, Nutch should respect the io.seqfile.compression.type setting. Currently,
even if io.seqfile.compression.type is BLOCK or RECORD, Nutch overrides it for
some structures and sets it to NONE. (However, IMO, ParseText should always be
compressed as RECORD for performance reasons.)
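A minimal sketch of the proposed behaviour, not the actual patch. To keep it
runnable without Hadoop on the classpath, it uses java.util.Properties in place
of a Hadoop Configuration and redeclares a CompressionType enum mirroring the
NONE/RECORD/BLOCK values named above. The CompressionPolicy class and its
resolve helper are hypothetical names, and falling back to RECORD when the
setting is absent is an assumption.

```java
import java.util.Locale;
import java.util.Properties;

// Stand-in for Hadoop's compression type values, redeclared here so the
// sketch compiles without Hadoop on the classpath.
enum CompressionType { NONE, RECORD, BLOCK }

// Hypothetical helper class; the name is illustrative, not part of Nutch.
class CompressionPolicy {
    // The setting named in the issue.
    static final String KEY = "io.seqfile.compression.type";

    // Picks the compression type to use when writing one structure.
    static CompressionType resolve(Properties conf, String structure) {
        // ParseText is pinned to RECORD, following the reporter's suggestion.
        if ("ParseText".equals(structure)) {
            return CompressionType.RECORD;
        }
        // Every other structure follows the configured value instead of
        // forcing NONE. The RECORD fallback when unset is an assumption.
        String value = conf.getProperty(KEY, "RECORD");
        return CompressionType.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(KEY, "BLOCK");
        System.out.println("Content   -> " + resolve(conf, "Content"));
        System.out.println("ParseText -> " + resolve(conf, "ParseText"));
    }
}
```

In real code the resolved value would be handed to Hadoop when the SequenceFile
writer is created, so that Content and ParseText no longer compress their own
payloads.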