[ http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12364699 ]
Stefan Groschupf commented on NUTCH-192:
----------------------------------------

* plus whatever it takes to put the class name->id mapping in the MapWritable header (the mapping table): let's assume 40 bytes.

I do not write the mapping table to the output stream in any form. For now, the id is calculated as a hash of the class name. I will change this so that the ids are part of the class, where I will manually assign LongWritable id = (byte)1, UTF8 id = (byte)2, etc.
For example, writing a long (e.g. a timestamp) as UTF8 takes 15 bytes, while writing it as a LongWritable takes 8 bytes. 8 bytes plus 1 byte for the class type is 9 bytes, which is 60% of the space required for the String.
I guess the main misunderstanding is that I do not write the class -> id map into the stream at any time.
Does that make sense?

> meta data support for CrawlDatum
> --------------------------------
>
>          Key: NUTCH-192
>          URL: http://issues.apache.org/jira/browse/NUTCH-192
>      Project: Nutch
>         Type: Improvement
>     Versions: 0.8-dev
>     Reporter: Stefan Groschupf
>      Fix For: 0.8-dev
>  Attachments: metadata300106.patch
>
> Supporting meta data in CrawlDatum would help to realize a set of new Nutch
> features and would make a lot possible for smaller, specialized search
> engines.
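The byte counts in the comment can be checked with a small sketch. It is not the code from the patch: it uses the JDK's `java.io.DataOutputStream` as a stand-in for Hadoop's `LongWritable` and `UTF8`, and assumes the string form is written with a 2-byte length prefix, as `writeUTF` does. The class name `TypeIdSizeSketch`, the constant `LONG_ID`, and the sample timestamp are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TypeIdSizeSketch {
    // Hypothetical fixed id, mirroring "LongWritable id = (byte)1".
    static final byte LONG_ID = 1;

    // Bytes needed for a 1-byte type id followed by the raw 8-byte long.
    static int sizeAsTaggedLong(long value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(LONG_ID);
            out.writeLong(value);
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Bytes needed for the same value written as a length-prefixed string
    // (2-byte length plus one byte per ASCII digit), without a type id.
    static int sizeAsString(long value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(Long.toString(value));
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long timestamp = 1138579200000L; // sample 13-digit millisecond value
        int tagged = sizeAsTaggedLong(timestamp);
        int asString = sizeAsString(timestamp);
        System.out.println("tagged long: " + tagged + " bytes");
        System.out.println("string:      " + asString + " bytes");
        System.out.println("ratio:       " + (100 * tagged / asString) + "%");
    }
}
```

Under these assumptions a 13-digit millisecond timestamp takes 15 bytes as a string and 9 bytes as a tagged long, which is where the 60% figure comes from. Because the ids are fixed constants in the code, no class-to-id table has to be written into the stream.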
