[ http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12364694 ]
Andrzej Bialecki commented on NUTCH-192:
-----------------------------------------

What I meant was that both keys and values should be Strings (or rather UTF8), for the sake of simplicity.

Let's take your example: if we use Writables, then to store a single ByteWritable you need:

* 1 byte - type id
* 1 byte - value
* plus whatever it takes to put the class name->id mapping (the mapping table) in the MapWritable header: let's assume 40 bytes.

For a single value that is a substantial overhead. For hundreds of values, the overhead per value decreases asymptotically to 1 byte.

So the real question is which typical usage scenario we want to optimize for: storing hundreds of metadata values of different types, or just a couple. In the former case MapWritable makes sense; in the latter, using Strings is simpler.

> meta data support for CrawlDatum
> --------------------------------
>
>          Key: NUTCH-192
>          URL: http://issues.apache.org/jira/browse/NUTCH-192
>      Project: Nutch
>         Type: Improvement
>     Versions: 0.8-dev
>     Reporter: Stefan Groschupf
>      Fix For: 0.8-dev
>  Attachments: metadata300106.patch
>
> Supporting metadata in CrawlDatum would help realize a set of new Nutch
> features and would make a lot possible for smaller, specially focused search
> engines.
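The byte arithmetic in the comment can be made concrete with a small cost model. This is a hypothetical sketch, not Nutch or MapWritable code: the class and method names are invented for illustration, and it only encodes the figures assumed in the comment (1 byte type id plus 1 byte payload per ByteWritable, and an assumed 40-byte mapping table in the header). It does not reproduce the real MapWritable serialization format.

```java
import java.util.Locale;

/**
 * Hypothetical cost model for storing n ByteWritable values in a map,
 * using only the numbers assumed in the comment (not the real wire format).
 */
public class MetadataOverhead {

    static final int TYPE_ID_BYTES = 1;    // per-value type id
    static final int BYTE_VALUE_BYTES = 1; // payload of one ByteWritable
    static final int HEADER_BYTES = 40;    // assumed size of the class name->id mapping table

    /** Total bytes needed to store n values under this model. */
    static int totalBytes(int n) {
        return HEADER_BYTES + n * (TYPE_ID_BYTES + BYTE_VALUE_BYTES);
    }

    /** Non-payload bytes (header plus type ids), averaged over n values. */
    static double overheadPerValue(int n) {
        int payload = n * BYTE_VALUE_BYTES;
        return (double) (totalBytes(n) - payload) / n;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 100, 1000}) {
            // Locale.ROOT keeps the decimal separator as '.' regardless of platform locale
            System.out.printf(Locale.ROOT, "n=%d total=%d bytes, overhead=%.2f bytes/value%n",
                    n, totalBytes(n), overheadPerValue(n));
        }
    }
}
```

Running it prints an overhead of 41.00 bytes per value for one value, then 5.00, 1.40 and 1.04 for 10, 100 and 1000 values: the header cost is amortized away and only the 1-byte type id per value remains, which is the trade-off the comment describes.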
