[ http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12364694 ]

Andrzej Bialecki  commented on NUTCH-192:
-----------------------------------------

What I meant was that both keys and values should be Strings (or rather UTF8), 
for the sake of simplicity. Let's take your example: if we use Writables, then 
to store 1 ByteWritable you need:

* 1 byte - type id
* 1 byte - value
* plus whatever it takes to put the class name->id mapping in the MapWritable 
header (the mapping table): let's assume 40 bytes.

For a single value that is a substantial overhead. For hundreds of values the 
overhead per value falls asymptotically towards 1 byte (the type id).
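
To make the arithmetic concrete, here is a rough sketch of that estimate. The 
40-byte header is the assumed figure from above, not a measurement, and the 
function names are made up for illustration. It also assumes all values share 
one type (so the mapping is paid once) and ignores the size of the keys.

```python
# Assumed sizes, taken from the estimate above (not measured from MapWritable).
TYPE_ID_BYTES = 1   # per-value type id
VALUE_BYTES = 1     # payload of one ByteWritable
HEADER_BYTES = 40   # assumed one-off cost of the class name -> id mapping


def total_bytes(n_values: int) -> int:
    """Total size for n_values ByteWritable entries of a single type."""
    return HEADER_BYTES + n_values * (TYPE_ID_BYTES + VALUE_BYTES)


def overhead_per_value(n_values: int) -> float:
    """Bytes spent per value beyond the 1-byte payload itself."""
    return (total_bytes(n_values) - n_values * VALUE_BYTES) / n_values


for n in (1, 10, 100, 1000):
    print(f"{n:>5} values: {total_bytes(n):>5} bytes total, "
          f"{overhead_per_value(n):.2f} bytes overhead per value")
```

With these assumptions a single value costs 42 bytes in total, while at 1000 
values the overhead is already close to the 1-byte type id.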

So the real question is which usage scenario we want to optimize for: storing 
hundreds of metadata values of different types, or just a couple. In the 
former case MapWritable makes sense; in the latter, Strings are simpler.

> meta data support for CrawlDatum
> --------------------------------
>
>          Key: NUTCH-192
>          URL: http://issues.apache.org/jira/browse/NUTCH-192
>      Project: Nutch
>         Type: Improvement
>     Versions: 0.8-dev
>     Reporter: Stefan Groschupf
>      Fix For: 0.8-dev
>  Attachments: metadata300106.patch
>
> Supporting meta data in CrawlDatum would help to get a set of new nutch 
> features realized and makes a lot possible to smaller special focused search 
> engines.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
