[ http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12364683 ]
Stefan Groschupf commented on NUTCH-192:
----------------------------------------
Andrzej, Doug, I'm not sure I understand you correctly: do you suggest having
string keys and values, or just string keys?
This confuses me a bit, though I may be misunderstanding things because of my
English, since I remember that one reason for not having meta data until now
was performance and the size of the data.
In one of my personal use cases I have a set of meta data with definitely
fewer than 255 entries, and I only need to store some long values.
So I would love to use key:ByteWritable and value:LongWritable.
Storing new LongWritable(23) or new UTF8("23") should make a significant
difference in size. Also, parsing a byte, int or long from a string takes some
time.
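As a rough sketch of the size argument, the snippet below measures one
meta data entry encoded both ways. It uses java.io.DataOutputStream as a
stand-in for the Writable classes, so the byte counts are an assumption about
how ByteWritable, LongWritable and UTF8 serialize, not a measurement of them.
The class name EncodingSize, the key "fetchTime" and the timestamp value are
made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class EncodingSize {

    // Bytes needed for a one-byte key followed by a fixed-width long value.
    static int binarySize(byte key, long value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeByte(key);
        out.writeLong(value);
        out.flush();
        return buf.size();
    }

    // Bytes needed for a string key and a string value, each written as a
    // two-byte length prefix plus the encoded characters.
    static int stringSize(String key, String value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF(key);
        out.writeUTF(value);
        out.flush();
        return buf.size();
    }

    public static void main(String[] args) throws IOException {
        long fetchTime = 1138665600000L; // hypothetical millisecond timestamp
        System.out.println(binarySize((byte) 1, fetchTime));                       // prints 9
        System.out.println(stringSize("fetchTime", Long.toString(fetchTime)));     // prints 26
    }
}
```

The gap depends on the data: a short decimal string such as "23" takes fewer
bytes in this encoding than a fixed 8-byte long, so the saving comes mainly
from the one-byte key and from large values.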
At least there is a nice side effect: since this map is itself a Writable, we
can store a Map in a Map, which allows hierarchical meta data.
I fully agree with having a manually created mapping table stored in the
MapWritable class; I will change this and commit a new patch.
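To illustrate how a hand-written mapping table and nested maps could fit
together, here is a minimal sketch. It is not the code from the patch: the
Value interface, the type ids LONG_ID and MAP_ID, and the class names are all
hypothetical, and plain java.io streams stand in for the Writable machinery.
Each value is written with a one-byte id taken from a fixed table, so the
reader knows which type to create, and a map is simply one more value type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class NestedMetaSketch {

    // Hypothetical stand-in for a Writable value.
    interface Value {
        byte typeId();
        void write(DataOutput out) throws IOException;
    }

    // Manually maintained id table: one byte per supported value type.
    static final byte LONG_ID = 1;
    static final byte MAP_ID = 2;

    static final class LongValue implements Value {
        final long v;
        LongValue(long v) { this.v = v; }
        public byte typeId() { return LONG_ID; }
        public void write(DataOutput out) throws IOException { out.writeLong(v); }
    }

    // A map with byte keys; assumes fewer than 256 entries.
    static final class MapValue implements Value {
        final Map<Byte, Value> entries = new LinkedHashMap<>();
        public byte typeId() { return MAP_ID; }
        public void write(DataOutput out) throws IOException {
            out.writeByte(entries.size());
            for (Map.Entry<Byte, Value> e : entries.entrySet()) {
                out.writeByte(e.getKey());              // key
                out.writeByte(e.getValue().typeId());   // id from the table
                e.getValue().write(out);                // payload
            }
        }
    }

    // Recreates a value from its type id; maps recurse into their entries.
    static Value read(byte typeId, DataInput in) throws IOException {
        switch (typeId) {
            case LONG_ID:
                return new LongValue(in.readLong());
            case MAP_ID: {
                MapValue m = new MapValue();
                int n = in.readUnsignedByte();
                for (int i = 0; i < n; i++) {
                    byte key = in.readByte();
                    byte id = in.readByte();
                    m.entries.put(key, read(id, in));
                }
                return m;
            }
            default:
                throw new IOException("unknown type id " + typeId);
        }
    }

    static byte[] toBytes(MapValue m) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        m.write(new DataOutputStream(buf));
        return buf.toByteArray();
    }

    static MapValue fromBytes(byte[] bytes) throws IOException {
        DataInput in = new DataInputStream(new ByteArrayInputStream(bytes));
        return (MapValue) read(MAP_ID, in);
    }
}
```

With this layout an outer map can hold long values next to an inner map under
another key, and the inner map round-trips through the same read method.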
Thanks for your comments!
> meta data support for CrawlDatum
> --------------------------------
>
> Key: NUTCH-192
> URL: http://issues.apache.org/jira/browse/NUTCH-192
> Project: Nutch
> Type: Improvement
> Versions: 0.8-dev
> Reporter: Stefan Groschupf
> Fix For: 0.8-dev
> Attachments: metadata300106.patch
>
> Supporting meta data in CrawlDatum would help to realize a set of new Nutch
> features and would make a lot possible for smaller, specially focused search
> engines.