Re: where we need meta data?

Andrzej Bialecki Sun, 29 Jan 2006 23:57:02 -0800

Stefan Groschupf wrote:

Hi,
some thoughts about meta data.
We agree that we should try to minimize the usage of meta data, to keep performance high. Since we decided to keep meta data separate, I was thinking of a metadata db, similar to the crawl db we have today.
I am asking myself where we will need meta data, to see whether it makes sense to keep it separate or not.
My personal list:

[...]

As you point out, in many cases the additional metadata is needed throughout most of the workflow. So, it would make more sense to keep it together with CrawlDatum.

+ generation // having meta data here to decide if a page should be fetched or not
+ fetching // here I'm not sure whether we need meta data for fetching, but it might be great to store session or authentication information that can be used during fetching.

Yes, that's a perfect example. Also, last modification time is required to detect modified content.

However, during fetching and parsing, meta data for a URL can be created.
+ updating // during updating I was planning to overwrite the old meta data with the new data. I had the idea to use System.currentTimeMillis() as a stored timestamp to identify the newer meta data, but I have no idea if the current millis are fast enough for the job, any thoughts?

Do we need versioning or timestamping of metadata? I can't imagine why... we already store the last fetch time.
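
To illustrate the point about the last fetch time, here is a minimal sketch of an update step that overwrites old metadata without a separate metadata timestamp. The class `DatumSketch`, its fields, and the `update` method are hypothetical names made up for this example; this is not the actual CrawlDatum class or the real update code, only one way the idea might look.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical record for illustration only; not the Nutch CrawlDatum class.
class DatumSketch {
    final long fetchTime;                                 // last fetch time in ms
    final Map<String, String> meta = new HashMap<>();     // per-URL meta data

    DatumSketch(long fetchTime) {
        this.fetchTime = fetchTime;
    }

    // Update step: the entry with the later fetch time wins, and its meta
    // data replaces the old meta data as a whole. Ties go to the freshly
    // fetched entry. No extra meta data timestamp is consulted.
    static DatumSketch update(DatumSketch stored, DatumSketch fetched) {
        if (stored == null) return fetched;
        if (fetched == null) return stored;
        return fetched.fetchTime >= stored.fetchTime ? fetched : stored;
    }
}
```

Under these assumptions, the fetch time already orders the two versions, so a second timestamp would carry no additional information.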

+ indexing // to add URL meta data into the index.
Well, looking at this list, I more and more believe that it would be a better idea to store the meta data in the CrawlDatum object directly. It saves a lot of code changes, and we need meta data everywhere anyway.


[...]

So why not add meta data directly to CrawlDatum?


I thought it was already decided ;-) . Yes, we need to do just that.
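
As a rough sketch of what carrying metadata inside the crawl entry could look like, the example below uses a hypothetical class `CrawlEntrySketch` with made-up method names. It is not the real CrawlDatum API. It assumes string keys and values, and it allocates the map lazily so that entries without metadata stay small, in line with the goal of keeping metadata usage minimal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a crawl db entry; not the actual CrawlDatum class.
class CrawlEntrySketch {
    private final long fetchTime;            // last fetch time, stored per entry
    private Map<String, String> metaData;    // stays null until first use

    CrawlEntrySketch(long fetchTime) {
        this.fetchTime = fetchTime;
    }

    long getFetchTime() {
        return fetchTime;
    }

    // Allocate the map only when a value is actually stored.
    void putMeta(String key, String value) {
        if (metaData == null) {
            metaData = new HashMap<>();
        }
        metaData.put(key, value);
    }

    // Returns null when the key is absent or no meta data was ever stored.
    String getMeta(String key) {
        return metaData == null ? null : metaData.get(key);
    }

    boolean hasMeta() {
        return metaData != null && !metaData.isEmpty();
    }
}
```

With something like this, generation, fetching, updating and indexing would all see the same object, so no separate metadata db lookup is needed at any step.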

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
