Re: Nutch/Lucene unique ID for every item crawled?

Sagar Vibhute Sun, 21 Oct 2007 06:10:59 -0700

A hash of the URL does sound useful. Thanks! :-)
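
For reference, here is a minimal sketch of the URL-hash idea using only the standard `java.security.MessageDigest` API. The class name `UrlId`, the method name `urlId`, and the choice of SHA-1 are assumptions for illustration, not part of Nutch or Lucene. The resulting ID is only as stable as the URL string: two spellings of the same page (for example, with and without a trailing slash) hash differently unless the URLs are normalized first.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a crawl-independent ID from a URL string.
public class UrlId {

    // Returns the lowercase hex SHA-1 digest (40 characters) of the URL.
    public static String urlId(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform must support SHA-1, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String first = urlId("http://lucene.apache.org/nutch/");
        String second = urlId("http://lucene.apache.org/nutch/");
        System.out.println(first);
        // Same URL gives the same ID, regardless of which crawl saw it.
        System.out.println(first.equals(second)); // prints "true"
    }
}
```

Unlike a Lucene doc ID, this value can be stored as an ordinary field on each document at indexing time and looked up later.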

But is the segment ID different for every crawl? If so, the segment ID plus
the doc ID could serve as a unique mapping. The trouble is, I don't know
how to get the doc ID of a particular document while it is being
crawled. I found a method that returns the document for a given doc ID,
but that's not what I need; I kinda need the opposite.


Any leads?

- Sagar


On 10/21/07, Sagar Naik <[EMAIL PROTECTED]> wrote:
>
> Hey,
> The Lucene document ID, an integer, may not be the same across two
> different crawls.
> I am not sure if this is what you are looking for, but you can store a
> hash of the URL crawled ;)
>
> - Sagar
>
> Sagar Vibhute wrote:
> > Hello,
> >
> > Does Nutch/Lucene provide a unique ID for every item that it has
> > crawled?
> >
> > I checked the Lucene docid, but from what I understood, the Lucene
> > docid is not unique for every item crawled. Is that so?
> >
> > How can I get this unique ID, if it is available?
> >
> > Thanks.
> >
> > - Sagar
> >
> >
>
