A hash of the URL does sound useful. Thanks! :-) But is the segment ID different for every crawl? If so, segment ID + doc ID could serve as a unique mapping. The trouble is, I don't know how to get the doc ID of a particular document while it is being crawled. I found a method that returns the document for a given doc ID, but I need the opposite: given a document, find its doc ID.
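For reference, here is roughly how I picture the URL-hash idea. This is only a minimal sketch using plain JDK classes, not anything from the Nutch or Lucene API; the class name `UrlId` and the choice of MD5 are my own assumptions. One caveat: the same page reached through two differently written URLs would get two different IDs, so the URL would have to be normalized first.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of Nutch or Lucene): derives an ID from the
// URL alone, so the same URL maps to the same ID in every crawl.
public class UrlId {

    // Returns the MD5 digest of the URL as 32 lowercase hex characters.
    public static String of(String url) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(url.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so each byte prints as exactly two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Example URL is made up for illustration.
        System.out.println(UrlId.of("http://example.com/page.html"));
    }
}
```

The resulting string could then be stored with the document as an extra field, which would sidestep the doc ID lookup entirely.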
Any leads?

- Sagar

On 10/21/07, Sagar Naik <[EMAIL PROTECTED]> wrote:
>
> Hey,
> The lucene document id , an integer, may not be same for 2 different
> crawls.
> I am not sure if this is wht u r looking for but U can store a hash
> value of the url crawled ;)
>
> - Sagar
>
> Sagar Vibhute wrote:
> > Hello,
> >
> > Does nutch/lucene provide for a unique ID for every item that it has
> > crawled?
> >
> > I checked the Lucene docid but from what I understood, the lucene docid is
> > not unique for every item crawled. Is that so?
> >
> > How can I get this unique ID, if it is available?
> >
> > Thanks.
> >
> > - Sagar
