My current solution is a modified Fetcher that puts the information into the 
parse metadata in its output method.

Then this info can be used during parsing and so on.
As Andrzej said, I also had to create my own OutputFormat.
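
A minimal sketch of that idea in plain Java, for illustration only. It is not 
the actual Fetcher or OutputFormat code: ordinary maps stand in for CrawlDatum 
and the parse metadata, and the class name, method names, and the "source." 
key prefix are all hypothetical. It only shows the data flow: copy what the 
outlinks will need into the page's metadata early, then read it back where 
the CrawlDatum itself is no longer available.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: plain maps stand in for Nutch's CrawlDatum and parse metadata.
// Class, method, and key names here are hypothetical, not Nutch APIs.
class MetadataPassingSketch {

    // Hypothetical key prefix marking values copied from the crawled page.
    static final String PREFIX = "source.";

    // Step 1 (fetch/parse time): copy the per-page values that the outlinks
    // will need into a copy of the page's parse metadata.
    static Map<String, String> attachSourceInfo(Map<String, String> parseMeta,
                                                Map<String, String> crawlInfo) {
        Map<String, String> merged = new LinkedHashMap<>(parseMeta);
        for (Map.Entry<String, String> e : crawlInfo.entrySet()) {
            merged.put(PREFIX + e.getKey(), e.getValue());
        }
        return merged;
    }

    // Step 2 (output time): the original crawl record is no longer at hand,
    // so build each outlink's data from the prefixed metadata entries alone.
    static Map<String, Map<String, String>> dataForOutlinks(List<String> outlinks,
                                                            Map<String, String> parseMeta) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (String link : outlinks) {
            Map<String, String> linkData = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : parseMeta.entrySet()) {
                if (e.getKey().startsWith(PREFIX)) {
                    linkData.put(e.getKey().substring(PREFIX.length()), e.getValue());
                }
            }
            result.put(link, linkData);
        }
        return result;
    }
}
```

Calling attachSourceInfo first and passing its result to dataForOutlinks gives 
every outlink a copy of the source page's values without the output step ever 
seeing the original record.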


-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 30, 2006 12:59 AM
To: [email protected]
Subject: Re: get CrawlDatum

Uroš Gruber wrote:
> Hi,
>
> Could someone show me how to get the CrawlDatum for the key URL in 
> ParseOutputFormat.write [83]?
> I would like to add data to the link URLs, but this data depends on data of 
> the URL being crawled.

You can't, because that instance of CrawlDatum is not available at that point. 
Either you need to provide it on the input to the map/reduce job (but then you 
will have to change the input and output formats), or you should prepare this 
information in advance during parsing, and put it into ParseData.metadata.

>
> I hope I was clear enough about my problem.
I hope so too ;)

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

