Andrzej Bialecki wrote:
Uroš Gruber wrote:
Hi,

Could someone point me to how to get the CrawlDatum for the key URL in ParseOutputFormat.write [83]? I would like to add data to the link URLs, but this data depends on the data of the URL being crawled.

You can't, because that instance of CrawlDatum is not available at this place. Either you need to provide it on the input to the map/reduce job (but then you will have to change input and output formats), or you should prepare this information in advance during parsing, and put it into ParseData.metadata.
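A rough sketch of the second option, in case it helps: the value is copied into the parse result's metadata while the per-URL data is still at hand, and read back when the outlinks are written. Everything here is a hypothetical stand-in (the ParsedPage class, the "source.info" key, the parse and write helpers); it does not use Nutch's actual ParseData or ParseOutputFormat classes and only illustrates the data flow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MetadataSketch {
    // Hypothetical stand-in for a parse result; NOT Nutch's ParseData class.
    public static final class ParsedPage {
        public final List<String> outlinks;
        public final Map<String, String> metadata = new HashMap<>();

        public ParsedPage(List<String> outlinks) {
            this.outlinks = outlinks;
        }
    }

    // Parse-time step: the per-URL data is still available here,
    // so store it in the metadata under an assumed key.
    public static ParsedPage parse(List<String> outlinks, String sourceInfo) {
        ParsedPage page = new ParsedPage(outlinks);
        page.metadata.put("source.info", sourceInfo);
        return page;
    }

    // Write-time step: only the parse result is visible here,
    // so the value is read back from the metadata and attached to each outlink.
    public static List<String> write(ParsedPage page) {
        String info = page.metadata.get("source.info");
        List<String> out = new ArrayList<>();
        for (String link : page.outlinks) {
            out.add(link + "\t" + info);
        }
        return out;
    }

    public static void main(String[] args) {
        ParsedPage page = parse(
            Arrays.asList("http://example.com/a", "http://example.com/b"),
            "depth=2");
        for (String line : write(page)) {
            System.out.println(line);
        }
    }
}
```

The point of the design is that the writing step never needs the original datum; it only needs whatever the parsing step chose to carry along.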
ParseData.metadata sounds nice, but I think I'm lost again :)
If I understand the code flow, the best place would be in Fetcher [262],
but I'm not sure that datum holds the info for the URL being fetched.


I hope I was clear enough about my problem.
I hope so too ;)


