My current solution is a modified Fetcher that puts the info into the Parse Metadata in its output method.
Then this info can be used during parsing and so on. As Andrzej said, I also had to create my own OutputFormat.

-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 30, 2006 12:59 AM
To: [email protected]
Subject: Re: get CrawlDatum

Uroš Gruber wrote:
> Hi,
>
> Could someone point me how to get CrawlDatum data from key url in
> ParseOutputFormat.write [83].
> I would like to add data to link urls but this data depend on data of
> url being crawled.

You can't, because that instance of CrawlDatum is not available at this
place. Either you need to provide it on the input to the map/reduce job
(but then you will have to change input and output formats), or you
should prepare this information in advance during parsing, and put it
into ParseData.metadata.

>
> I hope I was clear enough about my problem.

I hope so too ;)

--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
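
For readers of the archive, here is a minimal sketch of the hand-off discussed in this thread: the fetcher side copies the crawl data it needs into the parse metadata, and the output side reads that copy back when it writes the outlinks. It uses stand-in types, not the real Nutch classes. `CrawlInfo`, `ParseResult`, the key name and both helper methods are hypothetical, and splitting the parent score evenly across outlinks is only an example of link data that depends on the page being crawled.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: stand-in types, NOT the actual Nutch classes or API.
class MetadataHandoffSketch {

    /** Hypothetical stand-in for the crawl record of the page being fetched. */
    static final class CrawlInfo {
        final float score;
        CrawlInfo(float score) { this.score = score; }
    }

    /** Hypothetical stand-in for a parse result: outlinks plus string metadata. */
    static final class ParseResult {
        final String[] outlinks;
        final Map<String, String> metadata = new HashMap<>();
        ParseResult(String... outlinks) { this.outlinks = outlinks; }
    }

    // Assumed key name, chosen for this sketch.
    static final String PARENT_SCORE_KEY = "sketch.parent.score";

    /** Fetcher side: copy the needed crawl data into the parse metadata. */
    static void attachCrawlInfo(CrawlInfo info, ParseResult parse) {
        parse.metadata.put(PARENT_SCORE_KEY, Float.toString(info.score));
    }

    /** Output side: the crawl record is not available here, so read the copy back. */
    static Map<String, Float> scoreOutlinks(ParseResult parse) {
        String stored = parse.metadata.get(PARENT_SCORE_KEY);
        float parentScore = (stored == null) ? 0.0f : Float.parseFloat(stored);
        Map<String, Float> scores = new LinkedHashMap<>();
        for (String link : parse.outlinks) {
            // Example policy only: split the parent's score evenly over its outlinks.
            scores.put(link, parentScore / parse.outlinks.length);
        }
        return scores;
    }

    public static void main(String[] args) {
        ParseResult parse = new ParseResult("http://example.com/a", "http://example.com/b");
        attachCrawlInfo(new CrawlInfo(1.0f), parse);
        System.out.println(scoreOutlinks(parse));
    }
}
```

The values travel as strings because the metadata is modelled here as a plain string map; whatever the output side needs has to be serialised by the fetcher side in advance.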
