Hi dear nutchers,

I have implemented HTTP session support for Nutch. A patch will be released as soon as I have switched to MapReduce.

I am crawling an intranet CMS and was successful in indexing the PDFs. However, if I follow a link in the search result pane, the PDF is not retrieved by the client's browser, because no session cookie is set. I need some kind of metadata on the PDF referring to the original HTML URL, where this session cookie is set before the page is redirected to the URL of the PDF. This information is only available when this HTML URL is parsed.
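
To make the idea concrete, here is a rough sketch of the mapping I have in mind. It is plain Java and deliberately not tied to any Nutch API: the class name RedirectSourceIndex, its methods, and the example URLs are all hypothetical, and where such a map would actually live (parse metadata, index field, something else) is exactly the open question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: remembers, for each redirect target (e.g. a PDF),
// the HTML URL that originally triggered the redirect and set the cookie.
class RedirectSourceIndex {
    // target URL -> original source URL
    private final Map<String, String> sources = new HashMap<>();

    // Call this when a fetch of fromUrl is redirected to toUrl.
    void recordRedirect(String fromUrl, String toUrl) {
        // If fromUrl was itself a redirect target, keep the first URL of the chain.
        String origin = sources.getOrDefault(fromUrl, fromUrl);
        // Keep the first source seen for a given target.
        sources.putIfAbsent(toUrl, origin);
    }

    // URL to show in the search result: the session-setting HTML URL
    // if one is known, otherwise the document URL itself.
    String linkForSearchResult(String docUrl) {
        return sources.getOrDefault(docUrl, docUrl);
    }
}
```

With something like this, the search result for the PDF would link to the HTML URL, so the browser gets the session cookie first and is then redirected to the PDF as usual.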

Any ideas? Thanks for your help.

Marcel Schnippe