Hi dear Nutchers,

I have implemented HTTP session support for Nutch. I will release a patch
as soon as I have switched to MapReduce.
I am crawling an intranet CMS and have succeeded in indexing the PDFs.
However, if I follow a link in the search results pane, the client's browser
cannot retrieve the PDF, because the session cookie is not set. I need some
kind of metadata on the PDF that refers to the original HTML URL, where the
session cookie is set before the page redirects to the URL of the PDF.
This information is only available when that HTML URL is parsed.
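To make the requirement concrete, here is a minimal sketch in plain Java of the mapping I have in mind. It uses no Nutch classes, and the names ReferrerIndex, recordRedirect and linkFor are hypothetical. While the HTML page is parsed, it records the PDF URL that the page redirects to. When a PDF later shows up as a search hit, it links to the HTML page instead, so that the session cookie is set before the PDF is requested.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch: remembers which HTML page
// (the one that sets the session cookie) leads to which PDF.
public class ReferrerIndex {
    private final Map<String, String> pdfToReferrer = new HashMap<>();

    // Call while parsing the HTML page, once its redirect target is known.
    // Keeps the first HTML page seen for a given PDF.
    public void recordRedirect(String htmlUrl, String pdfUrl) {
        pdfToReferrer.putIfAbsent(pdfUrl, htmlUrl);
    }

    // Call when rendering a search hit: return the HTML page that sets the
    // cookie if one was recorded, otherwise the hit's own URL unchanged.
    public String linkFor(String resultUrl) {
        return pdfToReferrer.getOrDefault(resultUrl, resultUrl);
    }

    public static void main(String[] args) {
        ReferrerIndex index = new ReferrerIndex();
        index.recordRedirect("http://cms.example/page?id=1", "http://cms.example/doc.pdf");
        System.out.println(index.linkFor("http://cms.example/doc.pdf"));   // http://cms.example/page?id=1
        System.out.println(index.linkFor("http://cms.example/other.html")); // http://cms.example/other.html
    }
}
```

In a real crawl this mapping would have to be stored with the PDF's index entry, for example as an extra field. That storage step is exactly the part I don't know how to do cleanly in Nutch.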

Any ideas?

Thanks for your help.

Marcel Schnippe
