Sami Siren wrote:
Michael Wechner wrote:
Hi
It seems to me that Nutch 0.8.x cannot extract the title from an XHTML
page, e.g.
Try changing the following in your parse-plugins.xml:
<mimeType name="application/xhtml+xml">
  <plugin id="parse-html" />
</mimeType>
This was changed in trunk and it _should_ fix that problem.
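
For context, here is a minimal sketch of where that mapping would sit inside parse-plugins.xml. The root element, the aliases section and the extension-id are assumptions based on the stock Nutch configuration file and are not stated in this thread, so check them against your own conf/parse-plugins.xml before copying anything:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: surrounding structure is assumed from the stock file. -->
<parse-plugins>

  <!-- Route XHTML pages to the HTML parser (the change suggested above). -->
  <mimeType name="application/xhtml+xml">
    <plugin id="parse-html" />
  </mimeType>

  <!-- Assumed: the alias that maps the plugin id to its parser extension. -->
  <aliases>
    <alias name="parse-html"
           extension-id="org.apache.nutch.parse.html.HtmlParser" />
  </aliases>

</parse-plugins>
```

The mapping presumably only takes effect if the parse-html plugin is also enabled through the plugin.includes property, and pages that were already parsed would need to be parsed again to pick up the title.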
Thanks :-) This seems to work.
Shall I send a patch for nutch-0.8.x? Or is Nutch 0.8.x unmaintained?
Cheers
Michi
--
Sami Siren
--
Michael Wechner
Wyona - Open Source Content Management - Apache Lenya
http://www.wyona.com http://lenya.apache.org
+41 44 272 91 61