Tika 0.8-SNAPSHOT and HTML torture testing

I just committed some changes to Tika that (in theory) should ensure all URLs get extracted from HTML documents.


See https://issues.apache.org/jira/browse/TIKA-463 for details.

It would be great if somebody active in Nutch could try this out with the current suite of Nutch tests for HTML processing.


Thanks!

-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
