Re: Tika HTML parsing

2010-08-15 Thread Andrzej Bialecki
On 2010-08-15 06:54, Ken Krugler wrote: For what it's worth, I just committed some patches to Tika that should improve Tika's ability to extract HTML outlinks (in img and frame elements, at least). Support for iframe should be coming soon :) This is in 0.8-SNAPSHOT, and there's one troubling

[jira] Commented: (NUTCH-887) Delegate parsing of feeds to Tika

2010-08-15 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898706#action_12898706 ] Chris A. Mattmann commented on NUTCH-887: - bq. Ah, good - I missed that, I need to