On Mon, 6 Sep 2010, Ken Krugler wrote:
> I recently updated the Bixo project to use Tika 0.8-SNAPSHOT, and a number of documents now fail during parsing that previously passed.
Any chance you could create a new JIRA issue and upload one of the problem documents?
> Did the Tika-0.7 image parsers (JPEG, GIF, PNG) not extract metadata, and thus not run into these types of issues?
The image metadata stuff has changed dramatically since 0.7, and we're now processing a lot more of the files in search of useful metadata than we used to.
Nick
