I just finished the comparison run against govdocs1 with 1.7 vs. 1.8-rc1, and all looks good, with one major change... at first glance.

Because of my "fix" on TIKA-1519 and the law of unintended consequences, files that start like so:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

have different Content-Types between the two versions.

In Tika 1.7, they had a Content-Type of:

    text/html; charset=iso-8859-1

In Tika 1.8-rc1, they now have a Content-Type of:

    application/xhtml+xml

This is a major change. Do we want this? Or do we want to revert to the old behavior but add some kind of filter to prevent crazy Content-Type information like the following from overwriting what the detector detected:

    <meta http-equiv="Content-Type" content="application/pdf" />

or

    <meta http-equiv="Content-Type" content="anythingIFeelLikeInserting" />

-----Original Message-----
From: David Meikle [mailto:[email protected]]
Sent: Wednesday, April 08, 2015 8:06 PM
To: [email protected]
Subject: Re: [VOTE] Release Apache Tika 1.8 Candidate #1

Hey Tyler,

> On 7 Apr 2015, at 19:54, Tyler Palsulich <[email protected]> wrote:
>
> [ ] +1 Release this package as Apache Tika 1.8
> [ ] -1 Do not release this package because...

Whilst my testing with the release is good so far on Mac and Linux, with Windows to go, and I am inclined to +1, it would be good if you were able to get your code signing key signed by someone nearby to avoid the warning below.

amadeaus-air:release david$ gpg --verify tika-1.8-src.zip.asc
gpg: Signature made Tue 7 Apr 19:45:15 2015 EDT using RSA key ID D4F10117
gpg: Good signature from "Tyler Palsulich <[email protected]>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4  183E 8810 BB19 D4F1 0117

Not sure if Chris, Lewis et al are near you and could do this quickly?

Cheers,
Dave
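
On the filter idea raised above the quoted message: one possible shape for it is sketched here, purely as an illustration. The class name MetaContentTypeFilter, the resolve method, and the allow-list of "HTML family" types are all hypothetical and not part of Tika; the policy (let a meta http-equiv value override the detector only when it names an HTML-compatible type) is just one assumption about what "sane" could mean.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not a Tika class): decides whether a Content-Type
 * declared in a meta http-equiv tag may replace the detected type.
 */
final class MetaContentTypeFilter {

    // Assumed allow-list: types an HTML page may plausibly declare for itself.
    private static final Set<String> HTML_FAMILY =
            new HashSet<>(Arrays.asList("text/html", "application/xhtml+xml"));

    // Rough "type/subtype" shape check; deliberately not a full media type parser.
    private static final Pattern BASE_TYPE =
            Pattern.compile("[a-z0-9!#$&^_.+-]+/[a-z0-9!#$&^_.+-]+");

    private MetaContentTypeFilter() {
    }

    /**
     * Returns the declared value (trimmed) if it names an HTML-family type,
     * otherwise falls back to what the detector reported.
     */
    static String resolve(String detected, String declared) {
        if (declared == null) {
            return detected;
        }
        String trimmed = declared.trim();
        // Strip parameters such as "; charset=iso-8859-1" to get the base type.
        int semi = trimmed.indexOf(';');
        String base = (semi < 0 ? trimmed : trimmed.substring(0, semi))
                .trim()
                .toLowerCase(Locale.ROOT);
        if (!BASE_TYPE.matcher(base).matches()) {
            return detected; // e.g. "anythingIFeelLikeInserting"
        }
        if (!HTML_FAMILY.contains(base)) {
            return detected; // e.g. "application/pdf" inside an HTML page
        }
        return trimmed;
    }
}
```

Under this sketch, resolve("application/xhtml+xml", "text/html; charset=iso-8859-1") would give back the 1.7-style value, while the application/pdf and anythingIFeelLikeInserting examples would leave the detected type untouched. Whether the allow-list should be that narrow is exactly the open question.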
