I just finished the run against govdocs1 with 1.7 vs. 1.8-rc1, and all looks 
good, with one major change... at first glance.

Because of my "fix" on TIKA-1519 and the law of unintended consequences, files 
that start like so:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

have different Content-Types between the two versions.

In Tika 1.7, they used to have a Content-Type of: text/html; charset=iso-8859-1 

In Tika 1.8-rc1, they now have a Content-Type of: application/xhtml+xml

This is a major change.

Do we want this? 

Or do we want to revert to the old behavior but add some kind of filter to 
prevent crazy Content-Type information like the following from overwriting what 
the detector detected:
<meta http-equiv="Content-Type" content="application/pdf" />
or
<meta http-equiv="Content-Type" content="anythingIFeelLikeInserting" />
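
To make the second option concrete, here is a minimal sketch of what such a 
filter could look like. It is plain Java and does not use any Tika classes; the 
class name MetaContentTypeFilter, the method resolve() and the choice of 
"HTML family" types are all hypothetical and only meant to illustrate the idea: 
keep the meta-declared value when it is a well-formed media type in the HTML 
family, otherwise fall back to what the detector found.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: decides whether a Content-Type
// declared in a <meta http-equiv> tag may override the detected type.
public class MetaContentTypeFilter {

    // Assumption: only these base types are plausible for an HTML document.
    private static final Set<String> HTML_FAMILY =
            Set.of("text/html", "application/xhtml+xml");

    // Loose "type/subtype" check, optionally followed by parameters
    // such as "; charset=iso-8859-1".
    private static final Pattern MEDIA_TYPE = Pattern.compile(
            "^\\s*([A-Za-z0-9.+-]+/[A-Za-z0-9.+-]+)\\s*(;.*)?$");

    // Returns the meta-declared value if it passes the filter,
    // otherwise the type the detector came up with.
    public static String resolve(String detected, String metaValue) {
        if (metaValue == null) {
            return detected;
        }
        Matcher m = MEDIA_TYPE.matcher(metaValue);
        if (!m.matches()) {
            return detected; // e.g. "anythingIFeelLikeInserting"
        }
        String base = m.group(1).toLowerCase(Locale.ROOT);
        return HTML_FAMILY.contains(base) ? metaValue.trim() : detected;
    }
}
```

With this sketch, a file detected as application/xhtml+xml whose meta tag says 
"text/html; charset=iso-8859-1" would keep the 1.7-style value, while a meta 
value of "application/pdf" or "anythingIFeelLikeInserting" would be ignored in 
favor of the detected type.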

-----Original Message-----
From: David Meikle [mailto:[email protected]] 
Sent: Wednesday, April 08, 2015 8:06 PM
To: [email protected]
Subject: Re: [VOTE] Release Apache Tika 1.8 Candidate #1

Hey Tyler,

> On 7 Apr 2015, at 19:54, Tyler Palsulich <[email protected]> wrote:
> 
> [ ] +1 Release this package as Apache Tika 1.8
> [ ] -1 Do not release this package because...

Whilst my testing with the release is good so far on Mac and Linux, with 
Windows to go, and I am inclined to +1, it would be good if you were able to get 
your code signing key signed by someone nearby to avoid the warning below.

amadeaus-air:release david$ gpg --verify tika-1.8-src.zip.asc 
gpg: Signature made Tue  7 Apr 19:45:15 2015 EDT using RSA key ID D4F10117
gpg: Good signature from "Tyler Palsulich <[email protected]>"
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 1D32 9CC2 D69C 821B FBE4  183E 8810 BB19 D4F1 0117

Not sure if Chris, Lewis et al. are near you and could do this quickly?

Cheers,
Dave
