Backwards compatibility issue found by clirr on TIKA-1587

[INFO] --- clirr-maven-plugin:2.3:check (default) @ tika-core ---
[ERROR] org.apache.tika.fork.ForkParser: Return type of method 'public java.lang.String getJavaCommand()' has been changed to java.util.List
[ERROR] org.apache.tika.fork.ForkParser: Parameter 1 of 'public void setJavaCommand(java.lang.String)' has changed its type to java.util.List

-----Original Message-----
From: Hudson (JIRA) [mailto:j...@apache.org]
Sent: Monday, March 30, 2015 10:35 AM
To: talli...@apache.org
Subject: [jira] [Commented] (TIKA-1584) Tika 1.7 possible regression (nested attachment files not getting parsed)

    [ https://issues.apache.org/jira/browse/TIKA-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386765#comment-14386765 ]

Hudson commented on TIKA-1584:
------------------------------

FAILURE: Integrated in tika-trunk-jdk1.7 #585 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/585/])
TIKA-1584: fixed regression in Tika 1.7 that prevents processing of embedded docs with /tika service (tallison: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1670095)
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/MetadataResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/RecursiveMetadataResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/TikaResourceTest.java

> Tika 1.7 possible regression (nested attachment files not getting parsed)
> -------------------------------------------------------------------------
>
>                 Key: TIKA-1584
>                 URL: https://issues.apache.org/jira/browse/TIKA-1584
>             Project: Tika
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 1.7
>            Reporter: Rob Tulloh
>            Assignee: Tim Allison
>            Priority: Blocker
>             Fix For: 1.8
>
>
> I tried to send this to the tika user list, but got a qmail failure so I am
> opening a jira to see if I can get help with this.
> There appears to be a change in the behavior of tika since 1.5 (the last
> version we have used). In 1.5, if we pass a file with content type of rfc822
> which contains a zip that contains a docx file, the entire content would get
> recursed and the text returned. In 1.7, tika only unwinds as far as the zip
> file and ignores the content of the contained docx file. This is causing a
> regression failure in our search tests because the contents of the docx file
> are not found when searched for.
>
> We are testing with tika-server if this helps. If we ask the meta service to
> just characterize the test data, it correctly determines the input is of type
> rfc822. However, on extract, the contents of the attachment are not extracted
> as expected.
> curl -X PUT -T test.eml -q -H Content-Type:application/octet-stream
> http://localhost:9998/meta 2>/dev/null | grep Content-Type
> "Content-Type","message/rfc822"
> curl -X PUT -T test.eml -q -H Content-Type:application/octet-stream
> http://localhost:9998/tika 2>/dev/null | grep docx
> sign.docx <<<<--- this is not expected, need contents of this extracted
> We can easily reproduce this problem with a simple eml file with an
> attachment. Can someone please comment if this seems like a problem or
> perhaps we need to change something in our call to get the old behavior?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)