Backwards compatibility issue found by clirr on TIKA-1587

[INFO] --- clirr-maven-plugin:2.3:check (default) @ tika-core ---

[ERROR] org.apache.tika.fork.ForkParser: Return type of method 'public java.lang.String getJavaCommand()' has been changed to java.util.List
[ERROR] org.apache.tika.fork.ForkParser: Parameter 1 of 'public void setJavaCommand(java.lang.String)' has changed its type to java.util.List
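One way to keep clirr quiet here would be to leave the String-based signatures in place (deprecated) and add the List-based variants alongside them. The following is only a minimal sketch under assumptions: the class is a stand-in and not the real ForkParser, the name getJavaCommandAsList() is hypothetical, and the default command is illustrative. The getter needs a new name because Java cannot overload on return type alone; the setter can simply be overloaded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for org.apache.tika.fork.ForkParser; not the actual Tika source.
public class ForkParserCompatSketch {

    // Illustrative default; the command is stored internally as a list of arguments.
    private List<String> java = new ArrayList<>(Arrays.asList("java", "-Xmx32m"));

    // New list-based getter under a new (hypothetical) name.
    public List<String> getJavaCommandAsList() {
        return Collections.unmodifiableList(java);
    }

    // New list-based setter; overloading keeps the old String signature intact.
    public void setJavaCommand(List<String> java) {
        this.java = new ArrayList<>(java);
    }

    // Old signature kept for binary compatibility: joins the arguments with spaces.
    @Deprecated
    public String getJavaCommand() {
        return String.join(" ", java);
    }

    // Old signature kept for binary compatibility. Splitting on whitespace
    // does not handle quoted arguments that themselves contain spaces.
    @Deprecated
    public void setJavaCommand(String java) {
        setJavaCommand(Arrays.asList(java.trim().split("\\s+")));
    }

    public static void main(String[] args) {
        ForkParserCompatSketch parser = new ForkParserCompatSketch();
        parser.setJavaCommand("java -Xmx64m");
        System.out.println(parser.getJavaCommandAsList());
        System.out.println(parser.getJavaCommand());
    }
}
```

With something along these lines, existing callers of the String methods would keep compiling and linking, and clirr should only see added methods rather than changed ones.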

-----Original Message-----
From: Hudson (JIRA) [mailto:j...@apache.org] 
Sent: Monday, March 30, 2015 10:35 AM
To: talli...@apache.org
Subject: [jira] [Commented] (TIKA-1584) Tika 1.7 possible regression (nested 
attachment files not getting parsed)


    [ https://issues.apache.org/jira/browse/TIKA-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386765#comment-14386765 ]

Hudson commented on TIKA-1584:
------------------------------

FAILURE: Integrated in tika-trunk-jdk1.7 #585 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.7/585/])
TIKA-1584: fixed regression in Tika 1.7 that prevents processing of embedded 
docs with /tika service (tallison: 
http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1670095)
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/MetadataResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/RecursiveMetadataResource.java
* /tika/trunk/tika-server/src/main/java/org/apache/tika/server/resource/TikaResource.java
* /tika/trunk/tika-server/src/test/java/org/apache/tika/server/TikaResourceTest.java


> Tika 1.7 possible regression (nested attachment files not getting parsed)
> -------------------------------------------------------------------------
>
>                 Key: TIKA-1584
>                 URL: https://issues.apache.org/jira/browse/TIKA-1584
>             Project: Tika
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 1.7
>            Reporter: Rob Tulloh
>            Assignee: Tim Allison
>            Priority: Blocker
>             Fix For: 1.8
>
>
> I tried to send this to the tika user list, but got a qmail failure so I am 
> opening a jira to see if I can get help with this.
> There appears to be a change in the behavior of tika since 1.5 (the last 
> version we have used). In 1.5, if we pass a file with content type of rfc822 
> which contains a zip that contains a docx file, the entire content would get 
> recursed and the text returned. In 1.7, tika only unwinds as far as the zip 
> file and ignores the content of the contained docx file. This is causing a 
> regression failure in our search tests because the contents of the docx file 
> are not found when searched for.
>  
> We are testing with tika-server if this helps. If we ask the meta service to 
> just characterize the test data, it correctly determines the input is of type 
> rfc822. However, on extract, the contents of the attachment are not extracted 
> as expected.
> curl -X PUT -T test.eml -q -H Content-Type:application/octet-stream  
> http://localhost:9998/meta 2>/dev/null | grep Content-Type
> "Content-Type","message/rfc822"
> curl -X PUT -T test.eml -q -H Content-Type:application/octet-stream  
> http://localhost:9998/tika 2>/dev/null | grep docx
> sign.docx       <<<<--- this is not expected, need contents of this extracted
> We can easily reproduce this problem with a simple eml file with an 
> attachment. Can someone please comment if this seems like a problem or 
> perhaps we need to change something in our call to get the old behavior?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
