[jira] [Commented] (TIKA-1130) .docx text extract leaves out some portions of text

Tim Allison (JIRA) Mon, 24 Jun 2013 16:10:43 -0700

    [ 
https://issues.apache.org/jira/browse/TIKA-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692517#comment-13692517
 ]


Tim Allison commented on TIKA-1130:
-----------------------------------

The Maven proxy setting in my settings.xml file works for fetching 
dependencies, but the proxy info isn't being passed through to the 
url.openStream() call in MimeDetectionTest's testUrlOnly.  The proxy props 
appear correctly in the surefire-report for MimeDetectionTest, but both values 
print as null when I insert this into testUrlOnly:

System.out.println("HOST: " + System.getProperty("http.proxyHost"));
System.out.println("PORT: " + System.getProperty("http.proxyPort"));

Will likely find the answer as soon as I post this...
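
For reference, a minimal sketch of one way to narrow this down: read the two 
properties, build a java.net.Proxy from them when both are present, and fall 
back to a direct connection otherwise.  The class name ProxyFromProps and the 
helper proxyFrom are hypothetical and not part of Tika or MimeDetectionTest; 
this only illustrates the check and does not explain why the properties are 
null inside the test.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical helper, not part of Tika: turns the http.proxyHost and
// http.proxyPort values into a java.net.Proxy.
final class ProxyFromProps {

    // Returns an HTTP proxy when both values are set, else Proxy.NO_PROXY.
    public static Proxy proxyFrom(String host, String port) {
        if (host == null || host.isEmpty() || port == null || port.isEmpty()) {
            return Proxy.NO_PROXY;
        }
        // createUnresolved avoids a DNS lookup at construction time.
        return new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(host, Integer.parseInt(port)));
    }

    public static void main(String[] args) {
        String host = System.getProperty("http.proxyHost");
        String port = System.getProperty("http.proxyPort");
        System.out.println("HOST: " + host);
        System.out.println("PORT: " + port);
        System.out.println("PROXY: " + proxyFrom(host, port));
    }
}
```

With a proxy built this way, url.openConnection(proxy).getInputStream() could 
stand in for url.openStream() while debugging, which makes the proxy choice 
explicit instead of depending on the JVM-wide system properties.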
                
> .docx text extract leaves out some portions of text
> ---------------------------------------------------
>
>                 Key: TIKA-1130
>                 URL: https://issues.apache.org/jira/browse/TIKA-1130
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.2, 1.3
>         Environment: OpenJDK x86_64
>            Reporter: Daniel Gibby
>            Priority: Critical
>         Attachments: Resume 6.4.13.docx
>
>
> When parsing a Microsoft Word .docx 
> (application/vnd.openxmlformats-officedocument.wordprocessingml.document), 
> certain portions of text remain unextracted.
> I have attached a .docx file that can be tested against. The 'gray' portions 
> of text are what are not extracted, while the darker colored text extracts 
> fine.
> Looking at the document.xml portion of the .docx zip file shows the text is 
> all there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
