[jira] [Commented] (TIKA-1130) .docx text extract leaves out some portions of text

Nick Burch (JIRA) Thu, 13 Jun 2013 14:21:52 -0700

    [ 
https://issues.apache.org/jira/browse/TIKA-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682761#comment-13682761
 ]


Nick Burch commented on TIKA-1130:
----------------------------------

I think we've tended to prefix the method name, rather than commenting out, so 
it's more obvious that they want re-enabling later. Pop a note of the Tika bug 
number and the POI bug number in the javadoc for the method, so someone later 
can easily work out why it was disabled and when it might be ready.

That said, maybe this is our chance to move at least one test to JUnit 4, so we 
can use @Ignore?
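
For illustration, a minimal sketch of the prefixing approach. JUnit 3 picks up
public, no-argument, void methods whose names start with "test", so renaming a
method hides it from the runner while keeping it compiled. The sketch mimics
that naming rule with plain reflection so it runs without JUnit on the
classpath; the class names, method names and the "disabled" prefix are
hypothetical, not taken from the Tika code base.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DisabledTestDemo {

    /** Hypothetical stand-in for a JUnit 3 style test class. */
    public static class DocxExtractionTest {

        public void testBasicText() {
            // an enabled test: the name starts with "test"
        }

        /**
         * Disabled until the underlying parser problem is fixed.
         * See TIKA-1130 and the related POI bug (number not known here).
         */
        public void disabledTestGreyText() {
            // prefixed name, so the naming rule below no longer matches it
        }
    }

    /**
     * Returns the names of methods that match the JUnit 3 naming rule:
     * public, no parameters, void return type, name starting with "test".
     * This is a simplified imitation, not JUnit's own discovery code.
     */
    static List<String> discoverTests(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (Method m : clazz.getMethods()) { // getMethods() returns public methods only
            boolean looksLikeTest = m.getName().startsWith("test")
                    && m.getParameterCount() == 0
                    && m.getReturnType() == void.class;
            if (looksLikeTest) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Only the method still named test* is reported.
        System.out.println(discoverTests(DocxExtractionTest.class));
    }
}
```

With JUnit 4 the method could keep its original name and carry an
@Ignore("TIKA-1130") annotation instead, which also makes the runner report it
as skipped rather than silently dropping it.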
                
> .docx text extract leaves out some portions of text
> ---------------------------------------------------
>
>                 Key: TIKA-1130
>                 URL: https://issues.apache.org/jira/browse/TIKA-1130
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.2, 1.3
>         Environment: OpenJDK x86_64
>            Reporter: Daniel Gibby
>            Priority: Critical
>         Attachments: Resume 6.4.13.docx
>
>
> When parsing a Microsoft Word .docx 
> (application/vnd.openxmlformats-officedocument.wordprocessingml.document), 
> certain portions of text remain unextracted.
> I have attached a .docx file that can be tested against. The 'gray' portions 
> of text are what are not extracted, while the darker colored text extracts 
> fine.
> Looking at the document.xml portion of the .docx zip file shows the text is 
> all there.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
