[ 
https://jira.nuxeo.org/browse/NXP-5590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=80909#action_80909
 ] 

Florent Guillaume commented on NXP-5590:
----------------------------------------

The file is in WordML and is not recognized by POI or OpenOffice when it has
the .doc extension.
If it is renamed to .xml, OpenOffice can load it.
When the .xml file is loaded into Nuxeo, preview doesn't work but PDF generation does.
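
The file is named .doc, but its bytes are XML. A minimal sketch of classifying a stream by its leading bytes rather than its name: the OLE2 compound-document signature (legacy binary .doc), the ZIP signature used by OOXML packages (.docx), or a leading '<' for XML such as WordML. This is not POI or Nuxeo code; FormatSniffer and its methods are hypothetical names, and the XML check assumes UTF-8 or ASCII-compatible input (UTF-16 XML is not handled). It needs Java 9+ for InputStream.readNBytes.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: classify a document stream by its first bytes, ignoring the file name. */
final class FormatSniffer {

    enum Kind { OLE2, OOXML, XML, UNKNOWN }

    // OLE2 compound document signature (legacy binary .doc/.xls/.ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header "PK\3\4"; OOXML (.docx etc.) packages are ZIP files.
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 3, 4 };

    static Kind sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(head, ZIP_MAGIC)) return Kind.OOXML;
        int i = 0;
        // Skip a UTF-8 byte order mark, if present.
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            i = 3;
        }
        // Skip ASCII whitespace before the first markup character.
        while (i < head.length && (head[i] == ' ' || head[i] == '\t'
                || head[i] == '\r' || head[i] == '\n')) {
            i++;
        }
        // WordML (Word 2003 XML) is plain XML text, so it starts with '<'.
        return i < head.length && head[i] == '<' ? Kind.XML : Kind.UNKNOWN;
    }

    /** Peeks at the stream without consuming it; requires mark/reset support. */
    static Kind sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        byte[] buf = new byte[64];
        in.mark(buf.length);
        int n = in.readNBytes(buf, 0, buf.length);
        in.reset(); // rewind so the real parser sees the whole stream
        return sniff(Arrays.copyOf(buf, n));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] wordml = ("<?xml version=\"1.0\"?>\n"
                + "<?mso-application progid=\"Word.Document\"?>")
                .getBytes(StandardCharsets.UTF_8);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(wordml));
        System.out.println(sniff(in)); // prints XML
    }
}
```

A check like this could route XML input away from an extractor that only accepts OLE2 or OOXML, whatever extension the upload carries.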


> WordML files are not converted by MSOffice2Text
> -----------------------------------------------
>
>                 Key: NXP-5590
>                 URL: https://jira.nuxeo.org/browse/NXP-5590
>             Project: Nuxeo Enterprise Platform
>          Issue Type: Bug
>    Affects Versions: 5.3.2
>         Environment: Nuxeo 5.3.2 running from Tomcat. Windows 7 64 bit.
>            Reporter: Richard Louapre
>            Priority: Major
>         Attachments: wordml.doc
>
>
> Microsoft Office Word 2003 XML (aka WordML) files are not converted to plain
> text by MSOffice2Text. Here is the full stack trace when I try to import this
> file from the Files tab:
> 2010-09-08 12:03:17,782 ERROR [org.nuxeo.ecm.core.storage.sql.coremodel.BinaryTextListener] Error during MSOffice2Text conversion
> org.nuxeo.ecm.core.convert.api.ConversionException: Error during MSOffice2Text conversion
>       at org.nuxeo.ecm.core.convert.plugins.text.extractors.MSOffice2TextConverter.convert(MSOffice2TextConverter.java:59)
>       at org.nuxeo.ecm.core.convert.service.ConversionServiceImpl.convert(ConversionServiceImpl.java:171)
>       at org.nuxeo.ecm.core.convert.plugins.text.extractors.FullTextConverter.convert(FullTextConverter.java:72)
>       at org.nuxeo.ecm.core.convert.service.ConversionServiceImpl.convert(ConversionServiceImpl.java:171)
>       at org.nuxeo.ecm.core.storage.sql.coremodel.BinaryTextListener.blobsToText(BinaryTextListener.java:173)
>       at org.nuxeo.ecm.core.storage.sql.coremodel.BinaryTextListener.handleEvent(BinaryTextListener.java:140)
>       at org.nuxeo.ecm.core.event.impl.AsyncEventExecutor$Job.run(AsyncEventExecutor.java:137)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
>       at java.lang.Thread.run(Thread.java:595)
> Caused by: org.nuxeo.ecm.core.api.WrappedException: Exception: java.lang.IllegalArgumentException. message: Your InputStream was neither an OLE2 stream, nor an OOXML stream
>       at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:88)
>       at org.nuxeo.ecm.core.convert.plugins.text.extractors.MSOffice2TextConverter.convert(MSOffice2TextConverter.java:47)
>       ... 9 more
> This issue prevents full-text search on Arabic documents that have been OCR'd,
> because the OCR output is generated in this format.
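
For full-text indexing, a WordML document only needs its text runs. A minimal sketch of a fallback extractor, assuming such a fallback is acceptable: it parses the file with the JDK's DOM parser and concatenates the w:t elements in the Word 2003 namespace (http://schemas.microsoft.com/office/word/2003/wordml), ending each w:p paragraph with a newline. This is not Nuxeo's converter API; WordMLText and extract are hypothetical names. It ignores tabs, breaks, and other non-w:t content, and since the parser decodes the declared encoding, Arabic text comes out as ordinary Unicode.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

/** Hypothetical fallback: plain text from a Word 2003 XML (WordML) document. */
final class WordMLText {

    // Namespace of Word 2003 XML documents (the "w:" prefix).
    static final String W_NS = "http://schemas.microsoft.com/office/word/2003/wordml";

    static String extract(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Reject DOCTYPE declarations (XXE hardening for uploaded files).
        // This feature name is understood by the JDK's built-in Xerces parser.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(in);
        StringBuilder out = new StringBuilder();
        walk(doc, out);
        return out.toString();
    }

    // Depth-first walk: text lives in w:t elements; each w:p ends a paragraph.
    private static void walk(Node node, StringBuilder out) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text, comments, processing instructions
            }
            boolean inW = W_NS.equals(c.getNamespaceURI());
            if (inW && "t".equals(c.getLocalName())) {
                out.append(c.getTextContent());
            } else {
                walk(c, out);
                if (inW && "p".equals(c.getLocalName())) {
                    out.append('\n');
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<w:wordDocument xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:p><w:r><w:t>Hel</w:t></w:r><w:r><w:t>lo</w:t></w:r></w:p>"
                + "<w:p><w:r><w:t>world</w:t></w:r></w:p>"
                + "</w:body></w:wordDocument>";
        // prints "Hello" and "world" on separate lines
        System.out.print(extract(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
    }
}
```

The recursive walk, rather than a flat query for every w:p, keeps paragraphs nested inside other paragraphs (for example in text boxes) from being emitted twice, and runs are joined without separators because Word may split one word across several w:t elements.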

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://jira.nuxeo.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
_______________________________________________
ECM-tickets mailing list
http://lists.nuxeo.com/mailman/listinfo/ecm-tickets
