[ https://jira.nuxeo.org/browse/NXP-5590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Florent Guillaume updated NXP-5590:
-----------------------------------

    Attachment:     (was: 00000001.rar)

> WordML files are not converted by MSOffice2Text
> -----------------------------------------------
>
>                 Key: NXP-5590
>                 URL: https://jira.nuxeo.org/browse/NXP-5590
>             Project: Nuxeo Enterprise Platform
>          Issue Type: Bug
>    Affects Versions: 5.3.2
>         Environment: Nuxeo 5.3.2 running from Tomcat. Windows 7 64 bit.
>            Reporter: Richard Louapre
>            Priority: Major
>         Attachments: wordml.doc
>
>
> Microsoft Office Word 2003 XML (aka WordML) files are not converted to
> plain text by MSOffice2Text. Here is the full stack trace I get when I try
> to import such a file from the Files tab:
> 2010-09-08 12:03:17,782 ERROR [org.nuxeo.ecm.core.storage.sql.coremodel.BinaryTextListener] Error during MSOffice2Text conversion
> org.nuxeo.ecm.core.convert.api.ConversionException: Error during MSOffice2Text conversion
>       at org.nuxeo.ecm.core.convert.plugins.text.extractors.MSOffice2TextConverter.convert(MSOffice2TextConverter.java:59)
>       at org.nuxeo.ecm.core.convert.service.ConversionServiceImpl.convert(ConversionServiceImpl.java:171)
>       at org.nuxeo.ecm.core.convert.plugins.text.extractors.FullTextConverter.convert(FullTextConverter.java:72)
>       at org.nuxeo.ecm.core.convert.service.ConversionServiceImpl.convert(ConversionServiceImpl.java:171)
>       at org.nuxeo.ecm.core.storage.sql.coremodel.BinaryTextListener.blobsToText(BinaryTextListener.java:173)
>       at org.nuxeo.ecm.core.storage.sql.coremodel.BinaryTextListener.handleEvent(BinaryTextListener.java:140)
>       at org.nuxeo.ecm.core.event.impl.AsyncEventExecutor$Job.run(AsyncEventExecutor.java:137)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
>       at java.lang.Thread.run(Thread.java:595)
> Caused by: org.nuxeo.ecm.core.api.WrappedException: Exception: java.lang.IllegalArgumentException. message: Your InputStream was neither an OLE2 stream, nor an OOXML stream
>       at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:88)
>       at org.nuxeo.ecm.core.convert.plugins.text.extractors.MSOffice2TextConverter.convert(MSOffice2TextConverter.java:47)
>       ... 9 more
> This issue prevents full-text search on Arabic documents that have been
> OCRed, because the OCR output is generated in this format.
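
Editorial note: the "neither an OLE2 stream, nor an OOXML stream" message in the
trace is consistent with WordML being a single plain XML file, not an OLE2
compound file (binary .doc) and not a ZIP package (OOXML .docx). The sketch
below is not code from Nuxeo or Apache POI. It is a minimal, hypothetical
helper (the class name OfficeFormatSniffer and its Format enum are invented
for illustration) that peeks at the first bytes of a stream to tell these
three cases apart, so that a caller could route XML input to a different text
extractor. It assumes a stream that supports mark/reset and only recognises
XML that starts with an ASCII-compatible "<?xml" declaration, optionally
preceded by a UTF-8 byte order mark.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of Nuxeo or Apache POI. */
final class OfficeFormatSniffer {

    enum Format { OLE2, ZIP, XML, UNKNOWN }

    // Signature of an OLE2 compound file (binary .doc, .xls, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Local file header signature of a ZIP archive (OOXML packages are ZIP files).
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };
    private static final byte[] UTF8_BOM = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };
    private static final byte[] XML_DECL = "<?xml".getBytes(StandardCharsets.US_ASCII);

    /** Inspects up to 8 leading bytes, then rewinds the stream to where it was. */
    static Format sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] head = new byte[8];
        in.mark(head.length);
        int len = 0;
        while (len < head.length) {
            int n = in.read(head, len, head.length - len);
            if (n < 0) {
                break; // stream is shorter than 8 bytes
            }
            len += n;
        }
        in.reset();

        if (startsWith(head, len, 0, OLE2_MAGIC)) return Format.OLE2;
        if (startsWith(head, len, 0, ZIP_MAGIC)) return Format.ZIP;
        // Skip a UTF-8 BOM if present; UTF-16 encoded XML is not handled here.
        int offset = startsWith(head, len, 0, UTF8_BOM) ? UTF8_BOM.length : 0;
        if (startsWith(head, len, offset, XML_DECL)) return Format.XML;
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int len, int offset, byte[] prefix) {
        if (len - offset < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[offset + i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative input only: a stub that begins like a WordML document.
        byte[] sample = "<?xml version=\"1.0\"?><w:wordDocument/>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(sniff(new ByteArrayInputStream(sample)));
    }
}
```

A caller would wrap the blob stream in a BufferedInputStream before calling
sniff, since not every InputStream supports mark/reset. Detecting XML this way
says nothing about whether the file is actually WordML; checking the root
element would be a further step.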

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://jira.nuxeo.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
_______________________________________________
ECM-tickets mailing list
[email protected]
http://lists.nuxeo.com/mailman/listinfo/ecm-tickets
