[
https://jira.nuxeo.org/browse/NXP-5590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Florent Guillaume updated NXP-5590:
-----------------------------------
Attachment: (was: 00000001.rar)
> WordML files are not converted by MSOffice2Text
> -----------------------------------------------
>
> Key: NXP-5590
> URL: https://jira.nuxeo.org/browse/NXP-5590
> Project: Nuxeo Enterprise Platform
> Issue Type: Bug
> Affects Versions: 5.3.2
> Environment: Nuxeo 5.3.2 running from Tomcat. Windows 7 64 bit.
> Reporter: Richard Louapre
> Priority: Major
> Attachments: wordml.doc
>
>
> Microsoft Office Word 2003 XML (WordML) files are not converted to plain text
> by MSOffice2Text. Here is the full stack trace logged when I try to import the
> attached file from the Files tab:
> 2010-09-08 12:03:17,782 ERROR [org.nuxeo.ecm.core.storage.sql.coremodel.BinaryTextListener] Error during MSOffice2Text conversion
> org.nuxeo.ecm.core.convert.api.ConversionException: Error during MSOffice2Text conversion
>     at org.nuxeo.ecm.core.convert.plugins.text.extractors.MSOffice2TextConverter.convert(MSOffice2TextConverter.java:59)
>     at org.nuxeo.ecm.core.convert.service.ConversionServiceImpl.convert(ConversionServiceImpl.java:171)
>     at org.nuxeo.ecm.core.convert.plugins.text.extractors.FullTextConverter.convert(FullTextConverter.java:72)
>     at org.nuxeo.ecm.core.convert.service.ConversionServiceImpl.convert(ConversionServiceImpl.java:171)
>     at org.nuxeo.ecm.core.storage.sql.coremodel.BinaryTextListener.blobsToText(BinaryTextListener.java:173)
>     at org.nuxeo.ecm.core.storage.sql.coremodel.BinaryTextListener.handleEvent(BinaryTextListener.java:140)
>     at org.nuxeo.ecm.core.event.impl.AsyncEventExecutor$Job.run(AsyncEventExecutor.java:137)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:651)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:676)
>     at java.lang.Thread.run(Thread.java:595)
> Caused by: org.nuxeo.ecm.core.api.WrappedException: Exception: java.lang.IllegalArgumentException. message: Your InputStream was neither an OLE2 stream, nor an OOXML stream
>     at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:88)
>     at org.nuxeo.ecm.core.convert.plugins.text.extractors.MSOffice2TextConverter.convert(MSOffice2TextConverter.java:47)
>     ... 9 more
> This issue prevents full-text search on Arabic documents that have been
> OCRed, because the OCR output is generated in this format.
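
The "neither an OLE2 stream, nor an OOXML stream" message is consistent with how WordML is stored: a Word 2003 XML file is a single flat XML document, not an OLE2 compound file and not a ZIP-based OOXML package. A minimal sketch of how such a file could be recognised before the stream is handed to POI follows. The class WordMlSniffer, its Kind enum, and the 2048-byte peek size are hypothetical names and values chosen for illustration; they are not part of Nuxeo or Apache POI. The sketch also assumes the XML header is in an ASCII-compatible encoding such as UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a Nuxeo or POI class.
class WordMlSniffer {

    enum Kind { OLE2, ZIP_BASED, WORDML_2003, UNKNOWN }

    // Signature of an OLE2 compound document (binary .doc, .xls, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };

    // ZIP local file header signature; OOXML packages are ZIP archives.
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };

    // Namespace and processing instruction found in Word 2003 XML files.
    private static final String WORDML_NS =
        "http://schemas.microsoft.com/office/word/2003/wordml";
    private static final String WORDML_PI =
        "<?mso-application progid=\"Word.Document\"?>";

    // Assumed peek size: how many leading bytes are inspected.
    private static final int PEEK = 2048;

    // Classifies the stream by its first bytes, then rewinds it so the
    // caller can still pass the untouched stream to a real extractor.
    static Kind sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(PEEK);
        byte[] head = new byte[PEEK];
        int n = 0;
        int r;
        while (n < PEEK && (r = in.read(head, n, PEEK - n)) > 0) {
            n += r;
        }
        in.reset();

        if (startsWith(head, n, OLE2_MAGIC)) {
            return Kind.OLE2;
        }
        if (startsWith(head, n, ZIP_MAGIC)) {
            return Kind.ZIP_BASED;
        }
        // Decode byte-for-byte; the markers searched for are pure ASCII.
        String text = new String(head, 0, n, StandardCharsets.ISO_8859_1);
        if (text.contains(WORDML_NS) || text.contains(WORDML_PI)) {
            return Kind.WORDML_2003;
        }
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int len, byte[] prefix) {
        if (len < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        String sample = "<?xml version=\"1.0\"?>\n" + WORDML_PI + "\n"
            + "<w:wordDocument xmlns:w=\"" + WORDML_NS + "\"/>";
        InputStream in = new ByteArrayInputStream(
            sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(sniff(in));
    }
}
```

A converter could call sniff on a BufferedInputStream wrapping the blob and route WORDML_2003 content to an XML-based text extractor instead of POI. Whether that is the right fix for MSOffice2TextConverter is an open question; the sketch only shows that the format can be told apart from its header.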