Hi all,
I tried to extract text from MS Word 2007 .docx files using the implementation
below (with poi-3.9.0) in order to fix an existing bug in API Manager 1.7.0.
(The bug was that MSWordIndexer threw an error while extracting and indexing
.docx files.)

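         import java.io.ByteArrayInputStream;
         import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
         import org.apache.poi.xwpf.usermodel.XWPFDocument;

         XWPFDocument doc = new XWPFDocument(new ByteArrayInputStream(fileData.data));
         XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
         String wordText = extractor.getText();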

Then I applied the patch and tried to upload and extract a .docx file, but
the following error occurred.

[2014-07-14 09:46:20,328] ERROR - AsyncIndexer Error while indexing.
java.lang.ExceptionInInitializerError
	at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown Source)
	at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:134)
	at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
	at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:123)
	at org.wso2.carbon.apimgt.impl.indexing.indexer.MSWordIndexer.getIndexedDocument(MSWordIndexer.java:40)
	at org.wso2.carbon.registry.indexing.solr.SolrClient.indexDocument(SolrClient.java:178)
	at org.wso2.carbon.registry.indexing.AsyncIndexer$IndexingTask.doWork(AsyncIndexer.java:203)
	at org.wso2.carbon.registry.indexing.AsyncIndexer$IndexingTask.run(AsyncIndexer.java:189)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: Cannot load SchemaTypeSystem. Unable to load class with name schemaorg_apache_xmlbeans.system.sE130CAA0A01A7CDE5A2B4FEB8B311707.TypeSystemHolder. Make sure the generated binary files are on the classpath.
	at org.apache.xmlbeans.XmlBeans.typeSystemForClassLoader(XmlBeans.java:783)
	at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument.<clinit>(Unknown Source)
	... 14 more
Caused by: java.lang.ClassNotFoundException: schemaorg_apache_xmlbeans.system.sE130CAA0A01A7CDE5A2B4FEB8B311707.TypeSystemHolder
	at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:501)
	at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:421)
	at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:412)
	at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:107)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
	at org.apache.xmlbeans.XmlBeans.typeSystemForClassLoader(XmlBeans.java:769)
	... 15 more
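The innermost cause above is a ClassNotFoundException raised by the OSGi BundleLoader, which suggests that the XMLBeans-generated TypeSystemHolder class is not visible to the class loader that was asked for it. A minimal diagnostic sketch follows, assuming it is dropped temporarily into the indexer bundle; the ClassVisibility class and its isVisible method are hypothetical helpers, not part of POI, XMLBeans, or Carbon. It only uses java.lang APIs and avoids Java 7 syntax, since the trace appears to come from a Java 6 runtime.

```java
public final class ClassVisibility {

    private ClassVisibility() {
    }

    /**
     * Returns true if the given loader can load the named class.
     * The class is not initialized, so static initializers do not run.
     * A null loader means the bootstrap loader (Class.forName semantics).
     */
    public static boolean isVisible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false; // not found by this loader or its delegates
        } catch (LinkageError e) {
            return false; // found, but could not be linked (e.g. missing dependency)
        }
    }

    public static void main(String[] args) {
        // Hypothetical usage: compare the loader of this (bundle) class with
        // the thread context loader for the holder class named in the trace.
        String holder = "schemaorg_apache_xmlbeans.system."
                + "sE130CAA0A01A7CDE5A2B4FEB8B311707.TypeSystemHolder";
        ClassLoader own = ClassVisibility.class.getClassLoader();
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        System.out.println("own loader sees holder:     " + isVisible(holder, own));
        System.out.println("context loader sees holder: " + isVisible(holder, context));
    }
}
```

If both checks print false inside the server, the schema classes are simply not reachable from that bundle; if only one is true, the problem is more likely which loader XMLBeans is handed than whether the jar is present at all.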

Then I put poi-ooxml-schemas-3.9.jar into the repository/components/lib
directory to fix the above error. But when I tried to extract a .docx file
again, the following ClassCastException occurred.

[2014-07-14 09:52:22,518] ERROR - AsyncIndexer Could not index the resource: path=/_system/governance/apimgt/applicationdata/provider/admin/dwd/2/documentation/files/r.docx, media type=application/msword
java.lang.ClassCastException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.DocumentDocumentImpl cannot be cast to org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument
	at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown Source)
	at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:134)
	at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:159)
	at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:123)
	at org.wso2.carbon.apimgt.impl.indexing.indexer.MSWordIndexer.getIndexedDocument(MSWordIndexer.java:40)
	at org.wso2.carbon.registry.indexing.solr.SolrClient.indexDocument(SolrClient.java:178)
	at org.wso2.carbon.registry.indexing.AsyncIndexer$IndexingTask.doWork(AsyncIndexer.java:203)
	at org.wso2.carbon.registry.indexing.AsyncIndexer$IndexingTask.run(AsyncIndexer.java:189)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
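Since DocumentDocumentImpl implements DocumentDocument, a cast failure between them usually means the two types were defined by different class loaders: the JVM treats a class as the pair (name, defining loader), so a second copy of the schema classes (for example, one from components/lib and one bundled elsewhere) produces an incompatible type with the same name. A minimal sketch for checking how many copies of a class file a given loader can see is below; the DuplicateClassFinder class and locate method are hypothetical helpers, and under OSGi the result only reflects the view of the loader passed in, so copies living in other bundles may not show up.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public final class DuplicateClassFinder {

    private DuplicateClassFinder() {
    }

    /**
     * Lists every location from which the loader can see the .class file
     * for the given binary class name. More than one entry suggests that
     * duplicate copies of the class are reachable.
     */
    public static List<URL> locate(String className, ClassLoader loader) {
        // Nested classes keep their '$', e.g. DocumentDocument$Factory.class
        String resource = className.replace('.', '/') + ".class";
        ClassLoader effective = (loader != null) ? loader : ClassLoader.getSystemClassLoader();
        try {
            return Collections.list(effective.getResources(resource));
        } catch (IOException e) {
            throw new IllegalStateException("Could not enumerate " + resource, e);
        }
    }

    public static void main(String[] args) {
        String name = (args.length > 0)
                ? args[0]
                : "org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument";
        List<URL> copies = locate(name, DuplicateClassFinder.class.getClassLoader());
        System.out.println(copies.size() + " copies of " + name);
        for (URL url : copies) {
            System.out.println("  " + url);
        }
    }
}
```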

What could be the issue here?

-- 
Thilini Shanika
Software Engineer
WSO2, Inc.; http://wso2.com
20, Palmgrove Avenue, Colombo 3

E-mail: tgtshan...@gmail.com
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
