[ https://issues.apache.org/jira/browse/SOLR-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687782#comment-16687782 ]
Jan Høydahl edited comment on SOLR-12985 at 11/15/18 10:37 AM:
---------------------------------------------------------------

I managed to reproduce in 7.5.0 with these steps:
{code:java}
# Fetch the encrypted sample attached to this issue
wget https://issues.apache.org/jira/secure/attachment/12948197/crypted.xlsx
# Start a throwaway Solr 7.5.0 container and look at its startup log
docker run --rm --name solr -d -p 8983:8983 solr:7.5.0
docker logs solr
# Create a core and post the file to the extracting request handler
docker exec solr solr create -c repro
curl 'http://localhost:8983/solr/repro/update/extract?literal.id=doc1&commit=true' -F "myfile=@crypted.xlsx"
# Check the log again after the request, then stop the container
docker logs solr
docker stop solr
{code}
Next, one can try moving jars around...

> ClassNotFound indexing crypted documents
> ----------------------------------------
>
>                 Key: SOLR-12985
>                 URL: https://issues.apache.org/jira/browse/SOLR-12985
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: contrib - DataImportHandler
>    Affects Versions: 7.3.1
>            Reporter: Luca
>            Priority: Critical
>         Attachments: crypted.xlsx, db.sql, logs.zip, notcrypted.docx, schema.zip
>
>
> When indexing a BLOB containing an encrypted Office document (xls or xlsx, but
> I think all types), the import fails with an exception; if the document is not
> encrypted, it works fine.
> I'm using the DataImportHandler.
> The exception also seems to bypass onError=skip or onError=continue, making the
> whole import fail.
> I tried moving the libraries from contrib/extraction/lib/ to server/lib, and
> the class reported as not found changes, so it's a class loading issue.
> This is the base exception:
> Exception while processing: document_index document : SolrInputDocument(fields: [site=187, index_type=document, resource_id=3, title_full=Dati cliente.docx, id=d-XXX-3, publish_date=2018-09-28 00:00:00.0, abstract= Azioni di recupero intraprese sulle Fatture telefoniche, insert_date=2019-09-28 00:00:00.0, type=Documenti, url=http://]):org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 1
>     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
>     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:171)
>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
>     at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:364)
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
>     at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:452)
>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:485)
>     at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@500efcf1
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)
>     ... 10 more
> Caused by: java.io.IOException: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
>     at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:150)
>     at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     ... 13 more
> Caused by: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:565)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at org.apache.poi.poifs.crypt.EncryptionInfo.getBuilder(EncryptionInfo.java:222)
>     at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:148)
>     ... 17 more
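
Since the report ties the failure to where the jars live (contrib/extraction/lib/ versus server/lib), it may help to check which jars in each directory actually contain the class named in the final ClassNotFoundException. The following is only a sketch for that check, not part of Solr: the class name `JarClassFinder` and the method `findJarsContaining` are made up for illustration, and the directories passed on the command line are assumed to be relative to the Solr install root.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper (not a Solr or Tika class): lists the jars in a
// directory that contain a .class entry for a given fully qualified name.
public class JarClassFinder {

    /** Returns the jars directly under dir that hold an entry for className. */
    public static List<Path> findJarsContaining(Path dir, String className) throws IOException {
        // org.example.Foo is stored in a jar as org/example/Foo.class
        String entryName = className.replace('.', '/') + ".class";
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    if (jarFile.getEntry(entryName) != null) {
                        hits.add(jar);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Class name taken from the stack trace in this issue.
        String missing = "org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder";
        // Each argument is a directory to scan, e.g. contrib/extraction/lib server/lib
        for (String dir : args) {
            System.out.println(dir + " -> " + findJarsContaining(Paths.get(dir), missing));
        }
    }
}
```

On Java 11 or later this can be run without a separate compile step, for example `java JarClassFinder.java contrib/extraction/lib server/lib`. A directory that prints an empty list does not ship the class at all; a class that is present in a jar but still not found at runtime would point at class loader visibility, not at a missing jar.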