[ https://issues.apache.org/jira/browse/SOLR-12985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687782#comment-16687782 ]

Jan Høydahl edited comment on SOLR-12985 at 11/15/18 10:37 AM:
---------------------------------------------------------------

I managed to reproduce in 7.5.0 with these steps:
{code:java}
wget https://issues.apache.org/jira/secure/attachment/12948197/crypted.xlsx
docker run --rm --name solr -d -p 8983:8983 solr:7.5.0
docker logs solr
docker exec solr solr create -c repro
curl 'http://localhost:8983/solr/repro/update/extract?literal.id=doc1&commit=true' -F "myfile=@crypted.xlsx"
docker logs solr
docker stop solr
{code}
Next, one can try moving the jars around...
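
Before moving anything, it may help to check which jar actually ships the class named in the ClassNotFoundException. The following is only a rough sketch: `find_class_jars` is a hypothetical helper name, it assumes `unzip` is installed on the host, and the commented usage assumes the official image keeps Solr under /opt/solr (none of this is stated in the issue). It has to be run before the `docker stop solr` step, since the container is started with `--rm`.

```shell
#!/bin/sh
# Hypothetical helper: print every jar under DIR that contains CLASS.
# Usage: find_class_jars DIR fully.qualified.ClassName
find_class_jars() {
    dir=$1
    # Jar entries use slashes instead of dots: a.b.C -> a/b/C.class
    entry="$(printf '%s' "$2" | tr . /).class"
    find "$dir" -type f -name '*.jar' | while IFS= read -r jar; do
        # unzip -l lists the archive entries without extracting them
        if unzip -l "$jar" 2>/dev/null | grep -qF "$entry"; then
            printf '%s\n' "$jar"
        fi
    done
}

# Possible usage against the running container (paths are assumptions):
#   docker cp solr:/opt/solr/contrib/extraction/lib ./extraction-lib
#   find_class_jars ./extraction-lib \
#       org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
```

If the class shows up in a jar under contrib/extraction/lib but not under server/lib, that would fit the reporter's observation that the missing class changes when the libraries are moved.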



> ClassNotFound indexing crypted documents
> ----------------------------------------
>
>                 Key: SOLR-12985
>                 URL: https://issues.apache.org/jira/browse/SOLR-12985
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - DataImportHandler
>    Affects Versions: 7.3.1
>            Reporter: Luca
>            Priority: Critical
>         Attachments: crypted.xlsx, db.sql, logs.zip, notcrypted.docx, schema.zip
>
>
> When indexing a BLOB containing an encrypted Office document (xls or xlsx, but 
> I think all types are affected), the import fails with a severe exception; if 
> the document is not encrypted, it works fine.
> I'm using the DataImportHandler.
> The exception also seems to bypass onError=skip or continue, making the 
> import fail.
> I tried moving the libraries from contrib/extraction/lib/ to server/lib, and 
> the class that cannot be found changes, so it's a class loading issue.
> This is the base exception:
> Exception while processing: document_index document : SolrInputDocument(fields: [site=187, index_type=document, resource_id=3, title_full=Dati cliente.docx, id=d-XXX-3, publish_date=2018-09-28 00:00:00.0, abstract= Azioni di recupero intraprese sulle Fatture telefoniche, insert_date=2019-09-28 00:00:00.0, type=Documenti, url=http://]):org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 1
>     at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
>     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:171)
>     at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
>     at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
>     at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:364)
>     at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:225)
>     at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:452)
>     at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:485)
>     at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.microsoft.OfficeParser@500efcf1
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:165)
>     ... 10 more
> Caused by: java.io.IOException: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
>     at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:150)
>     at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:102)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:203)
>     at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:132)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     ... 13 more
> Caused by: java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.agile.AgileEncryptionInfoBuilder
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at org.eclipse.jetty.webapp.WebAppClassLoader.loadClass(WebAppClassLoader.java:565)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at org.apache.poi.poifs.crypt.EncryptionInfo.getBuilder(EncryptionInfo.java:222)
>     at org.apache.poi.poifs.crypt.EncryptionInfo.<init>(EncryptionInfo.java:148)
>     ... 17 more


