[ https://issues.apache.org/jira/browse/SOLR-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hoss Man updated SOLR-2416:
---------------------------

    Affects Version/s:     (was: 4.0)
                       1.4.1
        Fix Version/s: 3.2
              Summary: Solr Cell fails to index Zip file contents  (was: Solr Cell & DataImport Tika handler broken - fails to index Zip file contents)

I'm not sure what exactly jayendra is referring to by "was addressed some time back ... seems to have reappeared" (I couldn't find any issues that looked similar), but I just tested and confirmed that in 1.4.1 SolrCell only indexed the metadata about *.zip files, not the contents of the zip.

The behavior in the 3.1rc1 Solr release candidate is consistent with 1.4.1: only info about the zip file itself is extracted, not the contents (although in 3.1 we actually extract more metadata than we did in 1.4.1).

So this definitely isn't a 3.1 blocker (some people were wondering on IRC).

I'm not personally even clear if this is really a bug, or if it should be request option driven -- perhaps some users only want the data about the zip file, not its contents; and what should the behavior be if the zip file contains multiple files, and the request specifies a literal id?

> Solr Cell fails to index Zip file contents
> ------------------------------------------
>
>                 Key: SOLR-2416
>                 URL: https://issues.apache.org/jira/browse/SOLR-2416
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika extraction)
>    Affects Versions: 1.4.1
>            Reporter: Jayendra Patil
>             Fix For: 3.2
>
>         Attachments: SOLR-2416_ExtractingDocumentLoader.patch
>
>
> Working with the latest Solr Trunk code, and it seems the Tika handlers for Solr
> Cell (ExtractingDocumentLoader.java) and Data Import handler
> (TikaEntityProcessor.java) fail to index the zip file contents again.
> It just indexes the file names again.
> This issue was addressed some time back, late last year, but seems to have
> reappeared with the latest code.
> Jira for the Data Import handler part with the patch and the testcase -
> https://issues.apache.org/jira/browse/SOLR-2332.