[ 
https://issues.apache.org/jira/browse/SOLR-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-2416:
---------------------------

    Affects Version/s:     (was: 4.0)
                       1.4.1
        Fix Version/s: 3.2
              Summary: Solr Cell fails to index Zip file contents  (was: Solr 
Cell & DataImport Tika handler broken - fails to index Zip file contents)

I'm not sure what exactly Jayendra is referring to by "was addressed some time 
back ... seems to have reappeared" (I couldn't find any issues that looked 
similar), but I just tested and confirmed that in 1.4.1 SolrCell only indexed 
the metadata about *.zip files, not the contents of the zip.

The behavior in the 3.1rc1 Solr release candidate is consistent with 1.4.1: 
only info about the zip file itself is extracted, not the contents (although in 
3.1 we actually extract more metadata than we did in 1.4.1), so this definitely 
isn't a 3.1 blocker (some people were wondering on IRC).

I'm not personally even clear if this is really a bug, or if it should be 
request option driven -- perhaps some users only want the data about the zip 
file, not its contents; and what should the behavior be if a zip file contains 
multiple files, and the request specifies a literal id?
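
To make the literal id question concrete, here is a minimal sketch using only 
java.util.zip (no Solr or Tika code). It assumes a purely hypothetical scheme in 
which each file inside the archive gets its own id, built from the request's 
literal id plus the entry name. The class name ZipEntryIds, the helper methods, 
and the "!" separator are all made up for illustration; none of this is 
existing SolrCell behavior.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipEntryIds {

    // Hypothetical scheme: one id per non-directory zip entry, derived from
    // the single literal id supplied with the request.
    static List<String> entryIds(InputStream zipStream, String literalId) throws IOException {
        List<String> ids = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    ids.add(literalId + "!" + entry.getName());
                }
            }
        }
        return ids;
    }

    // Builds a small in-memory zip so the example needs no files on disk.
    static byte[] sampleZip(String... names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zout.putNextEntry(new ZipEntry(name));
                zout.write(("contents of " + name).getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = sampleZip("a.txt", "docs/b.txt");
        // One request-level id fans out into one id per contained file.
        for (String id : entryIds(new ByteArrayInputStream(zip), "doc1")) {
            System.out.println(id);
        }
    }
}
```

With a single literal id and two files in the archive, something has to decide 
whether the result is one Solr document or two; the sketch only shows the 
"one document per entry" option, which is exactly the kind of choice that 
might need to be request option driven.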

> Solr Cell fails to index Zip file contents
> ------------------------------------------
>
>                 Key: SOLR-2416
>                 URL: https://issues.apache.org/jira/browse/SOLR-2416
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler, contrib - Solr Cell (Tika 
> extraction)
>    Affects Versions: 1.4.1
>            Reporter: Jayendra Patil
>             Fix For: 3.2
>
>         Attachments: SOLR-2416_ExtractingDocumentLoader.patch
>
>
> Working with the latest Solr trunk code, it seems the Tika handlers for Solr 
> Cell (ExtractingDocumentLoader.java) and the Data Import handler 
> (TikaEntityProcessor.java) fail to index the zip file contents again.
> They just index the file names again.
> This issue was addressed some time back, late last year, but seems to have 
> reappeared with the latest code.
> Jira for the Data Import handler part with the patch and the testcase - 
> https://issues.apache.org/jira/browse/SOLR-2332.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
