[
https://issues.apache.org/jira/browse/SOLR-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188616#comment-13188616
]
Jayendra Patil commented on SOLR-2416:
--------------------------------------
Tika parses the zip file and extracts the complete content of the files in it
as well.
It parses all the files in the zip, including any zip nested within the zip.
The metadata is that of the zip file rather than of the individual files.
No special handling would be required on the Solr side.
The metadata for the zip and its contents would be indexed as well.
Also, Solr doesn't allow attaching multiple files to a single document.
A zip is a convenient way of associating a document with multiple files.
And the current behavior of indexing a zip with just its file names
doesn't have much value.
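
To illustrate what "all the files in the zip as well as the zip in zip" means
in practice, here is a minimal sketch using only Python's standard zipfile
module. It is not Tika's or Solr's implementation; the helper names walk_zip
and make_zip and the "!/" path separator are hypothetical, chosen only for
this example. It walks an archive recursively and yields the bytes of every
regular file, descending into nested archives instead of stopping at their
names.

```python
import io
import zipfile


def walk_zip(data, prefix=""):
    """Yield (path, bytes) for every regular file, descending into nested zips."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            payload = archive.read(info)
            path = prefix + info.filename
            if zipfile.is_zipfile(io.BytesIO(payload)):
                # Nested archive: recurse rather than emit the raw zip bytes.
                yield from walk_zip(payload, prefix=path + "!/")
            else:
                yield path, payload


def make_zip(entries):
    """Build an in-memory zip from a {name: bytes} mapping (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, payload in entries.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


# Demo data: an outer zip holding one text file and one nested zip.
inner = make_zip({"notes.txt": b"inner text"})
outer = make_zip({"readme.txt": b"outer text", "nested.zip": inner})
extracted = dict(walk_zip(outer))
```

Indexing only the entry names would stop at "readme.txt" and "nested.zip";
the recursive walk also reaches the text inside the nested archive. Note that
zipfile.is_zipfile also matches zip-based formats such as .jar or .docx, so a
real extractor would need a more careful type check.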
> Solr Cell fails to index Zip file contents
> ------------------------------------------
>
> Key: SOLR-2416
> URL: https://issues.apache.org/jira/browse/SOLR-2416
> Project: Solr
> Issue Type: Bug
> Components: contrib - DataImportHandler, contrib - Solr Cell (Tika
> extraction)
> Affects Versions: 1.4.1
> Reporter: Jayendra Patil
> Fix For: 3.6, 4.0
>
> Attachments: SOLR-2416_ExtractingDocumentLoader.patch
>
>
> Working with the latest Solr trunk code, it seems the Tika handlers for Solr
> Cell (ExtractingDocumentLoader.java) and the Data Import Handler
> (TikaEntityProcessor.java) fail to index the zip file contents again.
> They just index the file names again.
> This issue was addressed some time back, late last year, but seems to have
> reappeared with the latest code.
> Jira for the Data Import Handler part, with the patch and the test case:
> https://issues.apache.org/jira/browse/SOLR-2332.