[ https://issues.apache.org/jira/browse/SOLR-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jack Krupansky updated SOLR-3438:
---------------------------------
Attachment: SolrCell meta field.xml
I added two files via SolrCell: a Word document (.docx) and a PDF file
containing the same document saved in PDF format.
> Document SolrCell's use of the meta field
> -----------------------------------------
>
> Key: SOLR-3438
> URL: https://issues.apache.org/jira/browse/SOLR-3438
> Project: Solr
> Issue Type: Improvement
> Components: contrib - Solr Cell (Tika extraction)
> Reporter: Jack Krupansky
> Priority: Minor
> Attachments: SolrCell meta field.xml
>
>
> SolrCell will add document metadata to the field named "meta" if it is
> present in the schema. This is undocumented behavior and can be surprising
> and confusing to users who explicitly added a field named "meta" to their
> schema for their own purpose without any awareness that SolrCell would be
> populating it with document metadata.
> This issue merely proposes to clearly document the use of the "meta" field,
> but several questions do arise:
> 1) Is this behavior actually intended as a released feature, as opposed to
> some experimental work that wasn't intended for release just yet?
> 2) Should there be a request parameter to be able to disable this feature?
> 2a) Should the default be to have it enabled?
> 3) Should the "meta" field be added to the example schema (the section for
> SolrCell metadata fields) to reinforce the fact that a user should not
> blindly add their own "meta" field for some other purpose?
> 4) Should there be a request parameter to redirect this behavior to a named
> field?
> 4a) Should the default name be more explicit (e.g., solrcell_metadata)?
> This issue is only intended to address documentation related to question #1.
> If the answers to any of the other questions are "yes", a separate issue can
> be opened. That said, I do lean towards adding the "meta" field to the
> example schema as part of the "documentation."
> For reference, I'll attach a snippet from a query result that has the "meta"
> field populated by an extracted Word document.
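
Since the behavior described above depends on whether a field named "meta" is
present in the schema, one quick way to find out if an existing setup is
affected is to inspect the schema file. The following is only a minimal
sketch, not part of SolrCell: the helper name declares_field and the sample
schema (including its field type) are made up for illustration, and it only
looks at explicit <field> declarations, not <dynamicField> patterns.

```python
import xml.etree.ElementTree as ET


def declares_field(schema_xml: str, field_name: str = "meta") -> bool:
    """Return True if the schema text explicitly declares a <field> with this name."""
    root = ET.fromstring(schema_xml)
    # iter() walks the whole tree, so <field> elements are found whether or
    # not they are wrapped in a <fields> section. <dynamicField> elements
    # have a different tag and are deliberately not considered here.
    return any(f.get("name") == field_name for f in root.iter("field"))


# Hypothetical, trimmed-down schema used only to exercise the helper.
SAMPLE_SCHEMA = """
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="meta" type="text_general" indexed="true" stored="true" multiValued="true"/>
  </fields>
</schema>
"""

if __name__ == "__main__":
    if declares_field(SAMPLE_SCHEMA):
        print('Schema declares a "meta" field; SolrCell may populate it.')
    else:
        print('No explicit "meta" field found.')
```

For a real installation, the schema text would be read from the core's
schema.xml instead of the sample string.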