[jira] [Commented] (SOLR-3439) Add "content" field to example schema to make SolrCell easier to use out of the box

Jack Krupansky (JIRA) Mon, 07 May 2012 10:39:13 -0700

    [ https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269803#comment-13269803 ]


Jack Krupansky commented on SOLR-3439:
--------------------------------------

Based on the discussion here and on SOLR-3442, I would offer two alternative 
proposals:

1. If SOLR-3442 is implemented (default user query parser in example becomes 
edismax), add the "content" field as stored and indexed, add "content" to the 
edismax "qf", but don't add the copyField(s).

2. If SOLR-3442 is NOT implemented, add the "content" field as stored but NOT 
indexed, and add the copyField ("content" to "text"). Regardless of query 
parser, this will ensure that "content" is both searchable and returnable, but 
without "double indexing".
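
As a rough sketch, the two alternatives might look something like the 
following in the example config. The "text_general" field type, the 
multiValued setting, the other contents of "qf", and the placement of the 
edismax parameters in the search handler defaults are assumptions for 
illustration, not taken from the attached patch.

```xml
<!-- Alternative 1 (assumes SOLR-3442: edismax is the example default). -->
<!-- schema.xml: "content" is stored and indexed; no copyField. -->
<!-- The "text_general" type and multiValued="true" are assumptions. -->
<field name="content" type="text_general" indexed="true" stored="true" multiValued="true"/>

<!-- solrconfig.xml, in the search handler defaults (placement assumed; -->
<!-- listing "text" alongside "content" in qf is also an assumption). -->
<str name="defType">edismax</str>
<str name="qf">text content</str>

<!-- Alternative 2 (no SOLR-3442). Use instead of the above, not with it. -->
<!-- schema.xml: "content" is stored only; the copyField feeds the indexed -->
<!-- catch-all "text" field, so the body is indexed once, in "text". -->
<field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
<copyField source="content" dest="text"/>
```

In the second variant, searches match through "text" while the stored 
"content" value is what comes back in results.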

I'll wait a bit to see how SOLR-3442 evolves. But if it doesn't look likely in 
a reasonable timeframe, I'll revise my patch for alternative #2, which provides 
the desired functionality with minimal impact.

But for now, I'll assume that SOLR-3442 is the more likely and preferable 
approach.



                
> Add "content" field to example schema to make SolrCell easier to use out of 
> the box
> -----------------------------------------------------------------------------------
>
>                 Key: SOLR-3439
>                 URL: https://issues.apache.org/jira/browse/SOLR-3439
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Solr Cell (Tika extraction), Schema and 
> Analysis
>            Reporter: Jack Krupansky
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: Lincoln-Gettysburg-Address.docx, 
> Lincoln-Gettysburg-Address.pdf, SOLR-3439.patch
>
>
> Currently, SolrCell is configured to map Tika "content" (the main body of a 
> document) to the "text" field, which is the indexed-only (not stored) 
> catch-all for default queries. That searches fine, but doesn't show the 
> document content in the results, sometimes leading users to think that 
> something is wrong. Sure, the user can easily add the field (and this is 
> documented), but it would be a better user experience to have such a basic 
> feature work right out of the box without any config editing and without the 
> need for the user to read the fine print in the documentation.
> I propose that we add the "content" field to the example schema in the 
> section of fields already defined to support SolrCell metadata. It would be 
> stored and indexed.
> I further propose that copyFields be added for the "title", "description", 
> and "content" fields (and maybe a couple of others) to add them to the "text" 
> field for searching. Again, the aim is to improve the out-of-the-box user 
> experience. It also simplifies testing, since less setup is needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
