[jira] [Commented] (SOLR-9526) data_driven configs defaults to "strings" for unmapped fields, makes most fields containing "textual content" unsearchable, breaks tutorial examples

Hoss Man (JIRA) Mon, 19 Sep 2016 09:53:43 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15503983#comment-15503983
 ]


Hoss Man commented on SOLR-9526:
--------------------------------

bq. Possibly to make facets work out of the box? Just guessing.

I'm probably the biggest proponent of "featuring" & promoting faceting in Solr, 
and even I think it's absurd for our recommended configs to promote faceting at 
the expense of basic (tokenized) field search.

Here's my off-the-cuff, untested, straw-man suggestion, which seems like it 
would be 100x better than what we have now...

* change {{defaultFieldType}} back to {{text_general}}
* add this to the processor chain, *after* AddSchemaFieldsUpdateProcessorFactory...{code}
<processor class="solr.CloneFieldUpdateProcessorFactory">
 <!-- select every field whose type is a solr.TextField -->
 <lst name="source">
  <str name="typeClass">solr.TextField</str>
  <lst name="exclude">
   <!-- large text fields you don't want for sorting or faceting can be excluded here -->
  </lst>
 </lst>
 <!-- copy each selected field "foo" into "foo_str" -->
 <lst name="dest">
  <str name="pattern">^(.*)$</str>
  <str name="replacement">$1_str</str>
 </lst>
</processor>
{code}
* Add {{<dynamicField name="*_str" type="strings" useDocValuesAsStored="false" indexed="true" stored="false"/>}} to the managed-schema
* ?? Add {{stored="true"}} to {{text_general}} ?? 
** All the existing fields/dynamicFields using this type set it explicitly to 
either true or false, but I think if we want to use it as the {{defaultFieldType}} 
we're going to want to set it to {{true}} on the fieldType itself, so that any fields 
added by AddSchemaFieldsUpdateProcessorFactory have their values stored (so end 
users can see them in search results).
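To sanity-check which {{_str}} copies the clone step would produce, here's a rough Python model of it (not Solr code). It assumes a plain dict standing in for the schema after AddSchemaFieldsUpdateProcessorFactory has added any new fields, and mimics the {{typeClass}} selector, the {{exclude}} list, and the {{^(.*)$}} to {{$1_str}} rename. The field names, types, and exclusion set are all made up for illustration.

```python
import re

# Hypothetical schema after AddSchemaFieldsUpdateProcessorFactory has run:
# field name -> field type class (what the typeClass selector checks).
SCHEMA = {
    "id": "solr.StrField",
    "name": "solr.TextField",
    "description": "solr.TextField",
    "price": "solr.TrieDoubleField",
}

# Illustrative stand-in for the <lst name="exclude"> section: large text
# fields we don't want copied for sorting/faceting.
EXCLUDED = {"description"}


def clone_text_fields(doc, schema=SCHEMA, excluded=EXCLUDED):
    """Return a copy of doc with a *_str clone of every non-excluded TextField."""
    out = dict(doc)
    for field, value in doc.items():
        if schema.get(field) != "solr.TextField" or field in excluded:
            continue
        # Same rename as pattern ^(.*)$ / replacement $1_str
        # (Python writes the backreference as \1 instead of $1).
        dest = re.sub(r"^(.*)$", r"\1_str", field)
        out[dest] = value
    return out


doc = {"id": "book1", "name": "Foundation", "description": "a long blurb", "price": 5.99}
print(sorted(clone_text_fields(doc)))  # ['description', 'id', 'name', 'name_str', 'price']
```

This also shows why the ordering presumably matters: the {{typeClass}} check can only match once the new field exists in the schema, which running *after* AddSchemaFieldsUpdateProcessorFactory ensures.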

This should fix the most egregious problems, like the one we see with the broken 
tutorial (folks add a simple "text" field containing a "name" or a "title" and 
can't search for individual "words" in that field), while still supporting 
sorting/faceting on short "string" values by using the {{_str}} variant.
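The tutorial failure boils down to what each kind of field indexes. Here's a deliberately simplified Python model of that, assuming a lowercase-and-split-on-word-characters analyzer as a crude stand-in for {{text_general}}'s real analysis chain; the field value is hypothetical.

```python
import re


def string_terms(value):
    """StrField-style indexing: the whole value is a single exact term."""
    return {value}


def text_terms(value):
    """Crude stand-in for a tokenized text field: lowercase, then split into words."""
    return set(re.findall(r"\w+", value.lower()))


def matches(query_term, value, tokenized):
    """Would a single-term query such as name:foundation hit this field value?"""
    if tokenized:
        # The query term goes through the same (crude) analysis as the indexed value.
        return query_term.lower() in text_terms(value)
    # No analysis at all: only the exact, case-sensitive full value matches.
    return query_term in string_terms(value)


title = "Foundation and Empire"  # hypothetical field value
print(matches("foundation", title, tokenized=False))  # False: the "strings" default
print(matches("foundation", title, tokenized=True))   # True: text_general-like field
```

The flip side is what makes the {{_str}} copy useful: with one exact term per value, it's a natural fit for sorting and faceting on short strings.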

I'm assuming this wouldn't break whatever "auto pick facet" logic is in the 
Velocity UI, since I'm pretty sure it works by looking for all the 
{{solr.StrField}} fields. If it does break, that should be fixed as a distinct 
issue; we shouldn't be breaking something as basic as "I want to search for a 
word in a field" just because it makes the Velocity UI harder to use.
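How the Velocity UI picks facets is only guessed at above, so this is just a model of that guess: assuming the "auto pick facet" logic collects every field whose type class is {{solr.StrField}}, a hypothetical schema shows it would still find the {{_str}} clones (declared with the StrField-based "strings" type) and simply skip the tokenized originals.

```python
# Hypothetical schema once the proposal is in place: the original field keeps a
# tokenized TextField type and its clone gets the StrField-based "strings" type.
SCHEMA_AFTER_CHANGE = {
    "id": "solr.StrField",
    "name": "solr.TextField",
    "name_str": "solr.StrField",
    "title": "solr.TextField",
    "title_str": "solr.StrField",
}


def facet_candidates(schema):
    """Assumed heuristic (not verified): facet on every field whose type class is StrField."""
    return sorted(field for field, cls in schema.items() if cls == "solr.StrField")


print(facet_candidates(SCHEMA_AFTER_CHANGE))  # ['id', 'name_str', 'title_str']
```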


> data_driven configs defaults to "strings" for unmapped fields, makes most 
> fields containing "textual content" unsearchable, breaks tutorial examples
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9526
>                 URL: https://issues.apache.org/jira/browse/SOLR-9526
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>
> James Pritchett pointed out on the solr-user list that this sample query from 
> the quick start tutorial matched no docs (even though the tutorial text says 
> "The above request returns only one document")...
> http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=name:foundation
> The root problem seems to be that the add-unknown-fields-to-the-schema chain 
> in data_driven_schema_configs is configured with...
> {code}
> <str name="defaultFieldType">strings</str>
> {code}
> ...and the "strings" type uses StrField and is not tokenized.
> ----
> Original thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201609.mbox/%3ccac-n2zrpsspfnk43agecspchc5b-0ff25xlfnzogyuvyg2d...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
