[jira] [Commented] (SOLR-9526) data_driven configs defaults to "strings" for unmapped fields, makes most fields containing "textual content" unsearchable, breaks tutorial examples

Alexandre Rafalovitch (JIRA) Thu, 06 Oct 2016 04:08:34 -0700

    [ https://issues.apache.org/jira/browse/SOLR-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15551641#comment-15551641 ]


Alexandre Rafalovitch commented on SOLR-9526:
---------------------------------------------

Actually, copyField already has a limiting parameter; it is called maxChars. So 
we just need to generate the instruction. And I don't think we have a lot of 
flexibility on the original field name (unless we support multiple matches and 
multiple ways to generate copyField), so we probably don't need to match on it 
at all. We just need to indicate the target field construction pattern, which 
will need to be materialized into a concrete field name if we are creating a 
separate copyField for each original field.

So it would look something like this:
{noformat}
<lst name="typeMapping">
  <str name="valueClass">java.lang.String</str>
  <str name="fieldType">text_general</str>
  <lst name="copyField">
    <str name="dest">*_str</str>
    <int name="maxChars">256</int>
  </lst>
</lst>
{noformat}

And for a field "xyz" it would generate:
{noformat}
<copyField source="xyz" dest="xyz_str" maxChars="256"/>
{noformat}
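
To make the materialization step concrete, here is a rough sketch in Python. It is an illustration only, not Solr code: the function names materialize_dest and copy_field_xml are hypothetical, and it assumes the dest pattern contains exactly one "*" that stands for the original field name.

```python
from xml.sax.saxutils import quoteattr


def materialize_dest(source, dest_pattern):
    """Turn a pattern such as '*_str' into a concrete field name.

    Assumption: the pattern holds exactly one '*', which is replaced
    by the original (source) field name.
    """
    if dest_pattern.count("*") != 1:
        raise ValueError("dest pattern must contain exactly one '*'")
    return dest_pattern.replace("*", source)


def copy_field_xml(source, dest_pattern, max_chars=None):
    """Build the copyField instruction for one newly added field."""
    dest = materialize_dest(source, dest_pattern)
    # quoteattr adds the surrounding quotes and escapes XML special characters.
    attrs = "source={} dest={}".format(quoteattr(source), quoteattr(dest))
    if max_chars is not None:
        attrs += ' maxChars="{}"'.format(int(max_chars))
    return "<copyField {}/>".format(attrs)


if __name__ == "__main__":
    print(copy_field_xml("xyz", "*_str", 256))
```

With the values from the typeMapping above, this produces the same instruction as the "xyz" example.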

Hoss' proposal is nicer in that it is more flexible (we could put any URP 
sequence there) and we could generate different matching patterns. But, as 
already mentioned, doing the copying on the URP side is a bit more challenging, 
especially since the CloneField URP does not actually inherit from the 
FieldMutating URP (perhaps it should). And what happens if people remove the 
schemaless mode when going into production? Will that suddenly break the setup, 
so that content stops flowing from the text field to the string field?


> data_driven configs defaults to "strings" for unmapped fields, makes most 
> fields containing "textual content" unsearchable, breaks tutorial examples
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9526
>                 URL: https://issues.apache.org/jira/browse/SOLR-9526
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>
> James Pritchett pointed out on the solr-user list that this sample query from 
> the quick start tutorial matched no docs (even though the tutorial text says 
> "The above request returns only one document")...
> http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=name:foundation
> The root problem seems to be that the add-unknown-fields-to-the-schema chain 
> in data_driven_schema_configs is configured with...
> {code}
> <str name="defaultFieldType">strings</str>
> {code}
> ...and the "strings" type uses StrField and is not tokenized.
> ----
> Original thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201609.mbox/%3ccac-n2zrpsspfnk43agecspchc5b-0ff25xlfnzogyuvyg2d...@mail.gmail.com%3E



