[ 
https://issues.apache.org/jira/browse/SOLR-11916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344527#comment-16344527
 ] 

Hoss Man commented on SOLR-11916:
---------------------------------

re: useDocValuesAsStored -- here's my straw man proposal after sleeping on it a 
bit...

* SortableTextField.init should override the schemaVersion based implicit 
default in FieldType.init
** this means that no fieldType/field using SortableTextField will default 
to useDocValuesAsStored=true
* SortableTextField.createFields should be aware of the effective value of 
SchemaField.useDocValuesAsStored and if it's true: fail (_at index time_) if 
any field values being added are longer than the (effective) 
maxCharsForDocValues
** this error message should be very clear about what's happening, mentioning 
both maxCharsForDocValues, and useDocValuesAsStored.

Net result: 
* clients that try to add huge values to fields with maxCharsForDocValues=small 
may get two different behaviors depending on the field's useDocValuesAsStored:
** if useDocValuesAsStored==false:
*** docvalues are truncated
** if useDocValuesAsStored==true:
*** request fails because Solr can't "fit" the huge value into the "small" 
limit that's been configured
* i.e.: "the schema told us doc values should be limited to 'small' and to use 
doc values as if they were stored fields, and we can't meet those two 
expectations for your 'huge' field value, so we're rejecting it"
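
To make the two behaviors concrete, here is a minimal sketch of the
truncate-or-reject rule described above. It is not Solr code: the class and
method names are hypothetical, and it assumes the limit is counted in Java
{{char}} units.

```java
/** Hypothetical sketch of the proposed rule; none of these names exist in Solr. */
final class DocValuesLimitSketch {

  /**
   * Returns the string that would be written to docValues for one field value.
   * Assumption: maxCharsForDocValues is measured in Java chars (UTF-16 units).
   */
  static String docValueFor(String value, int maxCharsForDocValues,
                            boolean useDocValuesAsStored) {
    if (value.length() <= maxCharsForDocValues) {
      return value; // fits: docValues hold the original input unchanged
    }
    if (useDocValuesAsStored) {
      // Truncating here would silently change what the client gets back as the
      // "stored" value, so reject at index time and name both settings.
      throw new IllegalArgumentException(
          "Field value of length " + value.length()
              + " exceeds maxCharsForDocValues=" + maxCharsForDocValues
              + " and cannot be truncated because useDocValuesAsStored=true");
    }
    // Only used for sorting/faceting: keeping a prefix is acceptable.
    return value.substring(0, maxCharsForDocValues);
  }

  public static void main(String[] args) {
    // useDocValuesAsStored=false: the value is truncated to the limit.
    System.out.println(docValueFor("Solr In Action", 4, false));
    try {
      // useDocValuesAsStored=true: the same value is rejected.
      docValueFor("Solr In Action", 4, true);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

The error message mentions both maxCharsForDocValues and useDocValuesAsStored,
in line with the proposal that the failure be very clear about its cause.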


...I'm pretty sure this is all doable (even if the useDocValuesAsStored is 
specified on either the fieldType or the field) and I'll test it out soon.
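
As an illustration of what "effective value" could mean here, the sketch below
resolves useDocValuesAsStored from an explicit setting on the field, then the
fieldType, then an implicit default. All names are hypothetical and this is not
how Solr's FieldType.init is actually written. It assumes the usual rule that a
field-level setting wins over the fieldType, and that the schemaVersion-based
implicit default is true for schema version 1.6 and later.

```java
import java.util.Optional;

/** Hypothetical sketch; these names do not exist in Solr. */
final class EffectiveFlagSketch {

  /**
   * Implicit default when neither the field nor the fieldType sets the flag.
   * Assumption: ordinary types default to true from schema version 1.6 on,
   * while the proposed SortableTextField would always default to false.
   */
  static boolean implicitDefault(boolean isSortableTextField, float schemaVersion) {
    return !isSortableTextField && schemaVersion >= 1.6f;
  }

  /** An explicit field setting wins, then the fieldType, then the default. */
  static boolean effective(Optional<Boolean> onField,
                           Optional<Boolean> onFieldType,
                           boolean implicitDefault) {
    return onField.orElseGet(() -> onFieldType.orElse(implicitDefault));
  }

  public static void main(String[] args) {
    boolean dflt = implicitDefault(true, 1.6f);
    // Nothing configured explicitly on a SortableTextField.
    System.out.println(effective(Optional.empty(), Optional.empty(), dflt));
    // Explicitly enabled on the fieldType only.
    System.out.println(effective(Optional.empty(), Optional.of(true), dflt));
  }
}
```

Under these assumptions the index-time length check would only apply when a
schema explicitly opts in on the field or the fieldType.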

> new SortableTextField using docValues built from the original string input
> --------------------------------------------------------------------------
>
>                 Key: SOLR-11916
>                 URL: https://issues.apache.org/jira/browse/SOLR-11916
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>            Priority: Major
>         Attachments: SOLR-11916.patch
>
>
> I propose adding a new SortableTextField subclass that would functionally 
> work the same as TextField except:
>  * {{docValues="true|false"}} could be configured, with the default being 
> "true"
>  * The docValues would contain the original input values (just like StrField) 
> for sorting (or faceting)
>  ** By default, to protect users from excessively large docValues, only the 
> first 1024 characters of each field value would be used – but this could be 
> overridden with configuration.
> ----
> Consider the following sample configuration:
> {code:java}
> <field name="title" type="text_sortable"
>        indexed="true" docValues="true" stored="true" multiValued="false"/>
> <fieldType name="text_sortable" class="solr.SortableTextField">
>   <analyzer type="index">
>    ...
>   </analyzer>
>   <analyzer type="query">
>    ...
>   </analyzer>
> </fieldType>
> {code}
> Given a document with a title of "Solr In Action"
> Users could:
>  * Search for individual (indexed) terms in the "title" field: 
> {{q=title:solr}}
>  * Sort documents by title ( {{sort=title asc}} ) such that this document's 
> sort value would be "Solr In Action"
> If another document had a "title" value that was longer than 1024 chars, then 
> the docValues would be built using only the first 1024 characters of the 
> value (unless the user modified the configuration)
> This would be functionally equivalent to the following existing configuration 
> - including the on disk index segments - except that the on disk DocValues 
> would refer directly to the "title" field, reducing the total number of 
> "field infos" in the index (which has a small impact on segment housekeeping 
> and merge times) and end users would not need to sort on an alternate 
> "title_string" field name - the original "title" field name would always be 
> used directly.
> {code:java}
> <field name="title" type="text"
>        indexed="true" docValues="false" stored="true" multiValued="false"/>
> <field name="title_string" type="string"
>        indexed="false" docValues="true" stored="false" multiValued="false"/>
> <copyField source="title" dest="title_string" maxChars="1024" />
> {code}


