Re: Support for case insensitive sorts with doc values

Hugues Malphettes Thu, 05 Feb 2015 22:12:01 -0800

Hi Angie,

On Friday, 6 February 2015 12:17:47 UTC+8, Geetanjali Paygude wrote:
>
> Hi Hugues, 
>
> So you have extended "String" type to add custom analyzer. 
>
> I am referring to this thread
>
> http://elasticsearch-users.115913.n3.nabble.com/Support-for-case-insensitive-sorts-with-doc-values-tt4064487.html
>
> Is there any way to use script/transform the source and then apply sort on 
> it? If yes will you please share the same. 
>


>
> As mentioned by Adrien, is there work-around on client-side before the 
> data gets into elasticsearch using some native / groovy script? 
>
I believe Adrien suggested this procedure:
- create a second field, mapped as a not_analyzed string with doc values 
enabled, to hold the sort value
- on the client side, analyze the string yourself
- add the resulting value as a separate field in the document you index
- "profit": use that new field for sorting and other queries
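
As a rough illustration of the client-side steps, here is a minimal Python 
sketch. The field names ("name", "name_sort") and the helper functions are 
hypothetical, and the normalization shown (NFKC plus case folding) is only 
an assumption standing in for whatever analysis you need; it is not ICU 
collation.

```python
import unicodedata


def sort_key(value: str) -> str:
    """Build a case-insensitive sort value on the client side.

    Assumption: NFKC normalization plus case folding is good enough.
    This is NOT equivalent to an ICU collation key.
    """
    return unicodedata.normalize("NFKC", value).casefold()


def with_sort_field(doc: dict, field: str = "name", suffix: str = "_sort") -> dict:
    """Return a copy of doc with a hypothetical '<field>_sort' companion field."""
    enriched = dict(doc)
    enriched[field + suffix] = sort_key(doc[field])
    return enriched


docs = [{"name": "banana"}, {"name": "Apple"}, {"name": "cherry"}]

# These enriched documents are what the client would send to be indexed.
indexed = [with_sort_field(d) for d in docs]

# Sorting on the companion field gives a case-insensitive order.
ordered = sorted(indexed, key=lambda d: d["name_sort"])
print([d["name"] for d in ordered])  # ['Apple', 'banana', 'cherry']
```

Sorting on "name" directly would put "Apple" first only because uppercase 
letters sort before lowercase ones; the companion field makes the order 
independent of case.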

A variation of this consists of delegating the generation of the second 
field's value to a _source transform:
- create the same second field: not_analyzed with doc values
- define a source transform for the affected type of document
- in the script of the source transform, apply the transformation you need
- "profit"
The upside: you save some bandwidth, the _source of your document never 
shows the second value, and the impact on your client code is limited to 
the queries.
The downside: ES does more work at index time, and what you can do in the 
transform script might be limited.
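
For the source transform variation, the mapping could look roughly like the 
sketch below, written as a Python dict so it can be dumped to JSON. It 
assumes the 1.x mapping transform syntax from the page Adrien linked (a 
"transform" object with "script" and "lang" next to "properties"). The type 
name "person" and the fields "name" and "name_sort" are hypothetical, and 
the Groovy one-liner only lowercases, so please check it against your 
Elasticsearch version before relying on it.

```python
import json

# Hypothetical type and field names. The "transform" block is assumed to
# follow the Elasticsearch 1.x mapping transform documentation.
mapping = {
    "person": {
        "transform": {
            "lang": "groovy",
            # Copy a lowercased version of 'name' into 'name_sort' at index time.
            "script": "ctx._source['name_sort'] = ctx._source['name']?.toLowerCase()",
        },
        "properties": {
            "name": {"type": "string"},
            # The second field: not analyzed, stored as doc values for sorting.
            "name_sort": {
                "type": "string",
                "index": "not_analyzed",
                "doc_values": True,
            },
        },
    }
}

# The only client-side change: queries sort on the second field.
search_body = {
    "query": {"match_all": {}},
    "sort": [{"name_sort": {"order": "asc"}}],
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```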


>
> All I want is following 
>
>
> 1. We have ICU plugin which helps us achieve custom sorting to some 
> extent. 
> 3. However, the problem now is that we are trying to use the doc_values = 
> true option in mapping but this  cannot be used for string fields having 
> analyzer. 
> 4. So if we need to use ICU plugin then we cannot use doc_value option. 
> 5.Other way is to use the ICU plugin as a library i.e. we call some API in 
> that plugin which converts our field into required format for sorting. 
>
>
> So is there a way to call some API or transform input using script ?
>
I suspect it would be difficult to invoke the ICU transformation from a 
Groovy script.
You could probably make it work with a native script written in Java.

>
>
> OR If I use your analyzer in a native script, how to invoke the same from 
> mappings. Please provide usage example
>
My code snippet is in fact a new mapping type, not an analyzer.
It is more or less a fork of the original string mapping as defined inside 
Elasticsearch.

I have packaged this new mapping type in a plugin here: 
https://github.com/hmalphettes/elasticsearch-docvalues-string

It is a work in progress. Help is welcome if it is useful for you.

I hope this helps.
Let us know,
Hugues
 

>
>
>
> Thanks,
> Angie
>
> On Friday, 14 November 2014 06:55:36 UTC+5:30, Hugues Malphettes wrote:
>>
>> Hi Adrien and everyone,
>>
>> I had a go at extending the 'string' type to add another analyzer:
>> https://gist.github.com/hmalphettes/b402d72230e9009f960c
>>
>> When the parameter "index_docvalues_analyzer" is present on the mapping 
>> definition, the field value is run through that analyzer and the first 
>> token of the resulting token stream is stored as a 
>> SortedSetDocValuesField.
>>
>> It works for me. Would it be interesting to make this part of the 
>> standard StringFieldMapper?
>>
>> Cheers!
>> Hugues
>>
>> On Tuesday, 7 October 2014 18:11:18 UTC+8, Hugues Malphettes wrote:
>>>
>>> Thanks Adrien,
>>>
>>> I'll give a shot at the source transform then.
>>>
>>> If you consider that it makes sense to support this, would it be helpful 
>>> to file an enhancement request on github?
>>> Give us a hint if you think it can be done by an occasional contributor 
>>> ;-)
>>>
>>> Cheers,
>>> Hugues
>>>
>>> On Tuesday, 7 October 2014 17:59:29 UTC+8, Adrien Grand wrote:
>>>>
>>>> Hi Hugues,
>>>>
>>>> For now the work-around would indeed be to do the work on client-side 
>>>> before the data gets into elasticsearch (or potentially using the _source 
>>>> transform[1] feature).
>>>>
>>>> [1] 
>>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-transform.html
>>>>
>>>> On Tue, Oct 7, 2014 at 9:19 AM, Hugues Malphettes <[email protected]> 
>>>> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> Case insensitive sort is elegantly supported by using a custom 
>>>>> analyzer [1].
>>>>> `doc values` are documented as a great fit for sorting [2] to save 
>>>>> heap memory.
>>>>>
>>>>> However, doc values are not supported for analyzed strings at the moment.
>>>>>
>>>>> Are we planning to support doc values for analyzers that emit a single 
>>>>> token per string?
>>>>> Is it worth it to have the ES client do the lower-casing and collation 
>>>>> itself?
>>>>>
>>>>> Thanks!
>>>>> Hugues
>>>>>
>>>>> [1] 
>>>>> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/sorting-collations.html#case-insensitive-sorting
>>>>> [2] 
>>>>> http://www.elasticsearch.org/blog/elasticsearch-1-4-0-beta-released/ 
>>>>>
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google 
>>>>> Groups "elasticsearch" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>>> an email to [email protected].
>>>>> To view this discussion on the web visit 
>>>>> https://groups.google.com/d/msgid/elasticsearch/4095e5b7-1fb4-477a-b27f-3e4519ab9000%40googlegroups.com.
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> Adrien Grand
>>>>  
>>>

