Re: Support for case insensitive sorts with doc values

Geetanjali Paygude Tue, 10 Feb 2015 00:18:57 -0800

Thanks Hugues,

This is helpful! However, I was trying to write a Java native script for 
sorting values using ICU collation.
Please find the JAR attached. The script runs without errors, but it returns 
incorrectly sorted results.


Please find the code snippet below. The requests are as follows:

PUT /custom1_index
{
  "my_type": {
    "properties": {
      "LastName": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}

PUT /custom1_index/my_type/1
{ "LastName": "AAP" }

PUT /custom1_index/my_type/2
{ "LastName": "zara" }

PUT /custom1_index/my_type/3
{ "LastName": "beta" }


GET /custom1_index/_search
{
  "script_fields": {
    "sort": {
      "script": "ICUSortingScriptFilter",
      "lang": "native",
      "params": {
        "field": "LastName"
      }
    },
    "type": "string"
  }
}

Please let me know if any correction is required in this script.

Regards,
Angie


On Friday, 6 February 2015 11:40:53 UTC+5:30, Hugues Malphettes wrote:
>
> Hi Angie,
>
> On Friday, 6 February 2015 12:17:47 UTC+8, Geetanjali Paygude wrote:
>>
>> Hi Hugues, 
>>
>> So you have extended the "string" type to add a custom analyzer. 
>>
>> I am referring to this thread
>>
>> http://elasticsearch-users.115913.n3.nabble.com/Support-for-case-insensitive-sorts-with-doc-values-tt4064487.html
>>
>> Is there any way to use a script or transform on the source and then sort 
>> on the result? If so, could you please share it? 
>>
>
>>
>> As mentioned by Adrien, is there work-around on client-side before the 
>> data gets into elasticsearch using some native / groovy script? 
>>
> I believe Adrien suggested this procedure:
> - create a second field specifically to store the value as a 
> docvalue/not_analyzed string
> - on the client-side analyze the string yourself
> - add the new value as a separate field in the document you index
> - "profit": use that new field for sorting and other queries
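A minimal sketch of the client-side step in the procedure above, assuming the JDK's java.text.Collator as a stand-in for the ICU collation discussed in this thread. The class name CollationSortKey, the methods sortKeyHex and sortByKey, and the sample names are hypothetical. The hex-encoded collation key is the kind of value that could be indexed in the second not_analyzed field, so that a plain byte-order sort on that field follows the collation order.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Elasticsearch or the ICU plugin.
class CollationSortKey {

    // Assumption: root-locale rules at PRIMARY strength, which ignores case differences.
    private static Collator newCollator() {
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // Returns the collation key of value as lowercase hex, two digits per byte.
    // Comparing these strings lexicographically matches comparing the key bytes.
    static String sortKeyHex(String value) {
        byte[] key = newCollator().getCollationKey(value).toByteArray();
        StringBuilder hex = new StringBuilder(key.length * 2);
        for (byte b : key) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // Sorts values by their precomputed key, as a sort on the second field would.
    static List<String> sortByKey(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing(CollationSortKey::sortKeyHex));
        return sorted;
    }

    public static void main(String[] args) {
        // Sample values chosen for illustration only.
        List<String> names = Arrays.asList("Zara", "aap", "Beta");
        for (String name : sortByKey(names)) {
            System.out.println(name + " -> " + sortKeyHex(name));
        }
    }
}
```

With the sample values Zara, aap and Beta, sorting by the key yields aap, Beta, Zara, whereas a plain byte-order sort on the raw strings would place the capitalized names first.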
>
> A variation of this consists of delegating the generation of the second 
> field's value to a _source transform.
> - create the same second field: docvalues-not_analyzed
> - define a source transform for the affected type of document
> - in the script of the source transform apply the transformation you need
> - "profit"
> This saves some bandwidth: the _source of your document will never 
> show the second value, and the impact on your client code is limited to the 
> queries.
> On the other hand, ES will do more work, and the transformation you can apply 
> in the script might be limited.
>
>
>>
>> All I want is the following:
>>
>>
>> 1. We have the ICU plugin, which helps us achieve custom sorting to some 
>> extent. 
>> 3. However, the problem now is that we are trying to use the doc_values = 
>> true option in the mapping, but this cannot be used for string fields that 
>> have an analyzer. 
>> 4. So if we need to use the ICU plugin, we cannot use the doc_values option. 
>> 5. Another way is to use the ICU plugin as a library, i.e. we call some API 
>> in that plugin which converts our field into the required format for sorting. 
>>
>>
>> So is there a way to call some API or transform input using script ?
>>
> I suspect it might be difficult to invoke the ICU transformation via a 
> groovy script.
> You could make it work with a native script written in java.
>
>>
>>
>> Or, if I use your analyzer in a native script, how do I invoke it from the 
>> mappings? Please provide a usage example.
>>
> My code snippet is in fact a new mapping type; not an analyzer.
> It is more or less a fork of the original string mapping as defined inside 
> Elasticsearch.
>
> I have packaged this new mapping type in a plugin here: 
> https://github.com/hmalphettes/elasticsearch-docvalues-string
>
> It is a work in progress. Help is welcome if it is useful for you.
>
> I hope this helps.
> Let us know,
> Hugues
>  
>
>>
>>
>>
>> Thanks,
>> Angie
>>
>> On Friday, 14 November 2014 06:55:36 UTC+5:30, Hugues Malphettes wrote:
>>>
>>> Hi Adrien and everyone,
>>>
>>> I took a shot at extending the 'string' type to add another analyzer:
>>> https://gist.github.com/hmalphettes/b402d72230e9009f960c
>>>
>>> When the parameter "index_docvalues_analyzer" is present on the mapping 
>>> definition, a token stream is generated and its first token is stored as a 
>>> SortedSetDocValuesField.
>>>
>>> It works for me. Would it be interesting to make this part of the 
>>> standard StringFieldMapper?
>>>
>>> Cheers!
>>> Hugues
>>>
>>> On Tuesday, 7 October 2014 18:11:18 UTC+8, Hugues Malphettes wrote:
>>>>
>>>> Thanks Adrien,
>>>>
>>>> I'll give the source transform a shot then.
>>>>
>>>> If you consider that it makes sense to support this, would it be 
>>>> helpful to file an enhancement request on github?
>>>> Give us a hint if you think it can be done by an occasional contributor 
>>>> ;-)
>>>>
>>>> Cheers,
>>>> Hugues
>>>>
>>>> On Tuesday, 7 October 2014 17:59:29 UTC+8, Adrien Grand wrote:
>>>>>
>>>>> Hi Hugues,
>>>>>
>>>>> For now the work-around would indeed be to do the work on client-side 
>>>>> before the data gets into elasticsearch (or potentially using the _source 
>>>>> transform[1] feature).
>>>>>
>>>>> [1] 
>>>>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-transform.html
>>>>>
>>>>> On Tue, Oct 7, 2014 at 9:19 AM, Hugues Malphettes <[email protected]> 
>>>>> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Case insensitive sort is elegantly supported by using a custom 
>>>>>> analyzer [1].
>>>>>> `doc values` are documented as a great fit for sorting [2] to save 
>>>>>> heap memory.
>>>>>>
>>>>>> However, doc values are not supported for analyzed strings at the moment.
>>>>>>
>>>>>> Are we planning to support doc values for analyzers that emit a 
>>>>>> single token per string?
>>>>>> Is it worth it to have the ES client do the lower-casing and 
>>>>>> collation itself?
>>>>>>
>>>>>> Thanks!
>>>>>> Hugues
>>>>>>
>>>>>> [1] 
>>>>>> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/sorting-collations.html#case-insensitive-sorting
>>>>>> [2] 
>>>>>> http://www.elasticsearch.org/blog/elasticsearch-1-4-0-beta-released/ 
>>>>>>
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "elasticsearch" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to [email protected].
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/elasticsearch/4095e5b7-1fb4-477a-b27f-3e4519ab9000%40googlegroups.com
>>>>>>  
>>>>>> <https://groups.google.com/d/msgid/elasticsearch/4095e5b7-1fb4-477a-b27f-3e4519ab9000%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> Adrien Grand
>>>>>  
>>>>


elasticssearchcustom.jar
Description: application/java-archive
