Re: implementing a plugin to process the whole input document

Jakub Kotowski Fri, 23 May 2014 08:39:05 -0700

Hi Jörg,

thanks for the reply. Yes, what you suggest would improve our current 
approach: we would get a subdocument instead of JSON encoded in a string 
field.
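
To make the difference concrete, here is a minimal sketch in plain Java (no Elasticsearch classes). It only builds the two document shapes: the current one, with the whole document escaped into a string field, and the suggested one, with the document embedded as a subdocument. The class and method names (WrapSketch, wrapAsString, wrapAsSubdoc) are made up for illustration, the input is assumed to already be valid JSON, and this is not the actual loader or plugin code.

```java
// Hypothetical illustration only: none of these names come from SIREn or Elasticsearch.
class WrapSketch {

    // Escape a string so it can be embedded as a JSON string value.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters must be written as \\uXXXX.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Current shape: the whole document as an escaped string in "contents".
    static String wrapAsString(int id, String json) {
        return "{\"id\":" + id + ",\"contents\":\"" + jsonEscape(json) + "\"}";
    }

    // Suggested shape: the document embedded as a subdocument under one field.
    // Assumes `json` is already a valid JSON object.
    static String wrapAsSubdoc(String field, String json) {
        return "{\"" + jsonEscape(field) + "\":" + json + "}";
    }

    public static void main(String[] args) {
        String doc = "{\"title\":\"my title\",\"content\":\"my content\"}";
        System.out.println(wrapAsString(3, doc));
        System.out.println(wrapAsSubdoc("mycontent", doc));
    }
}
```

Either way the user's document still has to be wrapped before it is posted, which is exactly the step we would like to get rid of.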


What we would like to achieve is to process every document that arrives at 
Elasticsearch as a whole, whatever its fields, be it { "title": "my title", 
"content" : "my content"} or {"name" : "john", "surname" : "doe"}.

For that we either (1) need to be able to set an analyzer for the whole 
input document or (2) set an analyzer for the _source field which already 
contains the whole doc or (3) copy the _source field to a normal field, 
let's say _siren, and set an analyzer for it.

(1) and (2) seem to be impossible.

So we are exploring option (3), which also seems difficult.

Jakub 


On Friday, May 23, 2014 4:24:39 PM UTC+1, Jörg Prante wrote:
>
> Not sure what the plugin is doing, but if you want to process dedicated 
> JSON data in an ES document, you could prepare an analyzer for a new field 
> type. So user can assign special meaning in the mapping to a field of their 
> preference.
>
> E.g.  a mapping with
>
>     "mappings": {
>          "mycontent" : { "type" : "siren" }
>     }
>
> and a given document would look like
>
>     "mycontent" : {
>          "title" : "foo",
>          "name" : "bar"
>          ...
>     }
>
>
> and then you could extract the whole JSON subdoc from the doc under 
> "mycontent" into your analyzer plugin and process it. 
>
> For an example, you could look into plugins like the StandardNumber 
> analyzer, where I defined a new type "standardnumber" for analysis:
>
>
> https://github.com/jprante/elasticsearch-analysis-standardnumber/blob/master/src/main/java/org/xbib/elasticsearch/index/mapper/standardnumber/StandardNumberMapper.java
>
> Jörg
>
>
>
> On Fri, May 23, 2014 at 4:48 PM, Jakub Kotowski <[email protected]> wrote:
>
>> Hello all,
>>
>> we are trying to implement a SIREn plugin for ElasticSearch for indexing 
>> and querying documents. We already implemented a version which uses SIREn 
>> to index and query a specific field (called "contents" below) which 
>> contains a JSON document as a string. An example of a doc:
>>
>> {
>>    "id":3,
>>    "contents":
>> "{\"title\":\"This is an another article  about SIREn.\",\"content\":\"bla bla bla \"}"
>> }
>>  
>>
>> Instead, we would like to index the whole document as it is posted to 
>> ElasticSearch to avoid the need for a special loader that transforms an 
>> input JSON to the required form. So then the user would simply post a 
>> document such as:
>>
>> {
>>    "id":3,
>>    "title":"This is an another article  about SIREn.",
>>    "content": "bla bla bla "
>> }
>>
>> and it would be indexed as a whole both by ElasticSearch and by the SIREn 
>> plugin.
>>
>> One problem we encountered is that it is not possible to use copyTo on 
>> the _source field and then configure an analyzer only for the copy.
>>
>> It seems that the cleanest solution would be to modify the 
>> SourceFieldMapper class to allow copyTo. 
>>
>> As a workaround, we are going to create a class that extends 
>> SourceFieldMapper and sets copyTo on the _source field to point to a new 
>> field, which will then be used for SIREn, and register it as follows:
>>  
>> mapperService.documentMapperParser().putRootTypeParser("_source", new 
>> ModifiedSourceFieldMapper.TypeParser());
>>
>> Does it sound OK or is there a simpler/cleaner solution?
>>  
>> Thank you in advance,
>>
>> Jakub
>>
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

