Hi Jörg,
thanks for the reply. Yes, what you suggest would improve our current
approach: we would get a subdocument instead of JSON encoded in a string
field.
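
For reference, the transformation our current approach needs from a loader
can be sketched in plain Java as below. This is only an illustration under
assumptions: the class and method names (ContentsWrapper, wrap, escapeJson)
are hypothetical and not part of the plugin, and the escaping is done by hand
to keep the example self-contained.

```java
// Hypothetical helper, not part of the SIREn plugin: wraps a raw JSON
// document as an escaped string under the "contents" field.
final class ContentsWrapper {

    // Escape a string so it can be embedded as a JSON string value.
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Build {"id":<id>,"contents":"<escaped rawJson>"}.
    static String wrap(int id, String rawJson) {
        return "{\"id\":" + id + ",\"contents\":\"" + escapeJson(rawJson) + "\"}";
    }

    public static void main(String[] args) {
        System.out.println(wrap(3, "{\"title\":\"my title\",\"content\":\"my content\"}"));
    }
}
```

This wrapping step is exactly what we would like users not to need.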
What we would like to achieve is to always be able to process any document
sent to Elasticsearch as a whole, whatever its fields, be it
{ "title": "my title", "content" : "my content"} or
{"name" : "john", "surname" : "doe"}.
For that we need to either (1) set an analyzer for the whole input
document, (2) set an analyzer for the _source field, which already
contains the whole doc, or (3) copy the _source field to a normal field,
let's say _siren, and set an analyzer for it.
(1) and (2) seem to be impossible.
So we are exploring option (3) which also seems difficult.
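
The behaviour we are after with option (3) can be sketched in plain Java as
below. It is a sketch under assumptions: the class and method names
(SourceCopySketch, fieldsFor) are hypothetical, and it deliberately uses no
Elasticsearch classes, so it says nothing about how the copy would actually
be wired into the _source mapper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of option (3): whatever the shape of the
// incoming document, its raw source is copied verbatim into one extra
// field ("_siren") that a dedicated analyzer could then consume.
final class SourceCopySketch {

    static final String SIREN_FIELD = "_siren"; // field name taken from the text above

    // Return the two fields of interest for a raw JSON document:
    // the original source and its verbatim copy.
    static Map<String, String> fieldsFor(String rawSource) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("_source", rawSource);
        fields.put(SIREN_FIELD, rawSource);
        return fields;
    }

    public static void main(String[] args) {
        String[] docs = {
            "{\"title\":\"my title\",\"content\":\"my content\"}",
            "{\"name\":\"john\",\"surname\":\"doe\"}"
        };
        // Both documents are handled the same way, with no per-field mapping.
        for (String doc : docs) {
            System.out.println(SIREN_FIELD + " = " + fieldsFor(doc).get(SIREN_FIELD));
        }
    }
}
```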
Jakub
On Friday, May 23, 2014 4:24:39 PM UTC+1, Jörg Prante wrote:
>
> Not sure what the plugin is doing, but if you want to process dedicated
> JSON data in an ES document, you could prepare an analyzer for a new field
> type. That way users can assign special meaning in the mapping to a field
> of their preference.
>
> E.g. a mapping with
>
> "mappings" : {
> "mycontent" : { "type" : "siren" }
> }
>
> and a given document would look like
>
> "mycontent" : {
> "title" : "foo",
> "name" : "bar"
> ...
> }
>
>
> and then you could extract the whole JSON subdoc from the doc under
> "mycontent" into your analyzer plugin and process it.
>
> For an example, you could look into plugins like the StandardNumber
> analyzer, where I defined a new type "standardnumber" for analysis:
>
>
> https://github.com/jprante/elasticsearch-analysis-standardnumber/blob/master/src/main/java/org/xbib/elasticsearch/index/mapper/standardnumber/StandardNumberMapper.java
>
> Jörg
>
>
>
> On Fri, May 23, 2014 at 4:48 PM, Jakub Kotowski <[email protected]> wrote:
>
>> Hello all,
>>
>> we are trying to implement a SIREn plugin for ElasticSearch for indexing
>> and querying documents. We already implemented a version which uses SIREn
>> to index and query a specific field (called "contents" below) which
>> contains a JSON document as a string. An example of a doc:
>>
>> {
>> "id":3,
>> "contents":
>> "{\"title\":\"This is an another article about SIREn.\",\"content\":\"bla bla bla \"}"
>> }
>>
>>
>> Instead, we would like to index the whole document as it is posted to
>> ElasticSearch to avoid the need for a special loader that transforms an
>> input JSON to the required form. So then the user would simply post a
>> document such as:
>>
>> {
>> "id":3,
>> "title":"This is an another article about SIREn.",
>> "content": "bla bla bla "
>> }
>>
>> and it would be indexed as a whole both by ElasticSearch and by the SIREn
>> plugin.
>>
>> One problem we encountered is that it is not possible to use copyTo for
>> the _source field and then only configure an analyzer for the copy.
>>
>> It seems that the cleanest solution would be to modify the
>> SourceFieldMapper class to allow copyTo.
>>
>> As a workaround we are going to create a class that extends
>> SourceFieldMapper and set copyTo for the _source field to a new field that
>> will be then used for SIREn and register it as follows:
>>
>> mapperService.documentMapperParser().putRootTypeParser("_source", new
>> ModifiedSourceFieldMapper.TypeParser());
>>
>> Does it sound OK or is there a simpler/cleaner solution?
>>
>> Thank you in advance,
>>
>> Jakub
>>
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>