Great, the ParseContext looks promising. We'll try it and report back, thanks!

Jakub

BTW, just to answer your previous implicit question - SIREn allows for advanced structured document search, more at http://sirendb.com/

On Friday, May 23, 2014 6:56:32 PM UTC+1, Jörg Prante wrote:
>
> In answer to (1), in each custom mapper, you have access to ParseContext
> in the method
>
>     public void parse(ParseContext context) throws IOException
>
> In the ParseContext, you can access _source with the source() method to do
> whatever you want, e.g. copy it, parse it, index it again etc.
>
> (2) is a slight misconception, since _source is not a field, but a "field
> container", it is a byte array passed through the ES API so the field
> mappers can do their work.
>
> (3) as said, it is possible to copy _source, but only internally in the
> code of a custom field mapper, not by configuration in the mapping, since
> _source is reserved for special treatment inside ES and users should not be
> able to tamper with it.
>
> So a customized mapper in a plugin could work like this in the root object:
>
> "mappings" : {
>     "properties" : {
>         ...
>         "_siren" : { "type" : "siren" }
>     }
> }
>
> and in the corresponding code in the custom mapper, when field _siren is
> processed because of the type "siren", it copies the byte array from
> _source in the ParseContext. (It need not be the field name _siren; this
> is just an example name.)
>
> Jörg
>
> On Fri, May 23, 2014 at 5:38 PM, Jakub Kotowski <[email protected]> wrote:
>
>> Hi Jörg,
>>
>> thanks for the reply. Yes, what you suggest is a way to improve our
>> current approach so that we can get a subdoc instead of a json encoded in a
>> string field.
>>
>> What we would like to achieve is to always be able to process any
>> document that comes to elasticsearch as a whole, i.e. be it
>> { "title": "my title", "content" : "my content"} or
>> {"name" : "john", "surname" : "doe"}.
>>
>> For that we either (1) need to be able to set an analyzer for the whole
>> input document or (2) set an analyzer for the _source field which already
>> contains the whole doc or (3) copy the _source field to a normal field,
>> let's say _siren, and set an analyzer for it.
>>
>> (1) and (2) seem to be impossible.
>>
>> So we are exploring option (3) which also seems difficult.
>>
>> Jakub
>>
>> On Friday, May 23, 2014 4:24:39 PM UTC+1, Jörg Prante wrote:
>>
>>> Not sure what the plugin is doing, but if you want to process dedicated
>>> JSON data in an ES document, you could prepare an analyzer for a new field
>>> type. So users can assign special meaning in the mapping to a field of their
>>> preference.
>>>
>>> E.g. a mapping with
>>>
>>> "mappings" : {
>>>     "mycontent" : { "type" : "siren" }
>>> }
>>>
>>> and a given document would look like
>>>
>>> "mycontent" : {
>>>     "title" : "foo",
>>>     "name" : "bar"
>>>     ...
>>> }
>>>
>>> and then you could extract the whole JSON subdoc from the doc under
>>> "mycontent" into your analyzer plugin and process it.
>>>
>>> For an example, you could look into plugins like the StandardNumber
>>> analyzer, where I defined a new type "standardnumber" for analysis:
>>>
>>> https://github.com/jprante/elasticsearch-analysis-standardnumber/blob/master/src/main/java/org/xbib/elasticsearch/index/mapper/standardnumber/StandardNumberMapper.java
>>>
>>> Jörg
>>>
>>> On Fri, May 23, 2014 at 4:48 PM, Jakub Kotowski <[email protected]> wrote:
>>>
>>>> Hello all,
>>>>
>>>> we are trying to implement a SIREn plugin for ElasticSearch for
>>>> indexing and querying documents. We already implemented a version which
>>>> uses SIREn to index and query a specific field (called "contents" below)
>>>> which contains a JSON document as a string. An example of a doc:
>>>>
>>>> {
>>>>     "id":3,
>>>>     "contents":"{\"title\":\"This is an another article about SIREn.\",\"content\":\"bla bla bla \"}"
>>>> }
>>>>
>>>> Instead, we would like to index the whole document as it is posted to
>>>> ElasticSearch to avoid the need for a special loader that transforms an
>>>> input JSON to the required form. So then the user would simply post a
>>>> document such as:
>>>>
>>>> {
>>>>     "id":3,
>>>>     "title":"This is an another article about SIREn.",
>>>>     "content": "bla bla bla "
>>>> }
>>>>
>>>> and it would be indexed as a whole both by ElasticSearch and by the
>>>> SIREn plugin.
>>>>
>>>> One problem we encountered is that it is not possible to use copyTo for
>>>> the _source field and then only configure an analyzer for the copy.
>>>>
>>>> It seems that the cleanest solution would be to modify the
>>>> SourceFieldMapper class to allow copyTo.
>>>>
>>>> As a workaround we are going to create a class that extends
>>>> SourceFieldMapper and set copyTo for the _source field to a new field that
>>>> will then be used for SIREn, and register it as follows:
>>>>
>>>>     mapperService.documentMapperParser().putRootTypeParser("_source",
>>>>         new ModifiedSourceFieldMapper.TypeParser());
>>>>
>>>> Does it sound OK or is there a simpler/cleaner solution?
>>>>
>>>> Thank you in advance,
>>>>
>>>> Jakub
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elasticsearch/352e7668-d382-4ca3-bbeb-605d6c019ed1%40googlegroups.com
>>>> For more options, visit https://groups.google.com/d/optout.
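
A minimal sketch of option (3) as Jörg describes it in this thread: a custom mapper whose parse method copies the whole _source byte array into its own field. It is only an illustration of the idea, not plugin code. ParseContextStub, addField, field and SirenMapperSketch are hypothetical stand-ins invented here so the example runs on its own; they are not Elasticsearch classes or methods, and the real ParseContext API may return a different type from source().

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch only. Nothing here is an Elasticsearch class: ParseContextStub is a
 * hypothetical stand-in for ParseContext, and SirenMapperSketch stands in for
 * a custom mapper registered for the type "siren".
 */
final class SirenMapperSketch {

    /** Hypothetical stand-in: holds the raw source bytes and the fields added so far. */
    static final class ParseContextStub {
        private final byte[] source;
        private final Map<String, byte[]> fields = new LinkedHashMap<>();

        ParseContextStub(byte[] source) {
            this.source = source;
        }

        /** Assumption: the raw document bytes are reachable from the context. */
        byte[] source() {
            return source;
        }

        void addField(String name, byte[] value) {
            fields.put(name, value);
        }

        byte[] field(String name) {
            return fields.get(name);
        }
    }

    private final String fieldName;

    SirenMapperSketch(String fieldName) {
        this.fieldName = fieldName;
    }

    /** Same shape as parse(ParseContext): copy the whole source into this mapper's field. */
    void parse(ParseContextStub context) {
        byte[] original = context.source();
        // Defensive copy, so later processing cannot alter the original source bytes.
        byte[] copy = Arrays.copyOf(original, original.length);
        context.addField(fieldName, copy);
    }

    public static void main(String[] args) {
        String json = "{\"id\":3,\"title\":\"This is an another article about SIREn.\"}";
        ParseContextStub context = new ParseContextStub(json.getBytes(StandardCharsets.UTF_8));
        new SirenMapperSketch("_siren").parse(context);
        // Prints the copied document, which a SIREn analyzer could then process as a whole.
        System.out.println(new String(context.field("_siren"), StandardCharsets.UTF_8));
    }
}
```

The field name "_siren" is taken from the example mapping in the thread; as noted there, any field name would do.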
