Populating TermVector when having tokenizer outside ES

Neeraj Makam Tue, 29 Apr 2014 05:05:26 -0700

Hi,

I have a mapping that contains a nested list of words, generated by a
tokenizer that runs outside ES. Each word has the fields tokenOffset and
characterOffset, which are populated by my tokenizer. This is roughly the
mapping I am using:


{
    "contract": {
        "_id": {
            "path": "objectId"
        },
        "properties": {
            "filepath": {
                "type": "string",
                "index": "not_analyzed"
            },
            "objectId": {
                "type": "string",
                "index": "no"
            },
            "words": {
                "type": "nested",
                "properties": {
                    "characterOffset": {
                        "type": "long",
                        "index": "no"
                    },
                    "wordType": {
                        "type": "long"
                    },
                    "tokenOffset": {
                        "type": "long",
                        "index": "no"
                    },
                    "value": {
                        "type": "string",
                        "index": "not_analyzed"
                    }
                }
            }
        }
    }
}

I want to be able to run a query such as:
value == "foo" AND wordType == 5
where both conditions must hold for the same word. That is why I mapped the
list "words" as nested. [1]
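For context on why nested is needed here: with a plain object mapping, Elasticsearch flattens an array of objects into one list of values per field, so the two conditions could be satisfied by two different words. The sketch below is plain Python with no Elasticsearch client; it builds a `nested` query body expressing the condition and contrasts flattened vs per-word matching. The helper names (`nested_query`, `flattened_match`, `nested_match`) and the sample words are hypothetical.

```python
import json

def nested_query(value, word_type):
    """Build a nested query body: both term clauses must match the same word."""
    return {
        "query": {
            "nested": {
                "path": "words",
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"words.value": value}},
                            {"term": {"words.wordType": word_type}},
                        ]
                    }
                },
            }
        }
    }

def flattened_match(words, value, word_type):
    # Object (non-nested) mapping: each field becomes one value list per document.
    values = [w["value"] for w in words]
    types = [w["wordType"] for w in words]
    return value in values and word_type in types

def nested_match(words, value, word_type):
    # Nested mapping: each word is checked on its own.
    return any(w["value"] == value and w["wordType"] == word_type for w in words)

# Hypothetical sample: "foo" has wordType 3, "bar" has wordType 5.
words = [
    {"value": "foo", "wordType": 3},
    {"value": "bar", "wordType": 5},
]

print(json.dumps(nested_query("foo", 5), indent=2))
print(flattened_match(words, "foo", 5))  # True: a false positive
print(nested_match(words, "foo", 5))     # False: no single word matches both
```

The flattened check reports a match even though no word is both "foo" and of type 5, which is the cross-object leakage a nested mapping avoids.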

For example, if the text is "this is foo and bar", my tokenizer splits it into
words, assigns a wordType to each word, and also generates each word's
characterOffset and tokenOffset, i.e.:
words[0].value = "this"
words[0].wordType = 5
words[0].characterOffset = 0
words[0].tokenOffset = 0
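A minimal sketch of the per-word records described above, assuming a plain whitespace split stands in for the external tokenizer. `tokenize`, the `WORD_TYPES` lookup, and the `objectId`/`filepath` values are hypothetical; the real tokenizer assigns wordType by its own rules.

```python
import json
import re

# Hypothetical wordType assignments; the real tokenizer uses its own logic.
WORD_TYPES = {"this": 5, "is": 1, "foo": 2, "and": 1, "bar": 2}

def tokenize(text):
    """Split text on whitespace, recording each word's character and token offsets."""
    words = []
    for token_offset, match in enumerate(re.finditer(r"\S+", text)):
        word = match.group()
        words.append({
            "value": word,
            "wordType": WORD_TYPES.get(word, 0),  # 0 = unknown type (assumption)
            "characterOffset": match.start(),     # index of the word's first character
            "tokenOffset": token_offset,          # position of the word in the token stream
        })
    return words

# A document shaped like the "contract" mapping above (hypothetical values).
doc = {
    "objectId": "doc-1",
    "filepath": "/tmp/example.txt",
    "words": tokenize("this is foo and bar"),
}

print(json.dumps(doc, indent=2))
```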

Now, how do I populate the ES term vectors so that I can use its phrase
search and other features such as AND/OR/NEAR? [2]
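Phrase and NEAR matching both depend on token positions, which is the information tokenOffset already records. The following is only a client-side illustration of those semantics over the word list, not how Elasticsearch or Lucene evaluates them; `phrase_match` and `near_match` are hypothetical helpers, and `near_match` ignores word order, a simplification of Lucene's slop rules.

```python
def positions(words, value):
    """Return the tokenOffsets at which a given word value occurs."""
    return [w["tokenOffset"] for w in words if w["value"] == value]

def phrase_match(words, phrase):
    """True if the phrase's words occur at consecutive tokenOffsets."""
    terms = phrase.split()
    if not terms:
        return False
    by_offset = {w["tokenOffset"]: w["value"] for w in words}
    for start in positions(words, terms[0]):
        if all(by_offset.get(start + i) == t for i, t in enumerate(terms)):
            return True
    return False

def near_match(words, a, b, max_gap):
    """True if a and b occur with at most max_gap words between them, in either order."""
    return any(
        abs(pa - pb) <= max_gap + 1
        for pa in positions(words, a)
        for pb in positions(words, b)
    )

# tokenOffsets for "this is foo and bar"
words = [{"value": v, "tokenOffset": i} for i, v in enumerate("this is foo and bar".split())]

print(phrase_match(words, "foo and"))          # True
print(phrase_match(words, "foo bar"))          # False
print(near_match(words, "foo", "bar", 1))      # True
print(near_match(words, "this", "bar", 1))     # False
```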

[1] Is there a way to implement this without a nested mapping? (Nested
mapping indexes each word as a separate document.)
[2] Can a custom analyzer be used to populate the ES term vectors while the
tokenizer stays outside ES? (Assume that, for business reasons, moving the
tokenizer into ES is not feasible.)

The features I need to implement are phrase search, AND/OR/NEAR operators,
and highlighting.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7c363b2d-3dc0-4096-8d47-ab70ee20d181%40googlegroups.com.
