Alright, try this on for size. :-)
Since the built-in regex-ish filters want to be all clever and index-based,
why not use the JS script plugin, which is happy to run as a
post-processing phase?
curl -s -XGET 'http://127.0.0.1:9200/dxr_test/line/_search?pretty' -d '{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"and": [
{
"query": {
"match_phrase": {
"content_trg": "Children"
}
}
},
{
"query": {
"match_phrase": {
"content_trg": "Next"
}
}
},
{
"script": {
"lang": "js",
"script": "(new
RegExp(pattern)).test(doc[\"content\"].value)",
"params": {
"pattern": "Children.*Next"
}
}
}
]
}
}
}
}'
That gets me through the whole 16M-doc corpus in 117ms. (Without the
match_phrase queries, it takes forever, at 12s, so you can see the trigrams
acceleration working.) I am ecstatic.
Some of you might note that the pattern doesn't begin or end with a
wildcard; that's because RegExp.test() serves as a search rather than a
match, so wildcards are effectively assumed.
Cheers!
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/6a7674d8-bf51-4be6-860e-589db3939ea8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.