Re: Trigram-accelerated regex searches

Erik Rose Thu, 22 May 2014 13:31:34 -0700

Alright, try this on for size. :-)

Since the built-in regex-ish filters want to be all clever and index-based, 
why not use the JS script plugin, which is happy to run as a 
post-processing phase?


    curl -s -XGET 'http://127.0.0.1:9200/dxr_test/line/_search?pretty' -d '{
        "query": {
            "filtered": {
                "query": {
                    "match_all": {}
                },
                "filter": {
                    "and": [
                        {
                            "query": {
                                "match_phrase": {
                                    "content_trg": "Children"
                                }
                            }
                        },
                        {
                            "query": {
                                "match_phrase": {
                                    "content_trg": "Next"
                                }
                            }
                        },
                        {
                            "script": {
                                "lang": "js",
                                "script": "(new RegExp(pattern)).test(doc[\"content\"].value)",
                                "params": {
                                    "pattern": "Children.*Next"
                                }
                            }
                        }
                    ]
                }
            }
        }
    }'

That gets me through the whole 16M-doc corpus in 117ms. (Without the 
match_phrase queries, it takes forever, at 12s, so you can see the trigram 
acceleration working.) I am ecstatic.
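
For anyone who wants to generate that request body rather than hand-write it, here is a rough sketch. The function name buildTrigramRegexQuery is hypothetical, and the caller is assumed to have already pulled the literal substrings out of the regex by hand. It also assumes the trigram analyzer emits nothing for strings shorter than three characters, so those are skipped.

```javascript
// Hypothetical helper, not part of Elasticsearch or DXR. Field names
// ("content_trg", "content") are the ones used in the query above.
function buildTrigramRegexQuery(literals, pattern) {
    // One match_phrase clause per literal substring of the regex. These hit
    // the trigram-analyzed field and cheaply narrow the candidate set.
    const trigramFilters = literals
        .filter((s) => s.length >= 3) // assumption: shorter strings yield no trigrams
        .map((s) => ({ query: { match_phrase: { content_trg: s } } }));

    // The JS script filter runs the real regex; it is placed last, as in
    // the hand-written query.
    const regexFilter = {
        script: {
            lang: "js",
            script: '(new RegExp(pattern)).test(doc["content"].value)',
            params: { pattern: pattern }
        }
    };

    return {
        query: {
            filtered: {
                query: { match_all: {} },
                filter: { and: trigramFilters.concat([regexFilter]) }
            }
        }
    };
}

// Rebuilds the same body that the curl command sends.
const body = buildTrigramRegexQuery(["Children", "Next"], "Children.*Next");
console.log(JSON.stringify(body, null, 4));
```

Extracting the literals from an arbitrary regex automatically is the hard part and is deliberately left out here.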

Some of you might note that the pattern doesn't begin or end with a 
wildcard; that's because RegExp.test() serves as a search rather than a 
match, so wildcards are effectively assumed.
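
To make the search-versus-match point concrete, here is a small standalone sketch in plain JavaScript. The sample lines and the two helper names are made up for illustration; only RegExp.test() itself comes from the query above.

```javascript
// Unanchored: true if the pattern is found anywhere in the text.
function searches(pattern, text) {
    return new RegExp(pattern).test(text);
}

// Anchored at both ends: true only if the pattern covers the whole text.
function matchesWholeLine(pattern, text) {
    return new RegExp("^(?:" + pattern + ")$").test(text);
}

const pattern = "Children.*Next";

// Hypothetical source lines, not taken from the real corpus.
const hit = "kid = mChildren.FirstChild(); kid = kid.GetNextSibling();";
const miss = "Next, walk the list of Children";

console.log(searches(pattern, hit));         // true
console.log(matchesWholeLine(pattern, hit)); // false
console.log(searches(pattern, miss));        // false
```

The last line contains both "Children" and "Next", so it would pass the two match_phrase filters, but the words are in the wrong order for the regex. That is the kind of candidate the script filter is there to throw out.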

Cheers!
