Luiz, thanks for responding! I had forgotten to mention that I tried not_analyzed as well. The analyzer, it turns out, wasn't my problem.
I had 2 problems. First, the ES/Lucene regexp query/filter doesn't support "\d" for digits, so I had to replace each "\d" with the [0-9] character class. Once I changed my regex to:

    "http://example.com/([0-9]{4})/([0-9]{2})/([0-9]{2})/[^/]+/"

it worked!

My second problem is that the Python library appears to have a bug. When I run the following with elasticsearch-py:

    query = {
        "query": {
            "regexp": {
                "url": "http://example.com/([0-9]{4})/([0-9]{2})/([0-9]{2})/[^/]+/"
            }
        }
    }
    es.search(index="regex-test", doc_type="test1", body=query)

I get:

    {u'_shards': {u'failed': 0, u'successful': 5, u'total': 5},
     u'hits': {u'hits': [], u'max_score': None, u'total': 0},
     u'timed_out': False,
     u'took': 11}

However, when I do this query on the command line:

    curl -XPOST "http://localhost:9200/regex-test/type1/_search" -d'
    {
      "query": {
        "regexp": {
          "url": "http://example.com/([0-9]{4})/([0-9]{2})/([0-9]{2})/[^/]+/"
        }
      }
    }'

    {"took":6,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"regex-test","_type":"type1","_id":"doc1","_score":1.0, "_source" : {"url":"http://example.com/2014/04/15/foo-bar-baz/"}

So I guess the issue lies with elasticsearch-py?

On Tue, Apr 15, 2014 at 5:59 PM, Luiz Guilherme Pais dos Santos <[email protected]> wrote:
> Hi Matt,
>
> If you mark your field as not_analyzed:
> {
>   "mappings": {
>     "type1": {
>       "properties": {
>         "url": {
>           "type": "string",
>           "index": "not_analyzed"
>         }
>       }
>     }
>   }
> }
>
> You could use a regexp query:
> POST _search
> {
>   "query": {
>     "regexp": {
>       "url": "http://example\.com/\d{4}/\d{2}/\d{2}/([^/]+)/$"
>     }
>   }
> }
>
> On Tue, Apr 15, 2014 at 5:57 PM, matt burton <[email protected]> wrote:
>
>> I have a field in my documents that consists of a URL.
>> {...
>> "url":"http://example.com/2014/04/15/foo-bar-baz/"
>> ...}
>>
>> I would like to use a regexp query/filter to find documents in my index
>> with urls matching a regex pattern.
>> For example: "http://example\.com/\d{4}/\d{2}/\d{2}/([^/]+)/$"
>>
>> I'm a bit stumped about how to configure an analyzer in the document
>> _mapping to enable a regexp search (like above) for the url field. I've
>> tried the standard and keyword analyzers, but they didn't work.
>>
>> I'm not even sure if this is possible to do. If not, I can do it
>> outside of ES, but I thought I'd ask here to see if y'all had any guidance.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>>
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/62e05ecc-500f-474e-a5e6-220a9eb86eb3%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> Luiz Guilherme P. Santos
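P.S. For anyone who finds this thread later, here is a rough sketch of the pattern rewrite described above, plus a way to build the search arguments in one place. The helper names (to_lucene_regexp, regexp_search_kwargs) are made up for illustration and are not part of elasticsearch-py. The rewrite is naive: it only handles "\d" and a trailing "$", which I am dropping on the assumption that Lucene regexps always have to match the whole term. Also, looking at the two calls again, the Python one passes doc_type="test1" while the curl request goes to type1, so that mismatch is worth ruling out before blaming the client. The check at the end uses Python's re module, not Lucene, so treat it as a sanity check only.

```python
import re


def to_lucene_regexp(pattern):
    """Naively rewrite a PCRE-style pattern for the ES/Lucene regexp query.

    Hypothetical helper: it only covers the two constructs from this thread
    and does not cope with escaped backslashes such as a literal "\\\\d".
    """
    # The regexp query did not accept the \d shorthand, so spell it out.
    pattern = pattern.replace(r"\d", "[0-9]")
    # Assumption: the pattern must match the entire term anyway, so an
    # unescaped trailing "$" anchor is redundant and gets dropped.
    if pattern.endswith("$") and not pattern.endswith(r"\$"):
        pattern = pattern[:-1]
    return pattern


def regexp_search_kwargs(index, doc_type, field, pattern):
    """Build keyword arguments for a search call (hypothetical helper)."""
    return {
        "index": index,
        "doc_type": doc_type,
        "body": {"query": {"regexp": {field: to_lucene_regexp(pattern)}}},
    }


# Same index, type and field as in the curl request above.
kwargs = regexp_search_kwargs(
    "regex-test",
    "type1",
    "url",
    r"http://example\.com/\d{4}/\d{2}/\d{2}/([^/]+)/$",
)
lucene_pattern = kwargs["body"]["query"]["regexp"]["url"]

# Sanity check with Python's re: fullmatch mimics whole-term matching.
sample_url = "http://example.com/2014/04/15/foo-bar-baz/"
print(lucene_pattern)
print(bool(re.fullmatch(lucene_pattern, sample_url)))
```

With a running cluster and an elasticsearch-py client named es, the call would then be es.search(**kwargs), which takes the same index, doc_type and body arguments as the call earlier in this message.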
