Luiz, thanks for responding!

I forgot to mention that I had tried not_analyzed as well. The analyzer, it
turns out, wasn't my problem.

I had two problems. First, the ES/Lucene regexp query/filter doesn't support
"\d" for digits, so I had to replace each one with the [0-9] character
class. Once I changed my regex to
"http://example.com/([0-9]{4})/([0-9]{2})/([0-9]{2})/[^/]+/" it worked!
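
For anyone hitting the same thing, here is a rough sketch of the rewrite I did by hand. The helper name to_lucene_regexp is made up for illustration, and the replacement is deliberately naive (it would mangle a "\d" inside a character class or after an escaped backslash). It also drops the "^" and "$" anchors, since Lucene regexp patterns always match against the whole term. It only reflects the ES/Lucene behaviour discussed in this thread; other versions may differ.

```python
def to_lucene_regexp(pattern):
    """Naively rewrite a PCRE-style pattern for the Lucene regexp syntax.

    Hypothetical helper, not part of Elasticsearch or elasticsearch-py.
    """
    # Lucene patterns are implicitly anchored to the whole term,
    # so explicit anchors are dropped.
    if pattern.startswith("^"):
        pattern = pattern[1:]
    if pattern.endswith("$") and not pattern.endswith("\\$"):
        pattern = pattern[:-1]
    # Swap the \d shorthand for an explicit character class.
    return pattern.replace("\\d", "[0-9]")


original = r"http://example\.com/\d{4}/\d{2}/\d{2}/([^/]+)/$"
print(to_lucene_regexp(original))
```

The printed pattern can then be used as the value of the "regexp" query on the url field.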

My second problem is that the Python library appears to have a bug. When I
run the following with elasticsearch-py:

query = {
    "query": {
        "regexp": {
            "url": "http://example.com/([0-9]{4})/([0-9]{2})/([0-9]{2})/[^/]+/"
        }
    }
}
es.search(index="regex-test", doc_type="test1", body=query)

I get:
{u'_shards': {u'failed': 0, u'successful': 5, u'total': 5}, u'hits':
{u'hits': [], u'max_score': None, u'total': 0}, u'timed_out': False,
u'took': 11}

However, when I do this query on the command line:

curl -XPOST "http://localhost:9200/regex-test/type1/_search" -d'
{
    "query": {
        "regexp": {
            "url": "http://example.com/([0-9]{4})/([0-9]{2})/([0-9]{2})/[^/]+/"
        }
    }
}'

{"took":6,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"regex-test","_type":"type1","_id":"doc1","_score":1.0,
"_source" : {"url":"http://example.com/2014/04/15/foo-bar-baz/"}}]}}

So I guess the issue lies with elasticsearch-py?
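
Before blaming the client, there is one difference between the two requests worth ruling out: the Python call passes doc_type="test1", while the curl command (and the returned hit) use type1. Below is a minimal sketch of the same search with the type name matching the curl request. It assumes type1 is the correct type, that es is an elasticsearch.Elasticsearch client, and that the client version in use accepts the doc_type argument. The names build_query and search_urls are just for illustration.

```python
import json

URL_PATTERN = "http://example.com/([0-9]{4})/([0-9]{2})/([0-9]{2})/[^/]+/"


def build_query(pattern):
    """Build the same regexp query body that the curl command sends."""
    return {"query": {"regexp": {"url": pattern}}}


def search_urls(es, doc_type="type1"):
    """Run the search; `es` is assumed to be an elasticsearch.Elasticsearch client.

    doc_type defaults to "type1" (the type used in the curl request),
    not "test1" as in the original Python call.
    """
    return es.search(index="regex-test", doc_type=doc_type,
                     body=build_query(URL_PATTERN))


# Print the body so it can be compared byte for byte with the curl payload.
print(json.dumps(build_query(URL_PATTERN), indent=4))
```

If search_urls(es) returns the document and search_urls(es, doc_type="test1") does not, the type name was the difference rather than the library.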


On Tue, Apr 15, 2014 at 5:59 PM, Luiz Guilherme Pais dos Santos <
[email protected]> wrote:

> Hi Matt,
>
> If you mark your field as not_analyzed:
> {
>     "mappings": {
>         "type1": {
>             "properties": {
>                 "url": {
>                     "type": "string",
>                     "index": "not_analyzed"
>                 }
>             }
>         }
>     }
> }
>
> You could use a regexp query:
> POST _search
> {
>     "query": {
>         "regexp": {
>             "url": "http://example\.com/\d{4}/\d{2}/\d{2}/([^/]+)/$"
>         }
>     }
> }
>
>
>
> On Tue, Apr 15, 2014 at 5:57 PM, matt burton <[email protected]> wrote:
>
>> I have a field in my documents that consists of a URL.
>> {...
>> "url":"http://example.com/2014/04/15/foo-bar-baz/"
>> ...}
>>
>> I would like to use a regexp query/filter to find documents in my index
>> with urls matching a regex pattern.
>> For example: "http://example\.com/\d{4}/\d{2}/\d{2}/([^/]+)/$"
>>
>> I'm a bit stumped about how to configure an analyzer in the document
>> _mapping to enable a regexp search (like above) for the url field. I've
>> tried the standard and keyword analyzer, but they didn't work.
>>
>> I'm not even sure if this is possible to do; if not, I can do it
>> outside of ES, but I thought I'd ask here to see if y'all had any guidance.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>>
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/62e05ecc-500f-474e-a5e6-220a9eb86eb3%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> Luiz Guilherme P. Santos
>

