Thanks Ivan,

Do you mean that what I obtain from a request such as

curl -XGET \
  'localhost:9200/_analyze?tokenizer=keyword&filters=lowercase,my_ascii_folding,my_stopwords' \
  -d 'El corregimiento de Mulaló, jurisdicción del municipio de Yumbo (Valle del Cauca)'

is not what will be present in the index after the analysis process? If so,
how could I check whether the stop words filter is being (will be) applied
to a sample phrase?
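
One way to reason about this offline is a small simulation. The sketch below is plain Python, not Elasticsearch code: the helper names (keyword_tokenizer, whitespace_tokenizer, ascii_fold, stop, analyze) are made up for illustration, and the folding step is only a rough stand-in for the real ASCII folding filter. It assumes a stop filter drops only tokens that exactly equal a stop word, so a single keyword token holding the whole phrase would pass through unchanged, while per-word tokens would be filtered.

```python
import unicodedata

# Stop words taken from the original post.
STOP_WORDS = {"la", "el", "de", "del", "los", "las", "jurisdiccion"}


def keyword_tokenizer(text):
    """Hypothetical stand-in: emit the whole input as one token."""
    return [text]


def whitespace_tokenizer(text):
    """Hypothetical stand-in: split the input on whitespace."""
    return text.split()


def lowercase(tokens):
    return [t.lower() for t in tokens]


def ascii_fold(tokens):
    # Rough approximation of ASCII folding: decompose each character,
    # then drop the combining marks (accents).
    folded = []
    for t in tokens:
        decomposed = unicodedata.normalize("NFKD", t)
        folded.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return folded


def stop(tokens):
    # Assumption: a token is dropped only when it equals a stop word exactly.
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text, tokenizer):
    """Apply tokenizer, then lowercase, folding and stop filtering, in that order."""
    return stop(ascii_fold(lowercase(tokenizer(text))))


PHRASE = "El corregimiento de Mulaló, jurisdicción del municipio de Yumbo (Valle del Cauca)"

keyword_terms = analyze(PHRASE, keyword_tokenizer)
whitespace_terms = analyze(PHRASE, whitespace_tokenizer)

print(keyword_terms)               # one token: the whole folded phrase
print(" ".join(whitespace_terms))  # stop words removed word by word
```

Under these assumptions the keyword variant keeps "el", "de" and "del" because the only token is the entire phrase, whereas the whitespace variant yields the stop-word-free text described in the original post.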


2014-08-28 14:03 GMT-05:00 Ivan Brusic <[email protected]>:

> Also note that the content returned will still contain the stop words.
> Only the inverted index will contain the stopword-less content.
>
> --
> Ivan
>
>
> On Thu, Aug 28, 2014 at 11:55 AM, Itamar Syn-Hershko <[email protected]>
> wrote:
>
>> What would be the use case for such a process (removing stop words without
>> tokenization)?
>>
>> This may be a good read btw:
>> http://www.elasticsearch.org/blog/stop-stopping-stop-words-a-look-at-common-terms-query/
>>
>> --
>>
>> Itamar Syn-Hershko
>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>> Freelance Developer & Consultant
>> Author of RavenDB in Action <http://manning.com/synhershko/>
>>
>>
>> On Thu, Aug 28, 2014 at 9:48 PM, German Carrillo <
>> [email protected]> wrote:
>>
>>> Hi all,
>>>
>>>
>>> I'm looking for a way to remove stop words from tokens returned by a
>>> keyword tokenizer, i.e., I'd like to obtain the original text without stop
>>> words after the analysis process.
>>>
>>> Sample data looks like:
>>>   "El corregimiento de Mulaló, jurisdicción del municipio de Yumbo (Valle del Cauca)"
>>> After the lowercase token filter:
>>>   "el corregimiento de mulaló, jurisdicción del municipio de yumbo (valle del cauca)"
>>> After the ascii folding token filter:
>>>   "el corregimiento de mulalo, jurisdiccion del municipio de yumbo (valle del cauca)"
>>> After removing stop words:
>>>   "corregimiento mulalo, municipio yumbo (valle cauca)"
>>>
>>> The stop words (currently) are:
>>>   ["la", "el", "de", "del", "los", "las", "jurisdiccion"]
>>>
>>> Is the pattern replace token filter the only (or best) way to go for
>>> such a task?
>>>
>>> I'd much rather specify a stop words list, which I know works perfectly
>>> well with other tokenizers, than write custom regular expressions.
>>>
>>>
>>> Regards,
>>>
>>> Germán
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/038ff037-ccf3-4aca-b0c0-bb421531c495%40googlegroups.com
>>> <https://groups.google.com/d/msgid/elasticsearch/038ff037-ccf3-4aca-b0c0-bb421531c495%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>
