Also note that the content returned by a search (the stored _source) will
still contain the stop words. Only the inverted index will hold the content
with the stop words removed.
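
To make the distinction concrete, here is a minimal sketch in plain Python.
It is not Elasticsearch code: the names fold_ascii, analyze and
index_document are hypothetical, and the toy dictionaries only stand in for
the stored document and the inverted index. It treats the whole string as a
single token (as a keyword tokenizer would), lowercases it, folds accents,
and strips the stop words with a pattern built from the stop word list in
the original question, so no regular expression has to be written by hand.

```python
import re
import unicodedata

# Stop word list taken from the original question in this thread.
STOP_WORDS = ["la", "el", "de", "del", "los", "las", "jurisdiccion"]

# Build one whole-word pattern from the list instead of hand-writing a regex.
STOP_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in STOP_WORDS) + r")\b"
)


def fold_ascii(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Lowercase, fold accents, remove stop words; keep one single token."""
    token = fold_ascii(text.lower())
    token = STOP_PATTERN.sub(" ", token)
    # Collapse the whitespace left behind by the removed words.
    return " ".join(token.split())


def index_document(doc_id, text, stored, inverted_index):
    """Toy stand-in: keep the original text, index only the analyzed token."""
    stored[doc_id] = text
    inverted_index.setdefault(analyze(text), []).append(doc_id)


stored, inverted_index = {}, {}
SAMPLE = ("El corregimiento de Mulaló, jurisdicción del municipio "
          "de Yumbo (Valle del Cauca)")
index_document(1, SAMPLE, stored, inverted_index)

print(stored[1])             # original text, stop words still present
print(list(inverted_index))  # analyzed token, stop words removed
```

In this sketch the stored text is untouched, while the only indexed term is
the cleaned string. Getting the cleaned string back in results would
therefore require storing it separately, which is an assumption about the
goal rather than something stated in the thread.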

-- 
Ivan


On Thu, Aug 28, 2014 at 11:55 AM, Itamar Syn-Hershko <[email protected]>
wrote:

> What would be the use case for such a process (removing stop words without
> tokenization)?
>
> This may be a good read btw:
> http://www.elasticsearch.org/blog/stop-stopping-stop-words-a-look-at-common-terms-query/
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Author of RavenDB in Action <http://manning.com/synhershko/>
>
>
> On Thu, Aug 28, 2014 at 9:48 PM, German Carrillo <
> [email protected]> wrote:
>
>> Hi all,
>>
>>
>> I'm looking for a way to remove stop words from tokens returned by a
>> keyword tokenizer, i.e., I'd like to obtain the original text without stop
>> words after the analysis process.
>>
>> Sample data looks like:
>>   "El corregimiento de Mulaló, jurisdicción del municipio de Yumbo (Valle del Cauca)"
>> After the lowercase token filter:
>>   "el corregimiento de mulaló, jurisdicción del municipio de yumbo (valle del cauca)"
>> After the ascii folding token filter:
>>   "el corregimiento de mulalo, jurisdiccion del municipio de yumbo (valle del cauca)"
>> After removing stop words:
>>   "corregimiento mulalo, municipio yumbo (valle cauca)"
>>
>> The stop words (currently) are:
>>   ["la", "el", "de", "del", "los", "las", "jurisdiccion"]
>>
>> Is the pattern replace token filter the only (or best) way to go for such
>> a task?
>>
>> I'd much rather specify a stop words list, which I know works perfectly
>> fine with other tokenizers, than write custom regular expressions.
>>
>>
>> Regards,
>>
>> Germán
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/038ff037-ccf3-4aca-b0c0-bb421531c495%40googlegroups.com
>> <https://groups.google.com/d/msgid/elasticsearch/038ff037-ccf3-4aca-b0c0-bb421531c495%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>

