You mentioned in your original post: "I'd like to obtain the original text without stop words".
The stopword-less phrase will indeed be present in the index after the analysis phase. However, when you ask for this content back as the result of a query, the original text will be returned. What is indexed is not necessarily what is stored/returned.

Cheers,

Ivan

On Thu, Aug 28, 2014 at 12:30 PM, Germán Carrillo <[email protected]> wrote:

> Thanks Ivan,
>
> do you mean what I obtain from a request such as
>
> curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&filters=lowercase,my_ascii_folding,my_stopwords' -d 'El corregimiento de Mulaló, jurisdicción del municipio de Yumbo (Valle del Cauca)'
>
> is not what will be present in the index after the analysis process? If so, how could I check whether the stop words filter is being (will be) applied to a sample phrase?
>
> 2014-08-28 14:03 GMT-05:00 Ivan Brusic <[email protected]>:
>
>> Also note that the content returned will still contain the stop words. Only the inverted index will contain the stopword-less content.
>>
>> --
>> Ivan
>>
>> On Thu, Aug 28, 2014 at 11:55 AM, Itamar Syn-Hershko <[email protected]> wrote:
>>
>>> What would be the usecase for such a process (removing stop words without tokenization)?
>>>
>>> This may be a good read btw:
>>> http://www.elasticsearch.org/blog/stop-stopping-stop-words-a-look-at-common-terms-query/
>>>
>>> --
>>> Itamar Syn-Hershko
>>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>>> Freelance Developer & Consultant
>>> Author of RavenDB in Action <http://manning.com/synhershko/>
>>>
>>> On Thu, Aug 28, 2014 at 9:48 PM, German Carrillo <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm looking for a way to remove stop words from tokens returned by a keyword tokenizer, i.e., I'd like to obtain the original text without stop words after the analysis process.
>>>>
>>>> Sample data looks like: "El corregimiento de Mulaló, jurisdicción del municipio de Yumbo (Valle del Cauca)"
>>>> After the lowercase token filter: "el corregimiento de mulaló, jurisdicción del municipio de yumbo (valle del cauca)"
>>>> After the ascii folding token filter: "el corregimiento de mulalo, jurisdiccion del municipio de yumbo (valle del cauca)"
>>>> After removing stop words: "corregimiento mulalo, municipio yumbo (valle cauca)"
>>>>
>>>> The stop words (currently) are: ["la", "el", "de", "del", "los", "las", "jurisdiccion"]
>>>>
>>>> Is the pattern replace token filter the only (or best) way to go for such a task?
>>>>
>>>> I'd really like to avoid writing custom regular expressions rather than specifying a stop words list, which I know would work perfectly fine for other tokenizers.
>>>>
>>>> Regards,
>>>>
>>>> Germán
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/038ff037-ccf3-4aca-b0c0-bb421531c495%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
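
To make the indexed-versus-stored distinction in this thread concrete, here is a toy Python model of the analysis chain. It is only a sketch and not Elasticsearch code: every function name is hypothetical, the ASCII folding step is a rough approximation that strips combining marks, and the whitespace tokenizer is assumed here as an example of one of the "other tokenizers" mentioned in the original post. It relies on two behaviours: a keyword tokenizer emits the whole input as a single token, and a stop filter drops only tokens that match a stop word exactly.

```python
import unicodedata

# Stop word list taken from the original post.
STOP_WORDS = {"la", "el", "de", "del", "los", "las", "jurisdiccion"}


def keyword_tokenizer(text):
    """Emit the whole input as one token (hypothetical stand-in)."""
    return [text]


def whitespace_tokenizer(text):
    """Split on whitespace (assumed example of another tokenizer)."""
    return text.split()


def lowercase(tokens):
    return [t.lower() for t in tokens]


def ascii_fold(tokens):
    # Rough approximation of ASCII folding: decompose, then drop accents.
    return [
        "".join(c for c in unicodedata.normalize("NFKD", t)
                if not unicodedata.combining(c))
        for t in tokens
    ]


def stop_filter(tokens):
    # A token is dropped only if the WHOLE token equals a stop word.
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text, tokenizer):
    """Run tokenizer, then lowercase, folding and stop filters in order."""
    tokens = tokenizer(text)
    for token_filter in (lowercase, ascii_fold, stop_filter):
        tokens = token_filter(tokens)
    return tokens


# The stored text is never modified; only the derived terms differ.
stored = ("El corregimiento de Mulaló, jurisdicción del municipio "
          "de Yumbo (Valle del Cauca)")
keyword_terms = analyze(stored, keyword_tokenizer)
whitespace_terms = analyze(stored, whitespace_tokenizer)

print(keyword_terms)
print(" ".join(whitespace_terms))
print(stored)
```

In this model the keyword tokenizer yields one long token that matches no stop word, so the stop filter leaves it untouched, while the whitespace variant yields the terms of "corregimiento mulalo, municipio yumbo (valle cauca)". In both cases `stored` is unchanged, which mirrors the point that what is returned is the original text rather than the analyzed terms.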
