Also note that the content returned will still contain the stop words. Only the inverted index will contain the stopword-less content.
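
To make the distinction concrete, here is a minimal sketch in plain Python. It is not Elasticsearch configuration; it only imitates the chain described in the question (lowercase, ASCII folding, stop word removal) on a single unsplit token. The function names `ascii_fold` and `analyze` are hypothetical, and the regular expression is built from the stop word list instead of being written by hand.

```python
import re
import unicodedata

# Stop word list copied from the original question.
STOP_WORDS = ["la", "el", "de", "del", "los", "las", "jurisdiccion"]


def ascii_fold(text: str) -> str:
    """Drop combining accents, roughly what an ASCII folding filter does."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> str:
    """Hypothetical stand-in for the analysis chain; returns ONE token."""
    folded = ascii_fold(text.lower())
    # Build a whole-word pattern from the list, so no regex is hand-written.
    pattern = r"\b(?:" + "|".join(re.escape(w) for w in STOP_WORDS) + r")\b"
    without_stops = re.sub(pattern, "", folded)
    # Collapse the whitespace left behind by the removed words.
    return re.sub(r"\s+", " ", without_stops).strip()


source = ("El corregimiento de Mulaló, jurisdicción del municipio "
          "de Yumbo (Valle del Cauca)")

stored = source            # what a search hit would return: unchanged
indexed = analyze(source)  # what would end up as the indexed term

print(stored)
print(indexed)
```

Only `indexed` loses the stop words; `stored` is returned exactly as it was submitted, which is the point made above.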
--
Ivan

On Thu, Aug 28, 2014 at 11:55 AM, Itamar Syn-Hershko <[email protected]> wrote:

> What would be the usecase for such a process (removing stop words without
> tokenization)?
>
> This may be a good read btw:
> http://www.elasticsearch.org/blog/stop-stopping-stop-words-a-look-at-common-terms-query/
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Author of RavenDB in Action <http://manning.com/synhershko/>
>
> On Thu, Aug 28, 2014 at 9:48 PM, German Carrillo <[email protected]> wrote:
>
>> Hi all,
>>
>> I'm looking for a way to remove stop words from tokens returned by a
>> keyword tokenizer, i.e., I'd like to obtain the original text without stop
>> words after the analysis process.
>>
>> Sample data looks like: "El corregimiento de Mulaló, jurisdicción del
>> municipio de Yumbo (Valle del Cauca)"
>> After the lowercase token filter: "el corregimiento de mulaló,
>> jurisdicción del municipio de yumbo (valle del cauca)"
>> After the ascii folding token filter: "el corregimiento de mulalo,
>> jurisdiccion del municipio de yumbo (valle del cauca)"
>> After removing stop words: "corregimiento mulalo, municipio yumbo
>> (valle cauca)"
>>
>> The stop words (currently) are: ["la", "el", "de", "del", "los", "las",
>> "jurisdiccion"]
>>
>> Is the pattern replace token filter the only (or best) way to go for such
>> a task?
>>
>> I'd really like to avoid writing custom regular expressions rather than
>> specifying a stop words list, which I know would work perfectly fine for
>> other tokenizers.
>>
>> Regards,
>>
>> Germán

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCJAM-4nJAKjUix7GvT9766%2B5si_z76txfnt-S-BTJqBw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
