I do all my HTML munging in the application that sends data to Elasticsearch. I know that isn't much help, but it does work.
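
For what it's worth, here is a rough sketch of what that kind of client-side cleanup could look like in Python. This is not my actual code. The `URL_RE` pattern, the `strip_urls` helper, and the `body` field name are all made up for illustration, and the regex is deliberately simple (it also swallows punctuation attached to the end of a URL), so treat it as a starting point, not a complete URL matcher.

```python
import re

# Hypothetical pattern (an assumption, not from this thread): matches anything
# starting with http://, https://, or www. up to the next whitespace.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def strip_urls(text: str) -> str:
    """Remove URL-like substrings, then collapse the leftover whitespace."""
    without_urls = URL_RE.sub(" ", text)
    return " ".join(without_urls.split())


# Clean the text before the document is sent to Elasticsearch.
# The "body" field name is illustrative only.
doc = {"body": strip_urls("See http://google.com and www.facebook.com for details")}
```

The cleaned `doc` is then what gets indexed, so the analyzer never sees the links and no regex runs inside the cluster at index time.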

On Tue, Oct 7, 2014 at 5:03 PM, Hermano Cabral wrote:

> Howdy,
>
> What would be the "best" way to strip hyperlinks (eg. http://google.com,
> www.facebook.com, etc.) and avoid them being analyzed? So far I've been
> using the *pattern_replace* char filter with reasonable success, but the
> regex is getting quite big/complex to handle all the edge cases and even
> tho we're still experimenting with ES, I'm starting to worry about the
> performance impact of doing this when we start to ingest large volumes of
> data into our ES cluster. Would the *pattern_replace* token filter be a
> better option here?
>
> Cheers!
