What would be the "best" (optimal?) strategy to strip html links (hyperlinks) from string fields?

Hermano Cabral Tue, 07 Oct 2014 14:04:27 -0700

Howdy,

What would be the "best" way to strip hyperlinks (e.g. http://google.com,
www.facebook.com, etc.) and keep them from being analyzed? So far I've been
using the *pattern_replace* char filter with reasonable success, but the
regex is getting quite big and complex in order to handle all the edge
cases. Even though we're still experimenting with ES, I'm starting to worry
about the performance impact of this once we start ingesting large volumes
of data into our ES cluster. Would the *pattern_replace* token filter be a
better option here?
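
For reference, here is a minimal sketch of the kind of setup described above, written in Python so the regex can be tried locally before it goes into an index. It is not the actual configuration from our cluster. The names strip_links and no_links and the pattern itself are made up for illustration, and the pattern is deliberately much simpler than one that handles all the edge cases.

```python
import json
import re

# Hypothetical, deliberately simple pattern: anything that starts with
# http://, https:// or www. and runs up to the next whitespace character.
# It will also swallow trailing punctuation such as a comma.
LINK_PATTERN = r"(?i)\b(?:https?://|www\.)\S+"

# Index settings that wire the pattern into a pattern_replace char filter.
# "strip_links" and "no_links" are illustrative names, not ES built-ins.
settings = {
    "analysis": {
        "char_filter": {
            "strip_links": {
                "type": "pattern_replace",
                "pattern": LINK_PATTERN,
                "replacement": " ",
            }
        },
        "analyzer": {
            "no_links": {
                "type": "custom",
                "char_filter": ["strip_links"],
                "tokenizer": "standard",
                "filter": ["lowercase"],
            }
        },
    }
}


def strip_links(text: str) -> str:
    """Apply the same pattern locally with Python's re module."""
    return re.sub(LINK_PATTERN, " ", text)


if __name__ == "__main__":
    # JSON body that could be sent when creating an index.
    print(json.dumps({"settings": settings}, indent=2))
    # Quick local check of what survives the replacement.
    print(strip_links("see http://google.com and www.facebook.com today").split())
```

ES evaluates the pattern with Java's regex engine, not Python's, so the local check is only a rough approximation. This particular pattern sticks to syntax that both engines accept.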


Cheers!

