Hello everyone,

We currently use ES to index documents with the mapper-attachments plugin.
Sometimes the files we index contain special characters, such as graphical 
symbols (e.g. a cellphone symbol) inserted in Word.
In our case these special characters have nothing to do with language; they 
are purely graphical symbols from Word.

As a result, the search results contain fragments like:
 \n� <em>000</em> 123 456 \n


Since the indexed files are Base64-encoded and stored directly in ES, with no 
other copy kept, I don't think we should filter the files before storing them 
in ES. (Otherwise we could not retrieve the document as it was before indexing.)

Maybe we should instead clean the search results by stripping these 
unreadable characters.
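
For instance, here is a minimal sketch of such a cleanup done on the client 
side, in Python. It assumes the highlighted fragments arrive as plain strings, 
and that "unreadable" means the Unicode replacement character plus control, 
format, private-use, unassigned and surrogate code points. That choice of 
categories is an assumption and may need adjusting; the name clean_fragment 
is hypothetical and not part of ES or the plugin.

```python
import re
import unicodedata

# Unicode general categories treated as "unreadable" (an assumption):
# Cc = control, Cf = format, Co = private use, Cn = unassigned, Cs = surrogate
UNREADABLE_CATEGORIES = {"Cc", "Cf", "Co", "Cn", "Cs"}

REPLACEMENT_CHAR = "\ufffd"  # shown as a question-mark box when decoding failed


def clean_fragment(fragment: str) -> str:
    """Return the fragment without unreadable characters.

    Hypothetical helper: whitespace (including newlines) becomes a single
    space, highlight tags such as <em> are left untouched.
    """
    kept = []
    for ch in fragment:
        if ch == REPLACEMENT_CHAR:
            continue  # drop the replacement character
        if ch.isspace():
            kept.append(" ")  # normalize newlines and tabs to a space
            continue
        if unicodedata.category(ch) in UNREADABLE_CATEGORIES:
            continue  # drop control, private-use and similar code points
        kept.append(ch)
    # Collapse the runs of spaces left behind by removed characters.
    return re.sub(r" +", " ", "".join(kept)).strip()


print(clean_fragment(" \n\ufffd <em>000</em> 123 456 \n"))
```

This only changes what is displayed; the Base64 source stored in ES is left 
as it is, so the original document can still be retrieved unchanged.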

Do you have any ideas, please?

Thank you very much.
