On 2024-03-05 05:40:46 (+0800), Sebastian Nielsen via mailop wrote:
Does anyone have a general algorithm to filter out emoji from sender
addresses?

How do I write a regexp to identify emoji? (It's such a stupid thing.)

Today's regular expression will not capture tomorrow's emoji. The nice people who standardise Unicode keep allocating more code points to more characters.
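
As a minimal Python sketch, assuming a hand-maintained pattern of the usual kind (the ranges and the names EMOJI_RE and contains_listed_emoji are hypothetical, not a rule anyone posted in this thread):

```python
import re

# Hypothetical hand-maintained "emoji" pattern: a fixed list of code
# point ranges. This is an assumption for illustration only.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # Miscellaneous Symbols and Pictographs
    "\U0001F600-\U0001F64F"  # Emoticons
    "\U0001F680-\U0001F6FF"  # Transport and Map Symbols
    "\u2600-\u26FF"          # Miscellaneous Symbols
    "]"
)


def contains_listed_emoji(text: str) -> bool:
    """Return True if text contains a character from the fixed ranges."""
    return EMOJI_RE.search(text) is not None


old_face = "Name \U0001F600"  # U+1F600, inside the Emoticons range
new_face = "Name \U0001FAE0"  # U+1FAE0, from a block allocated later

print(contains_listed_emoji(old_face))  # True
print(contains_listed_emoji(new_face))  # False
```

The second face is just as much an emoji as the first, but the pattern was written before its block existed, so it passes through unnoticed.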

A guy sent an email containing emoji in the name part of the sender
address in the MIME From: header (like: Name [EMOJI] <u...@example.org>).
This caused a few email clients to crash completely and be unable to
reopen until I had manually deleted the offending email from the inbox
on the server.

So now I need to construct a rule to delete all emoji from both the From:
and To: headers.

You have constructed a textbook example of an "XY problem".

I'm thinking of doing the same as I do when I filter emoji from subject
lines, but this will also filter out umlauts from people's names, so
"André Andersson" becomes "Andr Andersson" and "Recep Tayyip Erdoğan"
would become "Recep Tayyip Erdoan".

Which isn't a good thing to do.

How do you deal with users who write in 漢字 or देवनागरी?
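
A rough Python sketch of what an ASCII-only filter does to such names, assuming the subject-line rule amounts to deleting every non-ASCII character (the thread does not show the actual rule, so strip_non_ascii is a hypothetical stand-in):

```python
import re

# Assumed stand-in for the subject-line rule: delete every character
# outside 7-bit ASCII. The real rule is not shown in this thread.
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]")


def strip_non_ascii(text: str) -> str:
    """Remove all characters above U+007F."""
    return NON_ASCII_RE.sub("", text)


names = [
    "Andr\u00e9 Andersson",        # é is U+00E9
    "Recep Tayyip Erdo\u011fan",   # ğ is U+011F
    "\u6f22\u5b57",                # 漢字
    "\u0926\u0947\u0935\u0928\u093e\u0917\u0930\u0940",  # देवनागरी
]

for name in names:
    print(repr(strip_non_ascii(name)))
# 'Andr Andersson'
# 'Recep Tayyip Erdoan'
# ''
# ''
```

The Latin-script names come out misspelled; the other two come out empty.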

So I need a rule that filters more specifically: one that deletes all
emoji but leaves other Unicode alone, such as the characters in names
from other countries.

That is not a sustainable solution.
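
One reason, sketched in Python under the assumption that the filter keys on the Unicode general category instead of fixed ranges (drop_other_symbols is a hypothetical helper, not something proposed in this thread): the result still depends on which version of the Unicode database the software ships with, and it catches only part of what makes up an emoji.

```python
import unicodedata


def drop_other_symbols(text: str) -> str:
    """Drop characters whose general category is "So" (Symbol, other).

    Most pictographic emoji are "So", but so are characters such as the
    copyright sign. Variation selectors and joiners used inside emoji
    sequences have other categories and are left behind.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "So")


# The category table is frozen at the Unicode version bundled with the
# interpreter. Code points assigned later report "Cn" (unassigned) and
# therefore pass through this filter untouched.
print(unicodedata.unidata_version)

print(repr(drop_other_symbols("Name \U0001F600 Surname")))
print(repr(drop_other_symbols("Andr\u00e9 Andersson")))
print(repr(drop_other_symbols("\u2764\ufe0f")))  # heart + variation selector
```

Letters in other scripts survive this version, but the last line leaves a stray invisible variation selector in the header, and every new Unicode release means updating the software before its new emoji are recognised at all.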

Replacing the clients that crash seems much easier.

Philip
_______________________________________________
mailop mailing list
mailop@mailop.org
https://list.mailop.org/listinfo/mailop
