Hi everyone, I'm Sepehr, the author of this proposal. I'm glad to see the interest in grapheme_mask().
I have already developed a working prototype in C (based on ICU's ubrk break iterator API), along with several PHPT test cases covering Unicode and emoji clusters.

I believe this addition will significantly improve how developers handle sensitive data masking in modern PHP applications.

I have requested a Wiki account to start the formal RFC process and share the implementation details.

Looking forward to your feedback.

Best regards,
Sepehr

On Friday, 19 June 2026 at 18:17, youkidearitai <[email protected]> wrote:

> On Fri, 19 June 2026 at 19:54, سپهر محمودی <[email protected]> wrote:
> >
> > Hello everyone
> >
> > Over the past few weeks I have been exploring a common pattern that
> > frequently appears in PHP applications: masking sensitive parts of
> > strings such as credit card numbers, email addresses, phone numbers,
> > and personal identifiers.
> >
> > In many real-world codebases, developers typically implement masking
> > using combinations of functions like substr(), strlen(), str_repeat(),
> > substr_replace(), or their multibyte equivalents. While these
> > approaches work, they often lead to repetitive, error-prone, and
> > sometimes inefficient userland implementations. Handling edge cases
> > (especially negative offsets, omitted lengths, or Unicode text) can
> > make these snippets unnecessarily complex.
> >
> > While thinking about this problem, I designed a function concept
> > called grapheme_mask(). The goal of this function is to provide a
> > clear, native, and Unicode-safe way to mask sections of a string.
> >
> > The key idea is that the function operates on grapheme clusters,
> > rather than raw bytes or individual code points. This allows it to
> > correctly handle modern Unicode text, including composed characters
> > and emoji sequences, without breaking them apart.
> >
> > Conceptually, the function replaces a range of grapheme clusters with
> > a masking string.
> >
> > Example:
> >
> > grapheme_mask("[email protected]", "*", 2, -12);
> > // result: se****@example.com
> > --------------------------------------------
> > Example with emoji sequences:
> > grapheme_mask("👨🏽‍👩‍👧‍👦 family", "*", 0, 1);
> > // result: * family
> > -----------------------------------------
> >
> > The intention is not to replace existing string functions, but to
> > provide a dedicated and expressive helper for a task that developers
> > routinely implement themselves.
> >
> > If there is interest from the community, I would be happy to draft a
> > formal RFC describing the proposed behavior, edge cases, and potential
> > implementation details.
> >
> > I would greatly appreciate any feedback, thoughts, or suggestions.
> >
> > Best regards,
> >
> > Sepehr
>
> Hi, Sepehr and Internals
>
> Thank you for bringing up this discussion.
> Looks good to me.
>
> One more point in favor of adding this function:
> a character with a diacritical mark is sometimes encoded as a single
> code point and sometimes as separate code points.
> For example, the umlaut (ä, a + ¨) and dakuten (が, か + ゛), among others.
> grapheme_mask() needs to support these characters.
> Therefore, I would like to have this function.
>
> Regards
> Yuya
>
>
> --
> ---------------------------
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -----------------------------
>
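
P.S. For anyone who wants to experiment before the C prototype is published, the behavior described above can be roughly approximated in userland with the existing intl functions. This is only a sketch under stated assumptions, not the prototype itself: the name grapheme_mask_sketch() is hypothetical, and I am assuming that the offset and length follow substr() conventions counted in grapheme clusters, and that the mask string is repeated once per masked cluster. The final semantics are for the RFC to decide.

```php
<?php
// Hypothetical userland approximation of the proposed grapheme_mask().
// Assumptions (not confirmed by any RFC text):
//   - $offset and $length follow substr() conventions, counted in graphemes
//   - $mask is repeated once for every masked grapheme cluster
// Requires the intl extension (grapheme_strlen, grapheme_substr).
function grapheme_mask_sketch(string $string, string $mask, int $offset = 0, ?int $length = null): string
{
    $total = grapheme_strlen($string);
    if ($total === false || $total === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    // Resolve the start position; negative offsets count from the end.
    $start = $offset < 0 ? max(0, $total + $offset) : min($offset, $total);

    // Resolve the end position; a negative length stops that many
    // graphemes before the end, null means "to the end of the string".
    if ($length === null) {
        $end = $total;
    } elseif ($length < 0) {
        $end = max($start, $total + $length);
    } else {
        $end = min($total, $start + $length);
    }

    $count = $end - $start;
    if ($count === 0) {
        return $string; // nothing to mask
    }

    $head = $start > 0 ? grapheme_substr($string, 0, $start) : '';
    $tail = $end < $total ? grapheme_substr($string, $end) : '';

    return $head . str_repeat($mask, $count) . $tail;
}

// Placeholder inputs for illustration only.
echo grapheme_mask_sketch("user@example.com", "*", 2, -12), "\n"; // us**@example.com
echo grapheme_mask_sketch("1234567812345678", "*", 0, -4), "\n";  // ************5678
// "a" followed by U+0308 (combining diaeresis) is one grapheme cluster.
echo grapheme_mask_sketch("a\u{0308}bc", "*", 0, 1), "\n";        // *bc
```

An emoji ZWJ sequence such as the family example should behave the same way, as long as the ICU version behind intl treats the whole sequence as a single cluster. Compared with this sketch, a native implementation can walk the string once with a break iterator instead of calling grapheme_substr() twice.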
