I used the following two commands to check for non-ASCII characters within the codebase:

find . -name "*.h" -exec grep --color='auto' -P -n "[\x80-\xFF]" {} \;
find . -name "*.c" -exec grep --color='auto' -P -n "[\x80-\xFF]" {} \;

The problematic characters are very few.
I could only see two names, one of which is not printed correctly on my system (I believe).
I saw some © characters, which, according to Wikipedia (https://en.wikipedia.org/wiki/Copyright_symbol#Typing_the_character), are OK to write as (C).
And then there are some files that contain an unreadable mess within the comments.
See attached.

[image: image.png]

On Mon, Oct 10, 2022 at 6:49 PM Fotis Panagiotopoulos <f.j.pa...@gmail.com> wrote:

> > nxstyle should only complain if this is a source or build file, right?
> > And only if the unicode is outside of a comment. Unicode characters
> > are useful in .txt, .md, and probably other file types and also in code
> > comments.
>
> Of course I am talking strictly about .h/.c files. Documentation can use
> anything.
> However, I would also include comments.
>
> > Some of these are people's names or in documentation, I don't see any
> > reason to update that.
>
> These can be left as is. Although, I believe that even names shall use
> ASCII letters only.
> For example, my name is Φώτης. I wouldn't expect anyone else on this
> project to actually know how to pronounce or write this...
> Using the Latin letters "Fotis" makes it much more "user-friendly".
>
>
> On Mon, Oct 10, 2022 at 6:40 PM Gregory Nutt <spudan...@gmail.com> wrote:
>
>> nxstyle should only complain if this is a source or build file, right?
>> And only if the unicode is outside of a comment. Unicode characters
>> are useful in .txt, .md, and probably other file types and also in code
>> comments.
>>
>> There are flags in nxstyle that tell you the type of file (by
>> extension) and whether nxstyle is parsing within a comment.
>>
>> On 10/10/2022 9:33 AM, Fotis Panagiotopoulos wrote:
>> > Shall I enhance nxstyle to check for this?
>> > Is this the correct place for this check?
>> >
>> > On Mon, Oct 10, 2022 at 6:30 PM Alin Jerpelea <jerpe...@gmail.com> wrote:
>> >
>> >> Let's remove them!
>> >>
>> >> Thanks for looking into this issue
>> >>
>> >> Best Regards
>> >> Alin
>> >>
>> >> On Mon, 10 Oct 2022, 17:25 Alan C. Assis, <acas...@gmail.com> wrote:
>> >>
>> >>> Agree! It is better to avoid it.
>> >>>
>> >>> On 10/10/22, Fotis Panagiotopoulos <f.j.pa...@gmail.com> wrote:
>> >>>> Hello!
>> >>>>
>> >>>> A few weeks ago I had some problems with a static analysis tool that
>> >>>> couldn't parse NuttX code, due to non-ASCII characters. I provided a
>> >>>> couple of PRs and fixed the issues, but it got me thinking...
>> >>>>
>> >>>> Do we really need Unicode characters within the codebase?
>> >>>>
>> >>>> I can only think of problems with this, from missing glyphs from fonts, to
>> >>>> difficulties in search...
>> >>>> I don't see any value in writing μs instead of us, or I²C instead of I2C.
>> >>>> What do you think?
>> >>>> Shall we allow such characters, or enforce ASCII-only characters in the
>> >>>> codebase?
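
P.S. As a cross-check on the two find/grep commands at the top of this message, the same scan can be sketched in Python. This is only a rough sketch and not part of nxstyle: find_non_ascii is a hypothetical helper name, and it assumes that "non-ASCII" simply means any byte at or above 0x80 in a .c or .h file, comments included.

```python
import os


def find_non_ascii(root, extensions=(".c", ".h")):
    """Yield (path, line_number, raw_line) for each line holding a non-ASCII byte.

    Hypothetical helper for illustration only. Files are read as raw bytes,
    so the result does not depend on the locale or on the file encoding.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Assumption: only C sources and headers are of interest.
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                for lineno, line in enumerate(f, start=1):
                    # bytes.isascii() is False if any byte is >= 0x80.
                    if not line.isascii():
                        yield path, lineno, line.rstrip(b"\r\n")
```

Calling list(find_non_ascii(".")) from the top of the tree should give one entry per offending line, which can then be printed as path:line in the same spirit as grep -n.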