Re: Unicode characters in codebase.

Fotis Panagiotopoulos Mon, 10 Oct 2022 09:05:42 -0700

I used the following two commands to check for non-ASCII characters within
the codebase:


find . -name "*.h" -exec grep --color='auto' -P -n "[\x80-\xFF]" {} \;
find . -name "*.c" -exec grep --color='auto' -P -n "[\x80-\xFF]" {} \;
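
A possible refinement, as a sketch rather than a tested recipe for every setup: as far as I know, GNU grep with -P in a UTF-8 locale treats [\x80-\xFF] as the code points U+0080 to U+00FF, so characters above that range (Φ, or the Greek μ) may go unreported. Forcing the C locale and matching any byte outside 7-bit ASCII should avoid this. The helper name find_non_ascii is made up for illustration, and the sketch assumes a GNU grep built with -P support.

```shell
# Sketch: list every line containing a byte outside 7-bit ASCII in .c/.h files.
# find_non_ascii is a hypothetical helper name, not part of NuttX or nxstyle.
find_non_ascii() {
  # LC_ALL=C makes grep compare raw bytes instead of decoded characters.
  # The directory to scan is the first argument (defaults to the current one).
  LC_ALL=C find "${1:-.}" -type f \( -name '*.c' -o -name '*.h' \) \
    -exec grep -n -H -P '[^\x00-\x7F]' {} +
}
```

Running find_non_ascii . from the top of the tree covers both file types in a single pass and prefixes each hit with its file name and line number.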

The problematic characters are very few.

I could only see two names, one of which is not printed correctly on my
system (I believe).
I saw some © characters, which according to Wikipedia (
https://en.wikipedia.org/wiki/Copyright_symbol#Typing_the_character) are
OK to write as (C).
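
If we go that way, the replacement could be scripted. This is a minimal sketch that assumes the files are UTF-8 encoded (so © is the two bytes 0xC2 0xA9) and that GNU sed is available for the -i option. The helper name ascii_copyright is hypothetical.

```shell
# Sketch: replace the UTF-8 copyright sign (bytes 0xC2 0xA9) with "(C)".
# ascii_copyright is a hypothetical helper name; sed -i assumes GNU sed.
ascii_copyright() {
  # Build the two-byte sequence with printf so this script itself stays ASCII.
  sym=$(printf '\302\251')
  # LC_ALL=C makes sed match the raw bytes; files are edited in place.
  LC_ALL=C find "${1:-.}" -type f \( -name '*.c' -o -name '*.h' \) \
    -exec sed -i "s/$sym/(C)/g" {} +
}
```

Since this edits files in place, it is best run on a clean working tree so the result can be reviewed with git diff. Files stored in another encoding would not be matched.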

And then there are some files that contain an unreadable mess within the
comments.
See attached.

[image: image.png]

On Mon, Oct 10, 2022 at 6:49 PM Fotis Panagiotopoulos <f.j.pa...@gmail.com>
wrote:

> > nxstyle should only complain if this is a source or build file, right?
> > And only if the unicode is outside of a comment.  Unicode characters
> > are useful in .txt, .md, and probably other file types and also in code
> > comments.
>
> Of course I am talking strictly about .h/.c files. Documentation can use
> anything.
> However, I would also include comments.
>
>
> > Some of these are people's names or in documentation, I don't see any
> > reason to update that.
>
> These can be left as is. Although, I believe that even names shall use
> ASCII letters only.
> For example, my name is Φώτης. I wouldn't expect anyone else on this
> project to actually know how to pronounce or write this...
> Using the latin letters "Fotis" makes it much more "user-friendly".
>
>
> On Mon, Oct 10, 2022 at 6:40 PM Gregory Nutt <spudan...@gmail.com> wrote:
>
>> nxstyle should only complain if this is a source or build file, right?
>> And only if the unicode is outside of a comment.  Unicode characters
>> are useful in .txt, .md, and probably other file types and also in code
>> comments.
>>
>> There are flags in nxstyle that tell you the type of file (by
>> extension) and if nxstyle is parsing within a comment.
>>
>> On 10/10/2022 9:33 AM, Fotis Panagiotopoulos wrote:
>> > Shall I enhance nxstyle to check for this? Is this the correct place for
>> > this check?
>> >
>> > On Mon, Oct 10, 2022 at 6:30 PM Alin Jerpelea <jerpe...@gmail.com> wrote:
>> >
>> >> Let's remove them!
>> >>
>> >> Thanks for looking into this issue
>> >>
>> >> Best Regards
>> >> Alin
>> >>
>> >> On Mon, 10 Oct 2022, 17:25 Alan C. Assis, <acas...@gmail.com> wrote:
>> >>
>> >>> Agree! It is better to avoid it.
>> >>>
>> >>> On 10/10/22, Fotis Panagiotopoulos <f.j.pa...@gmail.com> wrote:
>> >>>> Hello!
>> >>>>
>> >>>> A few weeks ago I had some problems with a static analysis tool that
>> >>>> couldn't parse NuttX code, due to non-Unicode characters. I provided a
>> >>>> couple of PRs and fixed the issues, but it got me thinking...
>> >>>>
>> >>>> Do we really need Unicode characters within the codebase?
>> >>>>
>> >>>> I can only think of problems with this, from missing glyphs from fonts,
>> >>>> to difficulties in search...
>> >>>> I don't see any value in writing μs instead of us, or I²C instead of I2C.
>> >>>> What do you think?
>> >>>> Shall we allow such characters, or enforce ASCII-only characters in the
>> >>>> codebase?
>> >>>>
>>
>>
