Torsten Bögershausen <tbo...@web.de> writes:

> The function git_wcwidth() returns for a given unicode code point the
> width on the display:
> -1 for control characters,
>  0 for combining or other non-visible code points
>  1 for e.g. ASCII
>  2 for double-width code points.
>
> This table had been originally been extracted for one Unicode version,
> probably 3.2.
>
> Make it possible to update the to a later version of Unicode:
>
> Factor out the table from utf8.c into unicode_width.h and
> add the script update_unicode.sh to update the table based on the latest
> Unicode specification files.
>
> Thanks to
> Peter Krefting <pe...@softwolves.pp.se> and
> Kevin Bracey <ke...@bracey.fi>
> for helping with their Unicode knowledge
>
> Signed-off-by: Torsten Bögershausen <tbo...@web.de>
> ---

I would say this is a good idea, but a few nitpicks.

> diff --git a/unicode_width.h b/unicode_width.h
> new file mode 100644
> index 0000000..4db7803
> --- /dev/null
> +++ b/unicode_width.h
> @@ -0,0 +1,288 @@

Please update the script (and the resulting file) to caution against
misuse/mismanagement of this file by adding a comment to at least
state:

 - This is a generated file and you are not supposed to modify it; and

 - This is to be included only once from one place in the code and
   that is why this does not use the usual #ifndef X/#define X/#endif
   double-inclusion guards.

An obvious and viable alternative to the second would be to do the
usual double-inclusion guard.  I do not have much preference either
way.

> +static const struct interval zero_width[] = {
> ...
> +};
> +static const struct interval double_width[] = {
> ...
> +};
> diff --git a/update_unicode.sh b/update_unicode.sh
> new file mode 100755
> index 0000000..000b937
> --- /dev/null
> +++ b/update_unicode.sh
> @@ -0,0 +1,37 @@
> +#!/bin/sh
> +#See http://www.unicode.org/reports/tr44/
> +#
> +#Me Enclosing_Mark  an enclosing combining mark
> +#Mn Nonspacing_Mark a nonspacing combining mark (zero advance width)
> +#Cf Format          a format control character

Please have a SP after # in these comments to make them readable?

> +#
> +UNICODEWIDTH_H=../unicode_width.h
> +if ! test -d unicode; then
> +     mkdir unicode
> +fi &&

Style:

        if ! test -d unicode
        then
                ...
        fi

> +( cd unicode &&
> +     if ! test -f UnicodeData.txt; then
> +             wget 
> http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
> +     fi &&
> +     if ! test -f EastAsianWidth.txt; then
> +             wget 
> http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
> +     fi &&
> +     if ! test -d uniset; then
> +             git clone https://github.com/depp/uniset.git
> +     fi &&
> +     (
> +             cd uniset &&
> +             if ! test -x uniset; then
> +                     autoreconf -i &&
> +                     ./configure --enable-warnings=-Werror CFLAGS='-O0 -ggdb'

What are these "-O0 -ggdb" about?

> +             fi &&
> +             make
> +     ) &&
> +     echo "static const struct interval zero_width[] = {" >$UNICODEWIDTH_H &&
> +     UNICODE_DIR=. ./uniset/uniset --32 cat:Me,Mn,Cf + U+1160..U+11FF - 
> U+00AD |
> +     grep -v plane >>$UNICODEWIDTH_H &&
> +     echo "};" >>$UNICODEWIDTH_H &&
> +     echo "static const struct interval double_width[] = {" 
> >>$UNICODEWIDTH_H &&
> +     UNICODE_DIR=. ./uniset/uniset --32 eaw:F,W >>$UNICODEWIDTH_H &&
> +     echo "};" >>$UNICODEWIDTH_H
> +)
> @@ -147,7 +90,7 @@ static int git_wcwidth(ucs_char_t ch)
>               return -1;
>  
>       /* binary search in table of non-spacing characters */
> -     if (bisearch(ch, combining, sizeof(combining)
> +     if (bisearch(ch, zero_width, sizeof(zero_width)
>                               / sizeof(struct interval) - 1))

To my eyes, that line looks folded at a funny place.  I think it is
more conventional to fold after the operator, i.e.

        if (bisearch(ch, zero_width, sizeof(zero_width) /
                                sizeof(struct interval) - 1))

but

        if (bisearch(ch, zero_width,
                     sizeof(zero_width) / sizeof(struct interval) - 1))

may probably be a lot more logical and readable.  Maybe it is just
me.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to