On 13/03/18 17:06, Rafal Luzynski wrote:
> As we have introduced the support of nominative and genitive
> month names in glibc [1] and we are going to provide the updated
> locale data for Catalan language [2] it has been discovered [3]
> that the current limit of the maximum length of the abbreviated
> month name as displayed by "ls -l" will not work with the new
> data for Catalan. It is obligatory to precede the month name
> with "de " (note: the space) so the abbreviated month names limited
> to 5 characters will be ambiguous and therefore unreadable:

It's a bit surprising that _abbreviations_ all need the "de " prefix,
but fair enough.

> de ma (should be "de mar" at least)
> d’abr (correct)
> de ma (should be "de mai" at least)
> de ju (should be "de jun" at least)
> de ju (should be "de jul" at least)
>
> Increasing the value of MAX_MON_WIDTH to 6 characters will fix
> the problem. The location of the constant is here: [4]
>
> Although it has been also suggested in the same bug report that
> there should be no additional limit for the month length.
>
> This bug may be related with the coreutils bug #29377. [5]
>
> Regards,
>
> Rafal Luzynski
>
> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=10871
> [2] https://sourceware.org/bugzilla/show_bug.cgi?id=22848
> [3] https://sourceware.org/bugzilla/show_bug.cgi?id=22848#c6
> [4] http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/ls.c#n1099
> [5] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=29377

Thanks for the careful analysis.

5 was chosen as the max width for abmon as that was seen to be
unambiguous, and also to truncate overly long abbreviations.
One can browse the abbreviations by length using:

  locale -a | grep utf8 |
  while read l; do LC_ALL=$l locale abmon; done |
  tr ';' '\n' | sort -u | grep '.\{5,\}' |
  while read mon; do
    printf '%02d %s\n' "$(echo "$mon" | wc -L)" "$mon"
  done | sort -n | less

That shows a couple of existing issues with the limit of 5.
ln_CD.utf8 (Democratic Republic of the Congo) needs a length of 7
to be unambiguous, while Arabic needs 12! I don't remember Arabic
being so long at the time I implemented the alignment/truncation
in ls (9 years ago), but we should probably expand the limit to
account for that.

  $ LC_ALL=ln_CD.utf8 locale abmon
  sánzá1.;sánzá2.;sánzá3.;sánzá4.;sánzá5.;sánzá6.;sánzá7.;sánzá8.;sánzá9.;sánz10.;sánzá11.;sánzá12.
  $ LC_ALL=ar_SY.utf8 locale abmon | tr ';' '\n'
  كانون الثاني
  شباط
  آذار
  نيسان
  نوار
  حزيران
  تموز
  آب
  أيلول
  تشرين الأول
  تشرين الثاني
  كانون الأول

Given that the increase in supported width should only impact
relatively few languages, it probably makes sense to increase it to 12.
The attached patch does that and also augments the test to find
ambiguous cases.

cheers,
Pádraig
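The "needs a length of 7 to be unambiguous" figure can be checked mechanically: truncate every entry to w characters and find the smallest w at which no two entries collide. Below is a minimal C sketch of that check. `min_unambiguous_width` and the array names are hypothetical, and the sketch assumes one character occupies one display cell. That holds for the Latin-script entries used here but not for combining marks or double-width characters, which a real implementation would have to measure with wcwidth(3) in the target locale. The Catalan array holds only the five entries quoted in the report, not the complete abmon list.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Only the five Catalan entries quoted in the report, not the full
   abmon list.  U+2019 is the typographic apostrophe in "d’abr".  */
static const wchar_t *const catalan_quoted[] = {
  L"de mar", L"d\u2019abr", L"de mai", L"de jun", L"de jul"
};

/* Lingala entries as printed by "LC_ALL=ln_CD.utf8 locale abmon"
   (U+00E1 is "á"; October is "sánz10." in that output).  */
static const wchar_t *const lingala_abmon[] = {
  L"s\u00e1nz\u00e11.", L"s\u00e1nz\u00e12.", L"s\u00e1nz\u00e13.",
  L"s\u00e1nz\u00e14.", L"s\u00e1nz\u00e15.", L"s\u00e1nz\u00e16.",
  L"s\u00e1nz\u00e17.", L"s\u00e1nz\u00e18.", L"s\u00e1nz\u00e19.",
  L"s\u00e1nz10.", L"s\u00e1nz\u00e111.", L"s\u00e1nz\u00e112."
};

/* Hypothetical helper: return the smallest width W in 1..MAX_WIDTH such
   that truncating every name to its first W characters leaves all N
   names pairwise distinct, or 0 if some still collide at MAX_WIDTH.
   ASSUMPTION: one character == one display cell.  */
size_t
min_unambiguous_width (const wchar_t *const *names, size_t n,
                       size_t max_width)
{
  assert (max_width >= 1);  /* a zero-width column shows nothing */
  for (size_t w = 1; w <= max_width; w++)
    {
      int collision = 0;
      for (size_t i = 0; i < n && !collision; i++)
        for (size_t j = i + 1; j < n && !collision; j++)
          /* wcsncmp compares at most W characters, which is exactly
             the comparison of the two truncated forms.  */
          if (wcsncmp (names[i], names[j], w) == 0)
            collision = 1;
      if (!collision)
        return w;
    }
  return 0;
}
```

With this data the helper returns 6 for the quoted Catalan entries and 7 for Lingala, matching the figures given in the thread, and 0 (no safe width) when either list is capped at 5.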
From d383dfd223c5d24ec22556d5707151f8c5ca18cf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <[email protected]>
Date: Wed, 14 Mar 2018 11:31:43 -0700
Subject: [PATCH] ls: increase the allowed abmon width from 5 to 12

This will impact relatively few languages, and will make Arabic,
Catalan, Lingala etc. output unambiguous abbreviated month names.

* src/ls.c (MAX_MON_WIDTH): Increase from 5 to 12.
* NEWS: Mention the bug fix.
* tests/ls/abmon-align.sh: Augment to check for ambiguous output.

Fixes https://bugs.gnu.org/30814
---
 NEWS                    | 4 ++++
 src/ls.c                | 7 +++++--
 tests/ls/abmon-align.sh | 9 ++++++---
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/NEWS b/NEWS
index e5569eb..351a082 100644
--- a/NEWS
+++ b/NEWS
@@ -21,6 +21,10 @@ GNU coreutils NEWS -*- outline -*-
   Previously it would have set executable bits on created special files.
   [bug introduced with coreutils-8.20]
 
+  ls no longer truncates the abbreviated month names that have a
+  display width between 6 and 12 inclusive. Previously this would have
+  output ambiguous months for Arabic or Catalan locales.
+
 ** Improvements
 
   stat and tail now know about the "exfs" file system, which is a
diff --git a/src/ls.c b/src/ls.c
index cd6b09c..c89a22f 100644
--- a/src/ls.c
+++ b/src/ls.c
@@ -1095,8 +1095,11 @@ file_escape_init (void)
    variable width abbreviated months and also precomputing/caching
    the names was seen to increase the performance of ls significantly. */
 
-/* max number of display cells to use */
-enum { MAX_MON_WIDTH = 5 };
+/* max number of display cells to use.
+   As of 2018 the abmon for Arabic has entries with width 12.
+   It doesn't make much sense to support wider than this
+   and locales should aim for abmon entries of width <= 5. */
+enum { MAX_MON_WIDTH = 12 };
 /* abformat[RECENT][MON] is the format to use for timestamps with
    recentness RECENT and month MON. */
 enum { ABFORMAT_SIZE = 128 };
diff --git a/tests/ls/abmon-align.sh b/tests/ls/abmon-align.sh
index b639ca9..d4ff708 100755
--- a/tests/ls/abmon-align.sh
+++ b/tests/ls/abmon-align.sh
@@ -32,17 +32,20 @@ for format in "%b" "[%b" "%b]" "[%b]"; do
     # The sed usage here is slightly different from the original,
     # removing the \(.*\), to avoid triggering misbehavior in at least
     # GNU sed 4.2 (possibly miscompiled) on Mac OS X (Darwin 9.8.0).
-    n_widths=$(
+    months="$(
       LC_ALL=$LOC TIME_STYLE=+"$format" ls -lgG *.ts |
-      LC_ALL=C sed 's/.\{15\}//;s/ ..\.ts$//;s/ /./g' |
+      LC_ALL=C sed 's/.\{15\}//;s/ ..\.ts$//;s/ /./g')"
+    n_widths=$(echo "$months" |
       while read mon; do echo "$mon" | LC_ALL=$LOC wc -L; done |
       uniq | wc -l
     )
+    n_dupes=$(echo "$months" | sort | uniq -d | wc -l)
     test "$n_widths" = "1" || { fail=1; break 2; }
+    test "$n_dupes" = "0" || { fail=1; break 2; }
   done
 done
 
 if test "$fail" = "1"; then
-  echo "misalignment detected in $LOC locale:"
+  echo "misalignment or ambiguous output in $LOC locale:"
   LC_ALL=$LOC TIME_STYLE=+%b ls -lgG *.ts
 fi
-- 
2.9.3
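The two conditions that the augmented tests/ls/abmon-align.sh enforces, a single display width across all months (n_widths = 1) and no repeated month string (n_dupes = 0), can be modelled in a few lines of C. This is a simplified sketch under stated assumptions, not the ls.c implementation: it assumes ls cuts each name to at most the width limit and then pads every name with trailing spaces to the widest cut name, and it again counts one character as one display cell. `pad_months`, `months_aligned`, `has_duplicates` and `MAX_CELLS` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <wchar.h>

#define MAX_CELLS 12  /* buffer bound; mirrors the new MAX_MON_WIDTH */

/* The five Catalan entries quoted in the report (not the full list).  */
static const wchar_t *const catalan_quoted[] = {
  L"de mar", L"d\u2019abr", L"de mai", L"de jun", L"de jul"
};

/* ASSUMED model, not the ls.c code: cut each name to at most LIMIT
   characters, then pad every name with trailing spaces to the widest
   cut name.  One character is taken to be one display cell.  */
void
pad_months (const wchar_t *const *names, size_t n, size_t limit,
            wchar_t out[][MAX_CELLS + 1])
{
  assert (limit <= MAX_CELLS);  /* OUT rows hold MAX_CELLS characters */
  size_t width = 0;
  for (size_t i = 0; i < n; i++)
    {
      size_t len = wcslen (names[i]);
      if (len > limit)
        len = limit;
      if (len > width)
        width = len;
    }
  for (size_t i = 0; i < n; i++)
    {
      size_t len = wcslen (names[i]);
      if (len > limit)
        len = limit;
      wmemcpy (out[i], names[i], len);           /* truncated name */
      wmemset (out[i] + len, L' ', width - len); /* alignment padding */
      out[i][width] = L'\0';
    }
}

/* Analogue of the test's n_widths check: every entry has one width.  */
bool
months_aligned (wchar_t out[][MAX_CELLS + 1], size_t n)
{
  for (size_t i = 1; i < n; i++)
    if (wcslen (out[i]) != wcslen (out[0]))
      return false;
  return true;
}

/* Analogue of the test's n_dupes check: true if two entries match.  */
bool
has_duplicates (wchar_t out[][MAX_CELLS + 1], size_t n)
{
  for (size_t i = 0; i < n; i++)
    for (size_t j = i + 1; j < n; j++)
      if (wcscmp (out[i], out[j]) == 0)
        return true;
  return false;
}
```

Run on the quoted Catalan entries, a limit of 5 keeps the column aligned but yields "de ma" and "de ju" twice each: the alignment check alone passes, and only a duplicate check like n_dupes catches the ambiguity. With a limit of 12 the entries are padded to 6 cells ("d’abr" gains one trailing space) and all are distinct.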
