On 04/06/2014 07:24 PM, Bob Proulx wrote:
> Pádraig Brady wrote:
>> Yes printf follows the C standard which only considers bytes.
>> ...
>> I don't think we'd be able to change the current operation of printf
>> due to backwards compat reasons? Though we might be able to somehow leverage
>> the existing multibyte character aware alignment/truncation code in:
>> http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=blob;f=gl/lib/mbsalign.c;hb=HEAD
>
> Dan Douglas pointed out in the corresponding discussion in bug-bash
> that ksh uses the L modifier.
>
> http://lists.gnu.org/archive/html/bug-bash/2014-04/msg00021.html
>
> Dan Douglas wrote:
> > ksh93 already has this feature using the "L" modifier:
> >
> > ksh -c "printf '%.3Ls\n' $'\u2605\u2605\u2605\u2605\u2605'"
> > ★★★
>
> At least there is prior art for it.


So we can count bytes, chars or cells (graphemes).
Thinking a bit more about it, I think shell level printf
should be dealing in text of the current encoding and counting cells.
In the edge case where you want to deal in bytes one can do:

  LC_ALL=C printf ...

I see that ksh behaves as I would expect and counts cells,
though it requires the explicit %L enabler:

  $ ksh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
  á★★
  $ ksh -c "printf '%.3Ls\n' $'A\u2605\u2605\u2605'"
  A★
  $ ksh -c "printf '%.3Ls\n' $'AA\u2605\u2605\u2605'"
  A

zsh seems to just count characters:

  $ zsh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
  á★
  $ zsh -c "printf '%.3s\n' $'a\u0301\u2605\u2605\u2605'"
  á★
  $ zsh -c "printf '%.3Ls\n' $'A\u2605\u2605\u2605'"
  A★★

I see that dash gives "invalid directive" for any of %ls, %Ls and %S.

Pity there is no consensus here.
Personally I would go for:

  printf '%3s' 'blah'          # count cells
  printf '%3Ls' 'blah'         # count chars
  LANG=C printf '%3Ls' 'blah'  # count bytes
  LANG=C printf '%3s' 'blah'   # count bytes

Pádraig.
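
P.S. To make the bytes/chars/cells distinction concrete, here is a rough sketch in Python rather than shell, since the shell results above depend on the locale and on which shell's printf is used. It is not how ksh, zsh or coreutils implement this. The helper names (cell_width, truncate_bytes, truncate_chars, truncate_cells) are made up for illustration, and the width rule is an assumption: combining marks take 0 cells, East Asian Wide/Fullwidth characters take 2, everything else takes 1. A real implementation would more likely use wcwidth(3), and a terminal may render U+2605 (East Asian "Ambiguous") as 1 or 2 cells.

```python
import unicodedata


def cell_width(ch):
    """Approximate display width of one code point (illustrative rule only)."""
    if unicodedata.combining(ch):
        return 0  # combining marks attach to the previous character
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # Wide / Fullwidth characters occupy two cells
    return 1      # everything else, including Ambiguous, counted as one cell


def truncate_bytes(s, n, encoding="utf-8"):
    """Keep at most n bytes; may cut a multibyte character in half."""
    return s.encode(encoding)[:n]


def truncate_chars(s, n):
    """Keep at most n code points, regardless of how wide they display."""
    return s[:n]


def truncate_cells(s, n):
    """Keep as many leading code points as fit in n display cells."""
    out, used = [], 0
    for ch in s:
        w = cell_width(ch)
        if used + w > n:
            break
        out.append(ch)
        used += w
    return "".join(out)


if __name__ == "__main__":
    s = "a\u0301\u2605\u2605\u2605"   # 'a' + combining acute + three stars
    print(truncate_bytes(s, 3))       # raw bytes: 'a' plus the 2-byte accent
    print(truncate_chars(s, 3))       # 'a', accent, one star
    print(truncate_cells(s, 3))       # 'a' with accent, two stars
```

With this width rule the byte count keeps only the accented "a", the character count keeps one star after it, and the cell count keeps two stars, which is the difference between the three units being discussed.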
