Hi Chet,
On Tue, Apr 14, 2026 at 10:58:49AM -0400, Chet Ramey wrote:
> On 4/13/26 7:19 PM, Duncan Roe wrote:
> > Hi Chet,
> >
> > The rev utility is often used in conjunction with the cut utility which is a
> > loadable builtin, so here is a loadable builtin rev.
>
> Thanks. Can you reply to this with permission to put an FSF copyright
> header on it?
Certainly you may put an FSF header on it.
>
> > This rev is re-engineered to not use wide character functions. Instead it
> > processes utf8 multibyte characters at the byte level to preserve these
> > characters in the reversed line.
>
> So UTF-8 is the only multibyte encoding it handles?
Yes, at least for now. It didn't look to me as though zgetline could cope with
UTF-16 anyway: for instance, a number of UTF-16 characters other than newline
itself have the newline byte (0x0A) as one of their bytes (U+010A is one).
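For anyone unfamiliar with the byte-level approach, here is a minimal sketch of
one way to do it. It is an illustration only, not the code from the patch: the
names rev_utf8_line and reverse_bytes are made up for this example, and it
assumes the caller passes the line without its trailing newline. It reverses
all the bytes, then re-reverses each multibyte sequence so that the lead byte
comes first again.

```c
#include <stddef.h>

/* Reverse n bytes in place. */
static void reverse_bytes(unsigned char *p, size_t n)
{
    if (n < 2)
        return;
    for (size_t i = 0, j = n - 1; i < j; i++, j--) {
        unsigned char t = p[i];
        p[i] = p[j];
        p[j] = t;
    }
}

/*
 * Reverse a line of UTF-8 text in place, keeping each multibyte
 * character intact.  len must not include the trailing newline.
 * Hypothetical helper for illustration, not taken from the patch.
 */
void rev_utf8_line(char *line, size_t len)
{
    unsigned char *p = (unsigned char *)line;
    size_t i = 0;

    /* Pass 1: reverse every byte.  Each multibyte character now
       appears as continuation bytes followed by its lead byte. */
    reverse_bytes(p, len);

    /* Pass 2: put the bytes of each multibyte character back in order. */
    while (i < len) {
        size_t j = i;

        /* Skip continuation bytes (10xxxxxx). */
        while (j < len && (p[j] & 0xC0) == 0x80)
            j++;

        if (j > i && j < len && p[j] >= 0xC0) {
            /* Continuation bytes ending in a lead byte: restore order. */
            reverse_bytes(p + i, j - i + 1);
            i = j + 1;
        } else if (j > i) {
            /* Stray continuation bytes with no lead byte: leave them. */
            i = j;
        } else {
            /* Single-byte character, or a lead byte with nothing before it. */
            i++;
        }
    }
}
```

The sketch does not validate the input: malformed sequences are simply left in
byte-reversed order.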
>
> > Array handling is modelled on the cut builtin.
> >
> > This patch is against devel HEAD.
> >
> > Performance:
> >
> > Tests were done with 2GiB files: 2 were 100% single-byte characters, the
> > others being 100% multibyte (except newlines). 2 files had 1024B lines, the
> > others being 64B. This gave a total of 4 files. The benchmark was rev from
> > util-linux-2.41.3.
> >
> > The stand-alone prototype was faster in every case. Part of converting to
> > loadable was to replace read(2) calls with zgetline(bash). This degraded
> > performance somewhat. Here are the numbers ('x' means "times as fast"):
> > 64B multi: 0.4x; 1024B multi: 0.6x; 64B single: 1.1x; 1024B single: 1.6x
> >
> > One expects the builtin will always out-perform the external utility with
> > short files.
>
> Assuming that the UTF-8 handling is identical between the two versions, the
> difference is more than likely due to data copying. I imagine that the
> standalone version uses a single buffer and manipulates data in it, since
> it only has to output lines. The BSD version works that way. The zgetline
> version has to allocate memory for each line, copy the data into it, then
> reverse the characters in that buffer.
>
> Chet
I included the performance figures mainly for reassurance.
I think I have identified at least some of the reasons for the observed
variations. I won't go into them here, but would gladly do so in another post
if you or anyone else would like. BTW, rev calls zgetline in a way that re-uses
the line buffer.
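To show what I mean by re-using the line buffer, here is a rough sketch of the
general pattern in plain ISO C. It does not use zgetline and does not reproduce
its interface: read_line_reuse is a made-up stand-in that, like getline(3),
takes the address of a buffer pointer and its capacity, and only reallocates
when a line does not fit.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a getline-style reader (not zgetline itself).
 * Reads one line, including its newline if present, into *buf, growing the
 * buffer only when needed.  The caller keeps *buf and *cap between calls,
 * so the same allocation serves every line that fits.
 * Returns 1 if a line was read, 0 at end of file, -1 if realloc failed.
 */
int read_line_reuse(FILE *fp, char **buf, size_t *cap, size_t *len)
{
    int c;

    *len = 0;
    while ((c = getc(fp)) != EOF) {
        if (*len + 2 > *cap) {          /* room for this byte plus NUL */
            size_t ncap = *cap ? *cap * 2 : 64;
            char *nbuf = realloc(*buf, ncap);

            if (nbuf == NULL)
                return -1;
            *buf = nbuf;
            *cap = ncap;
        }
        (*buf)[(*len)++] = (char)c;
        if (c == '\n')
            break;
    }
    if (*len == 0)
        return 0;
    (*buf)[*len] = '\0';
    return 1;
}
```

A caller would start with buf = NULL and cap = 0, call this in a loop, and
free(buf) once at the end, so the allocation cost is paid only when a longer
line turns up rather than once per line.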
Cheers ... Duncan.