The problem with this kind of awk program is that it holds every distinct
string in memory, whereas plain `sort` spills to temporary files on disk to
limit memory use. Once the hash in awk grows too large, accessing it can
become very slow (perhaps because of cache misses, or because hash
operations slow down as the table grows).
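
For comparison, here is a minimal sketch of the sort-based approach I mean.
The function name `count_lines` is only an illustration, and `LC_ALL=C` is my
assumption that plain byte-wise ordering is acceptable (it usually makes
sort faster). With GNU sort, the `-S` and `-T` options can additionally set
the in-memory buffer size and the temporary directory.

```shell
# Count how many times each input line occurs.
# sort groups identical lines together, using temporary files when the
# input does not fit in its memory buffer; uniq -c then counts each run
# of identical adjacent lines.
count_lines() {
    LC_ALL=C sort | uniq -c
}

# Same sample input as in the quoted awk example.
printf '%s\n' a c b b b b b b c | count_lines
```

Each output line is the count followed by the string. The amount of padding
before the count differs between uniq implementations.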

On Sun, Jun 30, 2019 at 11:52 AM Assaf Gordon <assafgor...@gmail.com> wrote:

> Correcting myself:
>
> On Sun, Jun 30, 2019 at 10:08:46AM -0600, Assaf Gordon wrote:
> > On Sun, Jun 30, 2019 at 07:34:19AM -0500, Peng Yu wrote:
> > >
> > > I have a long list of string (each string is in a line). I need to
> > > count the number of appearance for each string.
> > >
> > > [...] Does anybody know any better way
> > > to make the sort and count run more efficiently?
> > >
> >
> > Or using gnu awk:
>
> use 'asorti' instead of 'asort', with the two-parameter variant:
>
>
>   $ printf "%s\n" a c b b b b b b c \
>         | awk 'a[$1]++ {}
>                END { n = asorti(a,b)
>                      for (i = 1; i <= n; i++) {
>                         print b[i], a[b[i]]
>                      }
>                    }'
>   a 1
>   b 6
>   c 2
>
>
> For more details see:
>
> https://www.gnu.org/software/gawk/manual/html_node/Array-Sorting-Functions.html#Array-Sorting-Functions
>
> -assaf
>
> --
Regards,
Peng
