bug#65416: Feature request: include first line of file in output

arnold Tue, 22 Aug 2023 19:34:32 -0700

I can't speak for the grep guys, but at least I was correct that
current gawk is much faster than gawk 4.0.2.


Arnold

Daniel Green <[email protected]> wrote:

> I don't have access to a newer gawk where I did the initial timings, but I
> ran an almost identical test on my home machine.
>
>     grep (v3.11):                                              ~0.60s
>     perl (v5.38.0):                                            ~3.21s
>     gawk (v4.0.2 built from source with `-O3 -march=native`): ~10.22s
>     gawk (v5.2.2 built from source with `-O3 -march=native`):  ~4.95s
>
> If grep will never add this functionality I'll survive, it just seemed like
> it might not be too much work to implement, and would probably still be
> much faster than using awk/perl. I've never looked at the grep source code
> before, but could be tempted to try implementing it myself if there was any
> chance of the patch being accepted.
>
> Dan
>
> On Mon, Aug 21, 2023 at 2:37 PM <[email protected]> wrote:
>
> > Gawk 4.0.2 is 11 years old. Try timing the current version,
> > I'll bet it's faster.  And it solves your problem NOW,
> > instead of waiting for a feature that the grep developers
> > aren't likely to add.
> >
> > My two cents of course.
> >
> > Arnold
> >
> > Daniel Green <[email protected]> wrote:
> >
> > > That works, as well as the Perl version I've been using:
> > >
> > >     perl -ne 'print if ($. == 1 || /pattern/)'
> > >
> > > But timings for a real-life example (3GB file with ~16m lines, CentOS 7)
> > > show the problem:
> > >
> > >     grep (v2.20):    ~1.15s
> > >     perl (v5.36.1):  ~4.48s
> > >      awk (v4.0.2):  ~10.81s
> > >
> > > Admittedly grep is just searching in those timings, but I suspect it
> > > could accomplish the full task with a minimal decrease in speed.
> > >
> > > Dan
> > >
> > > On Mon, Aug 21, 2023 at 12:57 PM <[email protected]> wrote:
> > >
> > > > Daniel Green <[email protected]> wrote:
> > > >
> > > > > I'm frequently searching CSV files with 20-30 columns, and when there's
> > > > > a hit it can be hard to know what the columns are. An option to also
> > > > > print the first line of a file (either always, or only if that file
> > > > > had a match to the pattern) in addition to any hits would be nice.
> > > > >
> > > > > Thanks,
> > > > > Dan
> > > >
> > > > It sounds like awk would be a better tool:
> > > >
> > > >         awk 'FNR == 1 || /pattern/' files ...
> > > >
> > > > should do the trick.
> > > >
> > > > HTH,
> > > >
> > > > Arnold
> > > >
> >
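The awk one-liner suggested in the thread prints every file's first line, whether or not that file matched. Dan also asked about the other variant: print the first line only if that file had a match. A minimal sketch of that variant follows, assuming a POSIX awk and an awk-style extended regular expression as the pattern. The function name `header_grep` and the environment variable `HEADER_GREP_PAT` are hypothetical illustrations, not features of grep or awk.

```shell
#!/bin/sh
# header_grep is a hypothetical helper name, not a grep or awk feature.
# It prints a file's first line only if that file has at least one line
# matching PATTERN (an awk ERE), followed by the matching lines.
# Usage: header_grep PATTERN [FILE...]
header_grep() {
    pattern=$1
    shift
    # Pass the pattern through the environment rather than -v, so awk does
    # not process backslash escapes in it (HEADER_GREP_PAT is hypothetical).
    HEADER_GREP_PAT="$pattern" awk '
        FNR == 1 { header = $0; shown = 0; next }   # save header, reset per file
        $0 ~ ENVIRON["HEADER_GREP_PAT"] {
            if (!shown) { print header; shown = 1 } # header once, before first hit
            print
        }
    ' "$@"
}
```

For example, `header_grep 'pattern' *.csv` prints nothing for files without hits. Holding the header in a variable and printing it lazily is what makes the output conditional. Note that the header line itself is never tested against the pattern, and this is still awk, so it does not address the speed difference discussed above.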
