> Once you have some proper benchmarks, it might be fun to compare GoAWK's
> performance to that of my awk package <https://github.com/spakin/awk>.

I'm not going to do thorough benchmarks at this point, but it looks like
GoAWK is significantly faster at present. Using the example in the
https://github.com/spakin/awk README, which is equivalent to this AWK
script:

    BEGIN { FS = OFS = "," }
    { $3 = $1+$2; print }

On a file with 1M lines of random numbers, with the example as is (no
stdout buffering) GoAWK takes about 1.1 seconds, and spakin/awk takes 36
seconds! However, most of this is due to the non-buffered writes to
os.Stdout. GoAWK automatically wraps os.Stdout in a bufio.Writer (though
I'd forgotten to do this at first as well). When I added the line (before
s.Run):

    s.Output = bufio.NewWriterSize(os.Stdout, 64*1024)

it sped up spakin/awk by a factor of about 10, to 3.6 seconds. So GoAWK
is about 3x as fast for this simple (but not unrealistic) benchmark.
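
To illustrate why the buffering matters so much, here is a minimal Go sketch
(not code from GoAWK or spakin/awk). The countingWriter type and the
countWrites helper are made up for this example; countingWriter stands in for
os.Stdout and simply counts how many Write calls reach it, since each such
call on a real file is a system call.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter counts Write calls on the underlying writer.
// It is a hypothetical stand-in for os.Stdout.
type countingWriter struct {
	w     io.Writer
	calls int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return c.w.Write(p)
}

// writeLines writes n short CSV lines to w, one Fprintf call per line.
func writeLines(w io.Writer, n int) error {
	for i := 0; i < n; i++ {
		if _, err := fmt.Fprintf(w, "%d,%d,%d\n", i, i+1, 2*i+1); err != nil {
			return err
		}
	}
	return nil
}

// countWrites reports how many Write calls reach the underlying writer
// when n lines are written, with or without a 64 KB bufio.Writer in between.
func countWrites(n int, buffered bool) (int, error) {
	cw := &countingWriter{w: io.Discard}
	if !buffered {
		err := writeLines(cw, n)
		return cw.calls, err
	}
	bw := bufio.NewWriterSize(cw, 64*1024)
	if err := writeLines(bw, n); err != nil {
		return cw.calls, err
	}
	err := bw.Flush() // without this, the tail of the output would be lost
	return cw.calls, err
}

func main() {
	direct, _ := countWrites(1000000, false)
	buffered, _ := countWrites(1000000, true)
	fmt.Printf("unbuffered: %d writes, buffered: %d writes\n", direct, buffered)
}
```

Unbuffered, every output line becomes its own Write call; with the buffer in
place, the underlying writer only sees one call per filled 64 KB chunk plus
the final Flush.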

I generated the 1M line random file using this Python script (guess I
should have used AWK :-):

    import random, sys
    for _ in range(int(sys.argv[1])):
      n = random.randrange(1000000)
      m = random.randrange(1000000)
      print('%d,%d' % (n, m))

So my main suggestion (for spakin/awk) would be to wrap os.Stdout in a
bufio.Writer by default (and be sure to call Flush before Run finishes). If
the user wants to pass an unbuffered writer, they still can, but at least the
default is performant.

I also added CPU profiling to the spakin/awk script, and it looks like it's
doing a bunch more garbage collection than GoAWK, as well as some regexp
stuff. I suspect NewValue() is probably quite slow as it takes an
interface{} and does type checking. Also, strings are converted to numbers
using a regex, which is probably slower than a dedicated conversion/check
function (see parseFloatPrefix in goawk/interp/value.go).
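
For illustration, a dedicated string-to-number conversion can be a small
hand-written scanner instead of a regex. The sketch below is written in the
spirit of that idea and is not a copy of GoAWK's parseFloatPrefix; the name
numberPrefix and its exact rules (decimal only, spaces and tabs skipped, no
hex, no "inf"/"nan") are assumptions made for this example.

```go
package main

import (
	"fmt"
	"strconv"
)

func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// numberPrefix returns the value of the longest decimal-number prefix of s
// (after leading spaces and tabs), or 0 if there is none. This mirrors how
// AWK treats strings such as "12abc" or "abc" when used in arithmetic.
func numberPrefix(s string) float64 {
	i := 0
	for i < len(s) && (s[i] == ' ' || s[i] == '\t') {
		i++
	}
	start := i
	if i < len(s) && (s[i] == '+' || s[i] == '-') {
		i++
	}
	digits := 0
	for i < len(s) && isDigit(s[i]) {
		i++
		digits++
	}
	if i < len(s) && s[i] == '.' {
		i++
		for i < len(s) && isDigit(s[i]) {
			i++
			digits++
		}
	}
	if digits == 0 {
		return 0 // no number at the start of s
	}
	// Accept an exponent only if at least one digit follows it.
	if i < len(s) && (s[i] == 'e' || s[i] == 'E') {
		j := i + 1
		if j < len(s) && (s[j] == '+' || s[j] == '-') {
			j++
		}
		if j < len(s) && isDigit(s[j]) {
			for j < len(s) && isDigit(s[j]) {
				j++
			}
			i = j
		}
	}
	// The prefix is already known to be well formed, so the only possible
	// error is a range error, in which case f is ±Inf.
	f, _ := strconv.ParseFloat(s[start:i], 64)
	return f
}

func main() {
	for _, s := range []string{"42", "3.5kg", "  -7", "abc", "1e3x", "2e"} {
		fmt.Printf("%q -> %v\n", s, numberPrefix(s))
	}
}
```

The scan is a single pass over the bytes with no allocation beyond what
strconv.ParseFloat needs, which is the kind of work a regex match has to do
much more generally.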

See more optimization ideas in my post at
https://benhoyt.com/writings/goawk/

-Ben

On Thu, Nov 22, 2018 at 11:24 PM Tong Sun <suntong...@gmail.com> wrote:

>
>
> On Tuesday, August 28, 2018 at 9:06:22 AM UTC-4, Ben Hoyt wrote:
>>
>>> Once you have some proper benchmarks, it might be fun to compare GoAWK's
>>> performance to that of my awk package <https://github.com/spakin/awk>.
>>>
>>
>> Nice -- will do!
>>
>
> Please post back when you've done that.
>
> I'm interested to know. Thx.
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "golang-nuts" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/golang-nuts/kYZp3Q1KKfE/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> golang-nuts+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
