[go-nuts] Concurrent solution: Which is the most efficient way to read STDIN lines 100s of MB long?

Const V Mon, 13 Jun 2022 11:22:08 -0700


This is related to the previous discussion, but this time it is about finding 
a concurrent solution.


I'm posting my solution. Any feedback and suggestions for improvement are 
welcome!

I need to write a program that reads STDIN and writes every line 
that contains the search word "test" to STDOUT. 

The input is a 100MB string.

With 8 cores, the concurrent version is about 2 times faster.

Each goroutine works on 12.5MB. The worst-case scenario is when the 
searched string is at the end of the input.
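
The 12.5MB figure comes from splitting 100MB evenly across 8 goroutines. A minimal sketch of that arithmetic follows, assuming each chunk is extended by an overlap so that a match straddling a boundary is still seen, and that the last chunk takes any remainder. chunkBounds is a hypothetical helper, and 100,000,000 bytes stands in for "100MB".

```go
package main

import "fmt"

// chunkBounds splits n bytes into parts half-open ranges [start, end).
// Each range is extended by overlap bytes past its nominal end, and the
// last range absorbs the n % parts remainder.
// Hypothetical helper for illustration; parts must be at least 1.
func chunkBounds(n, parts, overlap int) [][2]int {
	size := n / parts
	bounds := make([][2]int, 0, parts)
	for i := 0; i < parts; i++ {
		start := i * size
		end := start + size + overlap
		if i == parts-1 || end > n {
			end = n
		}
		bounds = append(bounds, [2]int{start, end})
	}
	return bounds
}

func main() {
	// 8 goroutines, overlap of len("test") = 4 bytes.
	for i, b := range chunkBounds(100_000_000, 8, 4) {
		fmt.Printf("goroutine %d: bytes [%d, %d)\n", i, b[0], b[1])
	}
}
```

An overlap of len(find)-1 bytes would already be enough; using len(find) only costs one extra byte per chunk.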

If one goroutine finds it, I can stop the other 7 from working. This helps 
when the searched string is in the middle of the input.

bytes.Contains itself cannot be interrupted, so as far as I can see the only 
way to stop a search in progress is to copy its source code and add a way to 
stop the search.

------------------------------------------------

Machine: Mac mini M1, 16 GB

------------------------------------------------

7.47s  BigGrepBytes

3.81s  BigGrepBytes1_Concurrent

------------------------------------------------


Here are the benchmarks:

---

Type: cpu
Time: Jun 13, 2022 at 10:34am (PDT)
Duration: 124.30s, Total samples = 207.43s (166.88%)
Active filters:
   focus=Benchmark
   hide=benchm
   show=Benchmark
      flat  flat%   sum%        cum   cum%
     7.47s  3.60% 21.86%      7.47s  3.60%  Benchmark_100MB_End__20charsBigGrepBytes_
     3.81s  1.84% 34.38%      3.81s  1.84%  Benchmark_100MB_End__20charsBigGrepBytes1_Concurrent

---

Non-concurrent version:

---

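// BigGrepBytes reads all of r and writes it to w if it contains find.
func BigGrepBytes(r io.Reader, w io.Writer, find []byte) {
    var b bytes.Buffer   // holds the entire input in memory
    _, _ = b.ReadFrom(r) // read everything; the read error is ignored here
    if bytes.Contains(b.Bytes(), find) {
        w.Write(b.Bytes()) // match: echo the whole input
    } else {
        w.Write([]byte(" \n")) // no match: write a placeholder line
    }
}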

Concurrent version:

--

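// BigGrepBytesCh1 searches the input with cores goroutines, one chunk each.
// cores must be at least 1.
func BigGrepBytesCh1(r io.Reader, w io.Writer, find []byte, cores int) {
    ch := make(chan bool, cores) // buffered, so no goroutine blocks after an early exit
    overlap := len(find)         // extra bytes so a match straddling a boundary is found
    var b bytes.Buffer
    _, _ = b.ReadFrom(r) // read everything; the read error is ignored here
    data := b.Bytes()
    filelen := len(data)
    chunksize := filelen / cores
    for i := 0; i < cores; i++ {
        start := i * chunksize
        end := start + chunksize + overlap
        if i == cores-1 || end > filelen {
            end = filelen // the last chunk also covers the filelen % cores remainder
        }
        go BytesContainsCh1(data, start, end, find, ch)
    }
    found := false
    for i := 0; i < cores; i++ {
        if <-ch {
            found = true
            w.Write(data) // match: echo the whole input
            break
        }
    }
    if !found {
        w.Write([]byte(" \n")) // no match: write a placeholder line
    }
}

// BytesContainsCh1 reports on ch whether b[start:end] contains find.
func BytesContainsCh1(b []byte, start int, end int, find []byte, ch chan bool) {
    ch <- bytes.Contains(b[start:end], find)
}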

--
