Hey folks,

Just wanted to update the status of this.

During GopherCon, I happened to meet Russ Cox and asked him the same
question. If File.Read blocks goroutines, which then spawn new OS threads,
then in a long-running job there should already be plenty of OS threads
created, so the random read throughput should increase over time and
stabilize at the maximum possible value. But that's not what I see in my
benchmarks.

His explanation was that GOMAXPROCS in a way acts like a multiplexer.
From the docs, "the GOMAXPROCS variable limits the number of
operating system threads that can execute user-level Go code
simultaneously." This basically means that every read must first be issued
from one of at most GOMAXPROCS threads running Go code, before the blocking
call is handed over to some other OS thread (not really a switch, but
conceptually speaking). This introduces a bottleneck for throughput.

I re-ran my benchmarks with a much higher GOMAXPROCS and was able to then
achieve the maximum throughput. The numbers are here:
To summarize these benchmarks, Linux fio achieves 118K IOPS, and with
GOMAXPROCS=64/128, I'm able to achieve 105K IOPS, which is close enough.

Regarding the point about using io_submit etc. instead of goroutines: I
managed to find a library which does that, but it performed worse than just
using goroutines.
From what I gather (talking to Russ and Ian), whatever work is going on
in user space, the same work has to happen in kernel space, so there's not
much benefit here.

Overall, with GOMAXPROCS set to a higher value (as I've done in Dgraph),
one can get the advertised SSD throughput using goroutines.

Thanks to Ian, Russ, and the Go community for helping solve this problem!

On Sat, May 20, 2017 at 5:31 AM, Ian Lance Taylor <i...@golang.org> wrote:

> On Fri, May 19, 2017 at 3:26 AM, Manish Rai Jain <manishrj...@gmail.com>
> wrote:
> >
> >> It's not obvious to me that io_submit would be a win for normal
> >> programs, but if anybody wants to try it out and see that would be
> >> great.
> >
> > Yeah, my hunch is that the cost of threads context switching is going to
> > be a hindrance to achieving the true throughput of SSDs. So, I'd like to
> > try it out. A few guiding pointers would be useful:
> >
> > - This can be done directly via Syscall and Syscall6, is that right? Or
> > should I use Cgo?
>
> You should be able to use syscall.Syscall.
>
> > - I see SYS_IO_SUBMIT in syscall package. But, no aio_context_t, or
> > iocbpp structs in the package.
> > - Similarly, other structs for io_getevents etc.
> > - What's the best way to generate them, so syscall.Syscall would accept
> > these?
>
> The simplest way is to get them via cgo.  The better way is to add
> them to the x/sys/unix package as described at
> https://github.com/golang/sys/blob/master/unix/README.md .
>
> Ian
