[ https://issues.apache.org/jira/browse/BEAM-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17431412#comment-17431412 ]

Robert Burke commented on BEAM-13082:
-------------------------------------

Retaining the buffer cuts the latency of tiny writes in half, with significant
improvements for larger writes as well.

goos: linux
goarch: amd64
pkg: github.com/apache/beam/sdks/v2/go/pkg/beam/core/runtime/harness
BenchmarkDataWriter/4B-8     181523064     5.812 ns/op        0 B/op  0 allocs/op
BenchmarkDataWriter/16B-8    177836181     6.453 ns/op        0 B/op  0 allocs/op
BenchmarkDataWriter/1KB-8     27260578     38.32 ns/op        0 B/op  0 allocs/op
BenchmarkDataWriter/4KB-8      7680160     132.3 ns/op        3 B/op  0 allocs/op
BenchmarkDataWriter/100KB-8     315540      3854 ns/op       68 B/op  0 allocs/op
BenchmarkDataWriter/1MB-8        20120     57577 ns/op      440 B/op  1 allocs/op
BenchmarkDataWriter/10MB-8         986   1112952 ns/op    10939 B/op  5 allocs/op
BenchmarkDataWriter/100MB-8         91  13236977 ns/op  1152590 B/op  5 allocs/op
BenchmarkDataWriter/256MB-8         36  33558928 ns/op  7456857 B/op  5 allocs/op

The remaining allocations come largely from the Flush calls, because we
allocate a new proto message on each flush. However, caching a message instance
yields a negligible performance benefit and hurts readability, so it's not
included here.
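To make "retaining the buffer" concrete, here is a minimal Go sketch of the pattern the issue below describes: Flush truncates with `w.buf = w.buf[:0]` instead of setting `w.buf = nil`, and Close nils the buffer after the final flush. The `sender` interface, the `sendFunc` adapter, and the `chunkSize` threshold are hypothetical simplifications for illustration, not the harness's actual types or values.

```go
package main

// sender is a hypothetical stand-in for the gRPC data stream. The sketch
// assumes, as the issue states, that Send does not keep the data after it
// returns, so the writer may reuse the slice afterwards.
type sender interface {
	Send(data []byte) error
}

// sendFunc adapts a plain function to the sender interface.
type sendFunc func([]byte) error

func (f sendFunc) Send(data []byte) error { return f(data) }

// chunkSize is an assumed flush threshold for this sketch only.
const chunkSize = 1 << 20

// dataWriter is a simplified, hypothetical writer. It is used by a single
// bundle's processing goroutine, so it has no locking.
type dataWriter struct {
	buf []byte
	out sender
}

// Write appends p to the buffer, flushing first if p would push the buffer
// past chunkSize.
func (w *dataWriter) Write(p []byte) (int, error) {
	if len(w.buf) > 0 && len(w.buf)+len(p) > chunkSize {
		if err := w.Flush(); err != nil {
			return 0, err
		}
	}
	w.buf = append(w.buf, p...)
	return len(p), nil
}

// Flush sends any buffered bytes and keeps the backing array for reuse.
func (w *dataWriter) Flush() error {
	if len(w.buf) == 0 {
		return nil
	}
	err := w.out.Send(w.buf)
	// Truncate rather than nil out: the next Write appends into the same
	// backing array instead of allocating a new one.
	w.buf = w.buf[:0]
	return err
}

// Close performs the final flush, then drops the buffer so a large backing
// array is not retained after the bundle terminates.
func (w *dataWriter) Close() error {
	err := w.Flush()
	w.buf = nil
	return err
}
```

Because the slice passed to Send is overwritten by later writes, any sender that holds onto the data past Send's return would need to copy it; the sketch depends on the same no-retention property of Send that the issue cites.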

> [Go SDK] Reduce churn in dataWriter by retaining byte slice.
> ------------------------------------------------------------
>
>                 Key: BEAM-13082
>                 URL: https://issues.apache.org/jira/browse/BEAM-13082
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-go
>            Reporter: Robert Burke
>            Assignee: Robert Burke
>            Priority: P2
>
> It's been noted that we can reduce allocations and GC overhead produced by
> the dataWriter if we change `w.buf = nil` to `w.buf = w.buf[:0]`. We
> should still nil out the buffer after the final flush in Close(), however, to
> avoid retaining larger byte buffers after bundle termination.
> A dataWriter is created per bundle, and is used only by (and is only safe to
> use from) that bundle's processing thread. Further, gRPC's Send call doesn't
> retain ownership of the proto message data after Send returns, which allows
> this reuse.
> A later optimization could use a sync.Pool to maintain a "freelist" of
> buffers to further reduce per-bundle allocations, but this would likely only
> be noticeable in streaming contexts. Such a freelist should only keep buffers
> under some capacity threshold (say, slices with a capacity under 64MB) to
> avoid retaining overly large buffers that aren't in active use. This idea,
> though, is out of scope for a first pass.
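The freelist idea could look roughly like the following, assuming a package-level `sync.Pool` of byte slices. The names `bufPool`, `getBuffer`, `putBuffer`, and `maxPooledCap`, along with the 4KB starting capacity, are illustrative assumptions; only the 64MB threshold comes from the issue, and the issue itself marks this as out of scope for the first pass.

```go
package main

import "sync"

// maxPooledCap is the example threshold from the issue: buffers whose
// capacity reaches 64MB are not returned to the pool.
const maxPooledCap = 64 << 20

// bufPool stores *[]byte rather than []byte so that Put does not allocate
// when the slice is converted to an interface value.
var bufPool = sync.Pool{
	New: func() any {
		// Hypothetical starting capacity for a fresh buffer.
		b := make([]byte, 0, 4096)
		return &b
	},
}

// getBuffer returns an empty buffer, reusing a pooled one when available.
func getBuffer() *[]byte {
	return bufPool.Get().(*[]byte)
}

// putBuffer truncates b and returns it to the pool, unless its capacity is
// at or above maxPooledCap, in which case it is left for the GC. It reports
// whether the buffer was pooled.
func putBuffer(b *[]byte) bool {
	if cap(*b) >= maxPooledCap {
		return false
	}
	*b = (*b)[:0]
	bufPool.Put(b)
	return true
}
```

In this design a writer would call getBuffer when a bundle starts and putBuffer in Close instead of setting its buffer to nil. Since sync.Pool may drop entries at any GC, the pool only reduces allocations; it never guarantees reuse.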



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
