[ https://issues.apache.org/jira/browse/BEAM-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345912#comment-16345912 ]
Kenneth Knowles commented on BEAM-3572:
---------------------------------------
I think I see what you mean in terms of excess allocation. The buffering was
added as an optimization :-)
While the {{Coder}} itself should be observably immutable, there is no problem
with mutating state under the hood to manage a pool of buffers. The real issue,
which you alluded to, is that coders are required to be thread-safe. The reason
that {{BufferedElementCountingOutputStream}} can be used despite its lack of
thread safety is that each instance is local to a single {{encode()}} call.
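To illustrate why locality is enough, here is a minimal sketch. The class names {{LocalBufferingStream}} and {{StringListCoder}} are hypothetical, and the stream is a simplified stand-in for {{BufferedElementCountingOutputStream}}, not Beam's actual class. The coder has no mutable fields, so one instance can be shared across threads; each {{encode()}} call creates its own non-thread-safe stream that never escapes the call. The per-call buffer allocation in the constructor is the cost this issue is about.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical, simplified buffering stream. NOT thread-safe: it keeps a
// mutable buffer and write position with no synchronization.
final class LocalBufferingStream extends OutputStream {
  private final OutputStream out;
  private final byte[] buf;
  private int count = 0;

  LocalBufferingStream(OutputStream out, int size) {
    this.out = out;
    this.buf = new byte[size]; // fresh allocation on every construction
  }

  @Override
  public void write(int b) throws IOException {
    if (count == buf.length) {
      flush(); // buffer full: drain it before accepting more bytes
    }
    buf[count++] = (byte) b;
  }

  // Writes the buffered bytes to the wrapped stream and resets the position.
  @Override
  public void flush() throws IOException {
    out.write(buf, 0, count);
    count = 0;
    out.flush();
  }
}

// Hypothetical stateless coder: no fields, so sharing one instance across
// threads is safe. The LocalBufferingStream it creates is confined to a
// single encode() call, so its lack of thread safety never matters.
final class StringListCoder {
  void encode(List<String> values, OutputStream out) throws IOException {
    LocalBufferingStream local = new LocalBufferingStream(out, 64);
    for (String v : values) {
      for (byte b : v.getBytes(StandardCharsets.UTF_8)) {
        local.write(b);
      }
      local.write('\n'); // newline-delimited, purely for illustration
    }
    local.flush();
  }
}
```

Two threads can call {{encode()}} on the same {{StringListCoder}} at the same time, each with its own target stream, without interfering, because the only mutable state lives in a per-call local.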
Having either {{IterableLikeCoder}} or {{BufferedElementCountingOutputStream}}
do its own suballocation makes sense, with the usual caveats about bugs and
leaks in that sort of code. It is definitely better encapsulation for
{{BufferedElementCountingOutputStream}} to own the pool, unless it doesn't have
enough information to do it well. I'm willing to trust that you came to this
because you actually hit it in practice, or are at least driven by a benchmark.
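A minimal sketch of the stream owning its own pool, assuming a static, bounded {{ArrayBlockingQueue}} of fixed-size buffers. The class {{PooledBufferingStream}}, the 64 KiB buffer size, and the bound of 16 retained buffers are hypothetical choices, not Beam's API. The buffer is borrowed in the constructor and returned in {{close()}}. The double-close guard addresses one of the usual pooling bugs: returning the same buffer twice would let two later streams share it.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical buffering stream that borrows its buffer from a shared pool
// instead of allocating one per instance. The pool is private to this class,
// so callers such as a coder's encode() hold no extra state.
final class PooledBufferingStream extends OutputStream {
  private static final int BUFFER_SIZE = 64 * 1024; // assumed buffer size
  private static final int MAX_POOLED = 16;         // assumed cap on idle buffers
  // Thread-safe queue; poll() and offer() never block.
  private static final ArrayBlockingQueue<byte[]> POOL =
      new ArrayBlockingQueue<>(MAX_POOLED);

  private final OutputStream out;
  private byte[] buf; // null once closed
  private int count = 0;

  PooledBufferingStream(OutputStream out) {
    this.out = out;
    byte[] pooled = POOL.poll(); // reuse an idle buffer if one exists
    // Stale bytes in a reused buffer are harmless: only [0, count) is written.
    this.buf = (pooled != null) ? pooled : new byte[BUFFER_SIZE];
  }

  @Override
  public void write(int b) throws IOException {
    ensureOpen();
    if (count == buf.length) {
      drain();
    }
    buf[count++] = (byte) b;
  }

  @Override
  public void flush() throws IOException {
    ensureOpen();
    drain();
    out.flush();
  }

  // Flushes, then hands the buffer back. The wrapped stream is left open,
  // since in this sketch the caller owns it.
  @Override
  public void close() throws IOException {
    if (buf == null) {
      return; // idempotent: never offer the same buffer twice
    }
    try {
      flush();
    } finally {
      POOL.offer(buf); // silently dropped (left to GC) if the pool is full
      buf = null;
    }
  }

  private void drain() throws IOException {
    out.write(buf, 0, count);
    count = 0;
  }

  private void ensureOpen() throws IOException {
    if (buf == null) {
      throw new IOException("stream closed");
    }
  }

  // Number of idle buffers currently held by the pool (for inspection).
  static int pooledBuffers() {
    return POOL.size();
  }
}
```

In this design, a caller that forgets {{close()}} loses only the reuse: the buffer is garbage-collected, and a later constructor allocates a fresh one if the pool is empty. The bound on the queue keeps the pool from pinning unbounded memory after a burst of concurrent encodes.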
> Reduce inefficient allocations in coders
> ----------------------------------------
>
> Key: BEAM-3572
> URL: https://issues.apache.org/jira/browse/BEAM-3572
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core
> Reporter: Bill Neubauer
> Assignee: Bill Neubauer
> Priority: Minor
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> BufferedElementCountingOutputStream's constructor allocates a new buffer to
> wrap the input OutputStream. This gets called on each invocation of encode()
> from IterableLikeCoder. Since Coder is designed to be stateless, and this
> buffer holds state and isn't thread-safe, we can't just have the caller manage
> the buffer. Modifying the constructor to use a pool of buffers to reduce the
> number of allocations will help performance.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)