[ https://issues.apache.org/jira/browse/BEAM-6612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Amato updated BEAM-6612:
-----------------------------
Summary: PerformanceRegression in QueueingBeamFnDataClient (was: Remove
QueueingBeamFnDataClient)
> PerformanceRegression in QueueingBeamFnDataClient
> -------------------------------------------------
>
> Key: BEAM-6612
> URL: https://issues.apache.org/jira/browse/BEAM-6612
> Project: Beam
> Issue Type: New Feature
> Components: java-fn-execution
> Reporter: Alex Amato
> Assignee: Alex Amato
> Priority: Major
> Labels: triaged
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Remove QueueingBeamFnDataClient, which made process() calls all run on the
> same thread.
> [~lcwik] and I came up with this design thinking that it was required to
> process the bundle in parallel anyway, and that we would have good
> performance.
> However, after speaking to Ken, there is no requirement for a bundle or key
> to be processed in parallel. Elements are either iterables or single
> elements, which defines the need for processing a group of elements on the
> same thread.
> Simply performing this change will lead to the following issues:
> (1) MetricsContainerImpl and MetricsContainer are not thread safe, so when
> the process() functions enter the metric container context, they will be
> accessing a thread-unsafe collection in parallel.
> (2) An ExecutionStateTracker will be needed in every thread, so we will need
> to create an instance and activate it in every gRPC thread which receives a
> new element.
> (Will this get sampled properly, since the trackers will be short-lived?)
> (3) Will the SimpleExecutionStates being used need to be thread safe as well?
> I don't think so, because I don't think that the ExecutionStateSampler
> invokes them in parallel.
>
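To make the trade-off in the description concrete, here is a minimal sketch of the general queueing pattern: several receiver threads (standing in for gRPC threads) only enqueue elements, and a single thread drains the queue and does the processing. Because only that one thread touches the counter map, the map does not need to be thread safe. All names here (QueueingSketch, UnsafeCounters, run) are hypothetical and for illustration only. This is not the actual QueueingBeamFnDataClient or MetricsContainerImpl code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch; not Beam's implementation.
final class QueueingSketch {

  // Stand-in for a metrics container that is deliberately NOT thread safe.
  static final class UnsafeCounters {
    private final Map<String, Long> counts = new HashMap<>();

    void inc(String name) {
      counts.merge(name, 1L, Long::sum);
    }

    long get(String name) {
      return counts.getOrDefault(name, 0L);
    }
  }

  // Marker each receiver enqueues once it has no more elements.
  private static final String DONE = "__done__";

  // Starts `receivers` threads that each enqueue `perReceiver` elements,
  // processes every element on the calling thread, and returns the count.
  static long run(int receivers, int perReceiver) {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    UnsafeCounters counters = new UnsafeCounters();

    Thread[] threads = new Thread[receivers];
    for (int i = 0; i < receivers; i++) {
      final int id = i;
      threads[i] = new Thread(() -> {
        for (int j = 0; j < perReceiver; j++) {
          queue.add("element-" + id + "-" + j); // receivers only enqueue
        }
        queue.add(DONE);
      });
      threads[i].start();
    }

    try {
      int finished = 0;
      while (finished < receivers) {
        String element = queue.take();
        if (DONE.equals(element)) {
          finished++;
        } else {
          // Stand-in for process(): only this thread touches the counters.
          counters.inc("elements");
        }
      }
      for (Thread t : threads) {
        t.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while draining queue", e);
    }
    return counters.get("elements");
  }

  public static void main(String[] args) {
    System.out.println(run(4, 1000)); // prints 4000
  }
}
```

If the inc() call were instead made directly from the receiver threads, the plain HashMap would be mutated concurrently, which is the situation issue (1) above warns about. The cost of the queue is that all processing is serialized onto one thread.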