[
https://issues.apache.org/jira/browse/IMPALA-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Armstrong updated IMPALA-4224:
----------------------------------
Description:
Now that IMPALA-3567 is solved, the next step is to add the plumbing to have a
join builder as the sink of a plan fragment to implement the parallel plans
added in http://gerrit.cloudera.org:8080/2846
This JIRA tracks making the plans executable, without sharing of the join build
for broadcast join.
Steps required:
* Enable the join build sink in the planner
* Update planner to include all required state in the thrift objects (the join
build sinks are missing various required info).
* Update planner resource requirement calculations - join build fragment needs
real resource estimates
* Update scheduler to schedule join build fragments co-located with their parent
fragments. This depends on the build plans being sent pre-order. Pass the source
fragment instance id into the join nodes so they can locate the input fragment
instance.
* Update scheduler to correctly handle multiple build plans.
* Instantiate the join builders as input sinks to the plan. This requires
getting some data from the thrift structs instead of having it passed in from
the PHJNode.
* Ensure the join builders function correctly as plan sinks (e.g. add an
indefinite wait to the join node to prevent it from crashing, ensure that the
builder consumes the whole input). Initially we probably want to have the build
thread block in Close().
* Update the join node so that in the non-subplan mt_dop > 0 case, it looks up
the input fragment instance and waits for it to finish the build (with
cancellation). Need to find all the places it looks for the right child.
* After that the join node "owns" the builder so the control flow should be
mostly the same. The main difference is that the buffer pool client and memory
tracking are set up differently. Maybe need to change the Close() call as well?
* Figure out any resource management issues across the build and probe
(threads, memory, etc.). Fix up the builder thread behaviour so that Close()
doesn't block and the thread is released.
This, I think, needs to be one change because the intermediate states aren't
testable or functional.
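To make the "wait for the build to finish (with cancellation)" step concrete,
here is a minimal sketch in Python rather than the backend's C++. It is not
Impala code: JoinBuildHandle, BuildCancelled, publish(), cancel() and
wait_for_build() are hypothetical names, and a plain dict stands in for the
hash table. It only illustrates the handoff where the build side publishes its
result and the join node blocks until either the build completes or the query
is cancelled.

```python
import threading


class BuildCancelled(Exception):
    """Raised when cancellation is observed while waiting for the build."""


class JoinBuildHandle:
    """Hypothetical rendezvous between one build sink and one join node."""

    def __init__(self):
        self._cond = threading.Condition()
        self._hash_table = None
        self._done = False
        self._cancelled = False

    def publish(self, hash_table):
        # Called by the build side once it has consumed its whole input.
        with self._cond:
            self._hash_table = hash_table
            self._done = True
            self._cond.notify_all()

    def cancel(self):
        # Called on query cancellation so that a waiting join node wakes up.
        with self._cond:
            self._cancelled = True
            self._cond.notify_all()

    def wait_for_build(self):
        # Called by the join node. Blocks until the build is published or
        # the query is cancelled; cancellation takes precedence.
        with self._cond:
            self._cond.wait_for(lambda: self._done or self._cancelled)
            if self._cancelled:
                raise BuildCancelled()
            return self._hash_table


def run_example():
    handle = JoinBuildHandle()
    rows = [(1, "a"), (2, "b"), (1, "c")]

    def build():
        # Stand-in for the build fragment instance: build a key -> rows map.
        table = {}
        for key, value in rows:
            table.setdefault(key, []).append(value)
        handle.publish(table)

    builder = threading.Thread(target=build)
    builder.start()
    table = handle.wait_for_build()  # probe side waits for the build
    builder.join()
    return table
```

In this sketch the builder thread exits right after publish(), which is the
end state described in the last step (Close() does not block and the thread is
released). The interim "block in Close()" behaviour would instead keep the
builder waiting until the join node is done with the table.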
Testing:
* Existing mt join tests are useful and will exercise the new behaviour
* Ensure spilling is tested with multithreading (new dimension to spilling
tests?)
* Ensure cancellation is tested.
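One way to read "new dimension to spilling tests" is to cross the existing
spilling configurations with a set of mt_dop values. The sketch below shows
only that cross product. It does not use Impala's test framework; the specific
mt_dop values, the buffer_pool_limit option name and its value are assumptions
chosen for illustration, not settings taken from this ticket.

```python
import itertools

# Assumed values for illustration only; the ticket does not specify them.
MT_DOP_VALUES = [0, 1, 4]
BUFFER_POOL_LIMITS = [None, "45m"]  # None means no explicit limit is set


def spilling_test_vectors():
    """Return one query-option dict per (mt_dop, memory limit) combination."""
    vectors = []
    for mt_dop, limit in itertools.product(MT_DOP_VALUES, BUFFER_POOL_LIMITS):
        options = {"mt_dop": mt_dop}
        if limit is not None:
            options["buffer_pool_limit"] = limit
        vectors.append(options)
    return vectors
```

Each returned dict would then be applied as query options to every spilling
query, so the same queries run both single-threaded and multithreaded.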
was:
Now that IMPALA-3567 is solved, the next step is to add the plumbing to have a
join builder as the sink of a plan fragment to implement the parallel plans
added in http://gerrit.cloudera.org:8080/2846
This JIRA tracks making the plans executable, without sharing of the join build
for broadcast join
> Add backend support for join build sinks in parallel plans
> ----------------------------------------------------------
>
> Key: IMPALA-4224
> URL: https://issues.apache.org/jira/browse/IMPALA-4224
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.8.0
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Major
> Labels: multithreading