[ 
https://issues.apache.org/jira/browse/TEZ-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193495#comment-14193495
 ] 

Siddharth Seth commented on TEZ-1547:
-------------------------------------

bq. Both are under the writeLock. So they cannot interleave. If done() happens 
first then it will fail the startTime check and no event will be sent. Then 
startVertex will send the event because byManager==false. If start happens 
first then event will not be sent since byManager==true. When done() is called 
then it will see startTime>0 and send the event once. In anycase, I can use an 
AtomicBoolean to make things really clear.
Here's the scenario where it can happen.
1. User sets intentToConfigure (vertexToBeReconfiguredByManager = true)
2. vertexManager.onVertexStarted(pendingReportedSrcCompletions) invoked.
3. The user code invokes doneReconfiguringVertex as part of this. Event goes 
out, vertexToBeReconfiguredByManager set to false (The write lock is taken by 
then same thread; otherwise this would be a deadlock)
4. onVertexStarted returns
5. maybeSendConfiguredEvent invoked - since vertexToBeReconfiguredByManager is 
false, the event goes out again.
The latest patch contains an explicit check on whether the event has been sent 
before, which will take care of this.

bq. In terms of the incompatible change mentioned in the previous comment 
[~rohini], [~hagleitn] - need some info on Pig/Hive VertexManager usage. When 
using a Pig/Hive custom vertexManager, will the parallelism always be set to -1 
at DAG construction time, when these vertexManagers invoke setParallelism ?

bq. The difference between setParallelism and startVertex is 2 events on the 
dispatcher queue that are sent back to back. setParallelism->immediately sends 
READY_TO_INIT-> immediately triggers START-> sends CONFIGURED_COMPLETE. So time 
difference would be negligibly small. Sending it in startVertex makes it future 
proof to any code changes prior to start.
I don't think time difference is the consideration here. A CONFIGURED event 
should go out as early as possible. Even if sending it in the start transition 
- why is it being sent after invoking onVertexStarted ? If the event is going 
out after onVertexStarted - it should be called STARTED.

bq. I had initially thought of reconfigured but its not necessary that the 
vertex was reconfigured in many case and so the name would be misleading.
Please see my last comment about STARTED. The reason that RECONFIGURED may not 
be valid is that we're sending an event which indicates that the vertex has 
started, but calling it FULLY_CONFIGURED.
How about this, CONFIGURED - goes out when parallelism / edge configs are 
changed (setParallelism), and the current event gets renamed to STARTED.
FULLY_CONFIGURED is a very restrictive name IMO for future changes.

On the ImmediateStartVertexManager - we can iterate in TEZ-1522, which should 
make it possible for the VertexManagers to be oblivious of what's happening in 
other Vertices, except when absolutely required (ShuffleVertexManager providing 
data updates).

I'd be a little more comfortable using TEZ-1522 (and plugins) to fix the 
scheduling, and late start issues that we're seeing - for 0.5.2. Your call on 
this being safe for 0.5.2 though.


> Make use of state change notifier in VertexManagerPlugins
> ---------------------------------------------------------
>
>                 Key: TEZ-1547
>                 URL: https://issues.apache.org/jira/browse/TEZ-1547
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: TEZ-1547.1.patch, TEZ-1547.3.patch, TEZ-1547.4.patch, 
> TEZ-1547.5.patch, TEZ-1547.6.patch, TEZ-1547.7.patch, TEZ-1547.8.patch, 
> TEZ-1547.9.patch
>
>
> Instead of the various APIs like onVertexStarted, simple notifications could 
> be sent.
> Some existing APIs could end up being deprecated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to