[ 
https://issues.apache.org/jira/browse/TEZ-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029992#comment-14029992
 ] 

Siddharth Seth commented on TEZ-1170:
-------------------------------------

Ahh, didn't see the Monday note in the comment. Sorry about that.

Comments on the patch.
- It's simplified some usage, but there's places where the transitions out of 
INITIALIZING and INITED are inconsistent. E.g. V_ONE_TO_ONE_SOURCE_SPLIT keeps 
the vertex in it's current state and generates a READY_TO_INIT event. OTOH, 
NULL_EDGE_SET / ROOT_INPUT_INITIALIZED move the vertex into the next state 
directly. Was expecting the behaviour to be similar for all such events which 
can move the Vertex to another state (either all via an extra event, or all via 
direct transitions). Direct transitions is likely better since it doesn't 
unnecessarily queue up behind other events. Unrelated, but the same is true of 
moving into the STARTED state - we end up generating an additional event back 
to the same Vertex, which also goes to the back of the queue (will create a 
JIRA for the START event).

- In the same context, V_PARALLELISM_INITIALIZED was a better name, since the 
renamed event (V_READY_TO_INIT) is only generated when parallelism is updated. 
Most of the state machines are following a model where the event indicates what 
happened (PARALLELISM_INITIALIZED) vs what needs to be done (V_READY_TO_INIT).

- {code} .addTransition(VertexState.NEW, 
                EnumSet.of(VertexState.NEW),
                  VertexEventType.V_NULL_EDGE_INITIALIZED,
                  new NullEdgeInitializedTransition()){code}
This is a little confusing - since NullEdgeInitializedTransition removes from 
the emptyEdge set, which may not have been populated yet. Works though, since 
the edge specified won't make it into the map when the map is finally populated.

- {code}     .addTransition(VertexState.TERMINATING,
              EnumSet.of(VertexState.TERMINATING),
              VertexEventType.V_ROOT_INPUT_INITIALIZED,
              new RootInputInitializedTransition()){code}
Didn't quite understand this change. Why should the 
RootInputInitializerTransition be invoked if the Vertex is already in 
TERMINATING state (or FAILED state etc). If this is to shut down the threads 
which are initializing, a shutdown is already being invoked when in the 
INITIALIZING state.

- Unrelated to this patch, is ONE_TO_ONE_SOURCE_SPLIT expected in RUNNING state 
- possibly from a different edge ?, or should that be an error. RUNNING itself 
seems to be an error condition for this event.

It's not absolutely necessary for RootInputs to complete, to move a Vertex out 
of INITIALIZING state. Once parallelism is set (and associated 
InputSpecUpdates, LocationHints etc), and Null Edges have been setup - it 
should be possible to just start a vertex. Assuming this was left in place so 
that tasks don't start, while they may not have events ready for the Inputs, or 
may not have generated data for the inputs?

> Simplify Vertex Initializing transition
> ---------------------------------------
>
>                 Key: TEZ-1170
>                 URL: https://issues.apache.org/jira/browse/TEZ-1170
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Bikas Saha
>             Fix For: 0.5.0
>
>         Attachments: TEZ-1170.1.patch, TEZ-1170.2.patch
>
>
> After TEZ-1145 and 1151, a vertex should only need to stay in INITIALZING 
> state when it has an uninitialized edge, or when the parallelism is at -1 
> (not set yet). Waiting for all RootInputInitializers to complete should not 
> be required - as long as one of them sets parallelism (via a VertexManager).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to