[
https://issues.apache.org/jira/browse/TEZ-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077484#comment-14077484
]
Siddharth Seth commented on TEZ-707:
------------------------------------
Some comments on the latest patch. It's much closer now, but still has several
issues: some need more complicated cases to hit, some are harmless, and others
are just code style.
- Exceptions thrown by running tasks aren't really being handled, and the
relevant messages are not being transmitted to the rest of the system. With the
ListeningExecutor, these exceptions are available in the callback. Just
a try {executor.submit()} catch {Throwable } won't catch exceptions thrown
while the task runs, since they surface on the future rather than at
submission. Exceptions during submission are handled by
sendContainerLaunchFailedMsg though.
- The task launch and thread pool loop can end up in situations where
stopEvents, launchEvents, etc. are blocked behind a task that is already
running. Ideally, this should all be done via notifications.
- serviceStart/serviceInit(conf), serviceStop override methods from
AbstractService and need @Override annotations. Fetching configuration from
context isn't required, since it is already passed in via serviceInit(conf)
- Several fields are still static and don't need to be.
- Token handling: there's some code to read tokens from the
ContainerLaunchContext. I don't think this is required. Under the regular flow,
tokens required to run a task are sent over the wire, and TezChild takes care
of setting up the proper UGI with the relevant tokens.
- Local-directory handling needs to be in sync with what is done in TEZ-717. At
the moment, I'm not sure what the local directories will be set to.
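
To illustrate the first point above, here is a minimal sketch (not the patch
code) using only JDK classes. CompletableFuture stands in for the Guava
ListenableFuture callback as an assumption for the sake of a self-contained
example, and the class and method names are hypothetical. It shows that a
try/catch around submit() never sees an exception thrown by the running task,
while a completion callback does.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; none of these names come from the Tez patch.
final class TaskFailureCallbackSketch {

  /** Stand-in for a task that fails while it is running. */
  static final Runnable FAILING_TASK = () -> {
    throw new IllegalStateException("task failed while running");
  };

  /** Returns true only if the try/catch around submit() saw an exception. */
  static boolean caughtAtSubmit(ExecutorService executor) {
    try {
      executor.submit(FAILING_TASK);
      // submit() returns normally; the task's failure is stored in the Future.
      return false;
    } catch (Throwable t) {
      // Reached only for submission problems, e.g. a rejected task.
      return true;
    }
  }

  /** Async stages wrap task failures in CompletionException; unwrap them. */
  static Throwable unwrap(Throwable error) {
    if (error instanceof CompletionException && error.getCause() != null) {
      return error.getCause();
    }
    return error;
  }

  /** Returns the failure as delivered to a completion callback. */
  static Throwable failureSeenByCallback(ExecutorService executor) {
    return CompletableFuture.runAsync(FAILING_TASK, executor)
        // The callback runs when the task finishes, successfully or not.
        .handle((result, error) -> unwrap(error))
        // join() is only here so the demo can return the value; a launcher
        // would forward the failure as an event instead of blocking.
        .join();
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      System.out.println("caught at submit: " + caughtAtSubmit(executor));
      System.out.println("seen by callback: " + failureSeenByCallback(executor));
    } finally {
      executor.shutdown();
    }
  }
}
```

In a launcher, the callback would be the natural place to report the task
failure to the rest of the system, which also avoids blocking the event loop on
a running task.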
[~airbots] - if you don't mind, I'd like to take the patch in its current
form and take it to completion. There are some fairly complicated interactions
with Tez, as well as event handling on start/stop.
> Create LocalContainerLauncher
> -----------------------------
>
> Key: TEZ-707
> URL: https://issues.apache.org/jira/browse/TEZ-707
> Project: Apache Tez
> Issue Type: Sub-task
> Affects Versions: 0.3.0
> Reporter: Chen He
> Assignee: Chen He
> Priority: Blocker
> Attachments: TEZ-707-2014-7015.patch, TEZ-707-2014-7015.patch.review,
> TEZ-707-v3.patch, TEZ-707-v4.patch, TEZ-707.patch, TEZ-707.patch,
> TEZ-707.patch, Tez-707.patch.v2, tez-707.patch
>
>
> Create LocalContainerLauncher and make it work for a single-stage DAG. The
> TaskSchedulerEventHandler still asks the RM for a new container, but
> LocalContainerLauncher will run the TezTask as a thread instead of using
> this container from YARN.
--
This message was sent by Atlassian JIRA
(v6.2#6252)