[
https://issues.apache.org/jira/browse/TEZ-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419444#comment-15419444
]
ASF GitHub Bot commented on TEZ-3369:
-------------------------------------
GitHub user piyushnarang opened a pull request:
https://github.com/apache/tez/pull/13
[TEZ-3369][WIP] Fixing Tez's DAGClients to work with Cascading
Adding a few APIs to Tez to help Cascading break its dependency on an
internal implementation and still get some of the data it needs.
Added APIs:
```
DAGInformation getDAGInformation()
TaskInformation getTaskInformation(String vertexID, String taskID)
List<TaskInformation> getTaskInformation(String vertexID, @Nullable String startTaskID, int limit)
```
Putting this PR out as a WIP to get some feedback on the APIs and the data
they return.
I'm still working through cleaning up the code, fixing some todos and
testing more.
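To make the paged variant easier to discuss, here is a minimal sketch of how a caller might walk through every task of a vertex. It is not code from the PR: the `TaskInformation` and `TaskInfoClient` interfaces below are hypothetical stand-ins, `getID()` is an assumed accessor, and the sketch assumes `startTaskID` is exclusive (passing the last ID seen returns the next page) and that `null` requests the first page.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskPagingSketch {
    // Hypothetical stand-in for the PR's TaskInformation; only an ID accessor is assumed.
    interface TaskInformation {
        String getID();
    }

    // Hypothetical stand-in that exposes only the paged method listed above.
    interface TaskInfoClient {
        List<TaskInformation> getTaskInformation(String vertexID, String startTaskID, int limit);
    }

    // Collects all tasks of a vertex by requesting pages of at most `limit` entries.
    // Assumption: startTaskID is exclusive, so the last ID of a page starts the next one.
    static List<TaskInformation> fetchAll(TaskInfoClient client, String vertexID, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        List<TaskInformation> all = new ArrayList<>();
        String start = null; // null asks for the first page (assumed semantics)
        while (true) {
            List<TaskInformation> page = client.getTaskInformation(vertexID, start, limit);
            all.addAll(page);
            if (page.size() < limit) {
                return all; // a short (or empty) page means nothing is left
            }
            start = page.get(page.size() - 1).getID();
        }
    }

    // In-memory fake used only to exercise the loop; it ignores vertexID.
    static TaskInfoClient fakeClient(int taskCount) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            ids.add(String.format("task_%04d", i));
        }
        return (vertexID, startTaskID, limit) -> {
            int from = startTaskID == null ? 0 : ids.indexOf(startTaskID) + 1;
            List<TaskInformation> page = new ArrayList<>();
            for (int i = from; i < ids.size() && page.size() < limit; i++) {
                String id = ids.get(i);
                page.add(() -> id);
            }
            return page;
        };
    }

    public static void main(String[] args) {
        List<TaskInformation> tasks = fetchAll(fakeClient(7), "vertex_1", 3);
        System.out.println("fetched " + tasks.size() + " tasks");
    }
}
```

If the real API treats `startTaskID` as inclusive instead, the loop would need to drop the first element of every page after the first.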
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/piyushnarang/tez daginfo-prototype
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tez/pull/13.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #13
----
commit 74b78a563a1a7b13a7e120cc05ef53fa3621aaa6
Author: Piyush Narang <[email protected]>
Date: 2016-08-08T20:11:40Z
First cut proto API changes
commit 8cbdb9ea359915f2a4f59cee92a82cd6a0393f13
Author: Piyush Narang <[email protected]>
Date: 2016-08-08T20:38:43Z
Proto pojo object and builder implementations
commit 4b67a1be909a2e49ce1da38e1f976ceaddf135ee
Author: Piyush Narang <[email protected]>
Date: 2016-08-09T22:01:50Z
Minor proto tweaks
commit 7aa1fa0e6b5ed73a569070aaae6c960b2cbe6c44
Author: Piyush Narang <[email protected]>
Date: 2016-08-09T22:51:42Z
Add rpc methods to get dagInfo and taskInfo
commit b1ea42d4767c58b4b89d4c6ae5dbaf6da0225753
Author: Piyush Narang <[email protected]>
Date: 2016-08-12T20:31:53Z
Add client impl + tests
----
> Fixing Tez's DAGClients to work with Cascading
> ----------------------------------------------
>
> Key: TEZ-3369
> URL: https://issues.apache.org/jira/browse/TEZ-3369
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Piyush Narang
>
> Hi,
> We seem to be running into issues when we try to use the newest version of
> Tez (0.9.0-SNAPSHOT) with Cascading. The issue seems to be:
> {code}
> java.lang.ClassCastException: cascading.stats.tez.util.TezTimelineClient cannot be cast to org.apache.tez.dag.api.client.DAGClient
> at cascading.stats.tez.util.TezStatsUtil.createTimelineClient(TezStatsUtil.java:142)
> {code}
> (Full stack trace at the end)
> Relevant Cascading code is:
> 1) [Cascading tries to create a TezTimelineClient and cast it to a DAGClient | https://github.com/Cascading/cascading/blob/3.1/cascading-hadoop2-tez-stats/src/main/java/cascading/stats/tez/util/TezStatsUtil.java#L142]
> 2) [TezTimelineClient extends from DAGClientTimelineImpl |
> https://github.com/Cascading/cascading/blob/3.1/cascading-hadoop2-tez-stats/src/main/java/cascading/stats/tez/util/TezTimelineClient.java#L53]
> 3) [DAGClientTimelineImpl extends from DAGClientInternal |
> https://github.com/apache/tez/blob/dacd0191b684208d71ea457ca849f2d01212bb7e/tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClientTimelineImpl.java#L68]
> 4) [DAGClientInternal only implements Closeable and does not extend DAGClient, which is why the cast fails | https://github.com/apache/tez/blob/dacd0191b684208d71ea457ca849f2d01212bb7e/tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClientInternal.java#L38].
> This behavior was 'broken' in this [commit |
> https://github.com/apache/tez/commit/2af886b509015200e1c04527275474cbc771c667]
> (release 0.8.3)
> The TezTimelineClient in Cascading seems to do two things:
> 1) Provides DAGClient functionality by delegating to the inner DAGClient
> object.
> 2) Retrieves values such as vertexID, vertexChildren and vertexChild (from this
> [interface|https://github.com/Cascading/cascading/blob/3.1/cascading-hadoop2-tez-stats/src/main/java/cascading/stats/tez/util/TimelineClient.java#L31]).
>
> As there's no good way to get the vertexID / vertexChildren / vertexChild
> (correct me if I'm wrong), they end up extending from the
> DAGClientTimelineImpl which has the http client and json parsing code to
> allow [things like this |
> https://github.com/Cascading/cascading/blob/3.1/cascading-hadoop2-tez-stats/src/main/java/cascading/stats/tez/util/TezTimelineClient.java#L93]:
> {code}
> @Override
> public String getVertexID( String vertexName ) throws IOException, TezException
>   {
>   // the filter 'vertexName' is in the 'otherinfo' field, so it must be requested, otherwise timeline server throws
>   // an NPE. to be safe, we include both fields in the result
>   String format = "%s/%s?primaryFilter=%s:%s&secondaryFilter=vertexName:%s&fields=%s";
>   String url = String.format( format, baseUri, TEZ_VERTEX_ID, TEZ_DAG_ID, dagId, vertexName, FILTER_BY_FIELDS );
>   JSONObject jsonRoot = getJsonRootEntity( url );
>   JSONArray entitiesNode = jsonRoot.optJSONArray( ENTITIES );
> ...
> {code}
> Some options I can think of:
> 1) Ideally the methods getVertexID / getVertexChildren / getVertexChild
> would be part of DAGClient, or at least part of DAGClientTimelineImpl. That
> way the Cascading code would not need updating if the URI or JSON format
> changed, because the change would be made in these clients instead. I
> suspect adding this to DAGClient would require more work, as it would also
> need to be supported by the RPC client, and I don't think the relevant
> protos are available.
> 2) A simpler fix would be to have DAGClientInternal extend DAGClient
> (currently it just implements Closeable). This will not require any changes
> on the Cascading side as DAGClientTimelineImpl will continue to be a
> DAGClient.
> Full stack trace:
> {code}
> Exception in thread "flow com.twitter.data_platform.e2e_testing.jobs.parquet.E2ETestConvertThriftToParquet" java.lang.ClassCastException: cascading.stats.tez.util.TezTimelineClient cannot be cast to org.apache.tez.dag.api.client.DAGClient
> at cascading.stats.tez.util.TezStatsUtil.createTimelineClient(TezStatsUtil.java:142)
> at cascading.flow.tez.planner.Hadoop2TezFlowStepJob$1.getJobStatusClient(Hadoop2TezFlowStepJob.java:117)
> at cascading.flow.tez.planner.Hadoop2TezFlowStepJob$1.getJobStatusClient(Hadoop2TezFlowStepJob.java:105)
> at cascading.stats.tez.TezStepStats$1.getJobStatusClient(TezStepStats.java:60)
> at cascading.stats.tez.TezStepStats$1.getJobStatusClient(TezStepStats.java:56)
> at cascading.stats.CounterCache.cachedCounters(CounterCache.java:229)
> at cascading.stats.CounterCache.cachedCounters(CounterCache.java:187)
> at cascading.stats.CounterCache.getCounterValue(CounterCache.java:167)
> at cascading.stats.BaseCachedStepStats.getCounterValue(BaseCachedStepStats.java:105)
> at cascading.stats.FlowStats.getCounterValue(FlowStats.java:170)
> at cascading.flow.tez.Hadoop2TezFlow.getTotalSliceCPUMilliSeconds(Hadoop2TezFlow.java:303)
> at cascading.flow.BaseFlow.run(BaseFlow.java:1287)
> at cascading.flow.BaseFlow.access$100(BaseFlow.java:82)
> at cascading.flow.BaseFlow$1.run(BaseFlow.java:928)
> at java.lang.Thread.run(Thread.java:745)
> Exception in thread "main" java.lang.Throwable: If you know what exactly caused this error, please consider contributing to GitHub via following link.
> https://github.com/twitter/scalding/wiki/Common-Exceptions-and-possible-reasons#javalangclasscastexception
> at com.twitter.scalding.Tool$.main(Tool.scala:152)
> at com.twitter.scalding.Tool.main(Tool.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.ClassCastException: cascading.stats.tez.util.TezTimelineClient cannot be cast to org.apache.tez.dag.api.client.DAGClient
> at cascading.stats.tez.util.TezStatsUtil.createTimelineClient(TezStatsUtil.java:142)
> at cascading.flow.tez.planner.Hadoop2TezFlowStepJob$1.getJobStatusClient(Hadoop2TezFlowStepJob.java:117)
> at cascading.flow.tez.planner.Hadoop2TezFlowStepJob$1.getJobStatusClient(Hadoop2TezFlowStepJob.java:105)
> at cascading.stats.tez.TezStepStats$1.getJobStatusClient(TezStepStats.java:60)
> at cascading.stats.tez.TezStepStats$1.getJobStatusClient(TezStepStats.java:56)
> at cascading.stats.CounterCache.cachedCounters(CounterCache.java:229)
> at cascading.stats.CounterCache.cachedCounters(CounterCache.java:187)
> at cascading.stats.CounterCache.getCountersFor(CounterCache.java:155)
> at cascading.stats.BaseCachedStepStats.getCountersFor(BaseCachedStepStats.java:93)
> at cascading.stats.FlowStats.getCountersFor(FlowStats.java:159)
> at com.twitter.scalding.Stats$.getAllCustomCounters(Stats.scala:93)
> at com.twitter.scalding.Job.handleStats(Job.scala:269)
> at com.twitter.scalding.Job.run(Job.scala:298)
> at com.twitter.scalding.Tool.start$1(Tool.scala:124)
> at com.twitter.scalding.Tool.run(Tool.scala:140)
> at com.twitter.scalding.Tool.run(Tool.scala:68)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at com.twitter.scalding.Tool$.main(Tool.scala:148)
> ... 7 more
> {code}
> {code}
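The cast failure quoted in the issue can be reproduced in miniature. The sketch below uses small stand-in classes that mirror only the type relationships described there; they are not the real Tez or Cascading classes, and the `Fixed*` names are hypothetical, introduced just to show what option 2 (having DAGClientInternal extend DAGClient) would change.

```java
import java.io.Closeable;

public class CastSketch {
    // Stand-in for the public client type that Cascading casts to.
    static abstract class DAGClient implements Closeable {
        @Override public void close() {}
    }

    // Shape described in the issue: Closeable, but not a DAGClient.
    static abstract class DAGClientInternal implements Closeable {
        @Override public void close() {}
    }
    static class DAGClientTimelineImpl extends DAGClientInternal {}
    static class TezTimelineClient extends DAGClientTimelineImpl {}

    // Option 2 from the issue, with hypothetical names: the internal base extends DAGClient.
    static abstract class FixedDAGClientInternal extends DAGClient {}
    static class FixedTimelineImpl extends FixedDAGClientInternal {}
    static class FixedTezTimelineClient extends FixedTimelineImpl {}

    // Returns true when the runtime cast to DAGClient is legal for this object.
    static boolean castSucceeds(Object client) {
        try {
            DAGClient ignored = (DAGClient) client;
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("current hierarchy: " + castSucceeds(new TezTimelineClient()));
        System.out.println("option 2 hierarchy: " + castSucceeds(new FixedTezTimelineClient()));
    }
}
```

The first hierarchy fails the cast because nothing in its ancestry is a `DAGClient`; the second succeeds without any change to the subclass that performs the cast, which is why option 2 would need no changes on the Cascading side.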