[ 
https://issues.apache.org/jira/browse/TEZ-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15430055#comment-15430055
 ] 

Hitesh Shah commented on TEZ-3369:
----------------------------------

Some initial comments: 
  - the switch to timeline probably needs to belong to a common function
  - an int taskIndex would be needed instead of a String taskId
  - map the task APIs to be similar to the DAG and vertex progress API calls.
  - The new objects being returned should ideally be interfaces so that they
are under stricter control, as they are user facing. The objects sent over the
wire to/from the AM or read from YARN Timeline are internal implementation
details and should be hidden from the user.
  - I assume that DAGInfo is just static info that does not change based on DAG
progress, etc. It should have vertex-specific info, but only static info;
therefore VertexInfo should not contain TaskInfo, especially if TaskInfo
contains progress-related info. This would allow the static info to be cached
in the DAG client.
  - Question: how do we prevent a client from DoS'ing the AM? Options include
the client caching data and not polling the AM too frequently, the AM returning
cached data that is refreshed on a regular basis, or both.
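Below is a minimal Java sketch of the last two points. It uses hypothetical names (DagStructure, TaskProgress, ThrottledCache) that are not part of the Tez API. Static DAG structure is exposed through an interface that carries no progress data, so a client could fetch it once and cache it. Progress lookups go through a cache that refreshes from the AM at most once per configured interval. Tasks are addressed by an int task index, as suggested in the second point.

```java
import java.util.List;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical user-facing interface holding only static DAG structure
// (no progress), so a client could fetch it once and keep it.
interface DagStructure {
  String getDagName();
  List<String> getVertexNames();
  List<String> getChildVertexNames(String vertexName);
}

// Hypothetical user-facing interface for task progress; a task is addressed
// by its vertex name plus an int index rather than a String task id.
interface TaskProgress {
  String getVertexName();
  int getTaskIndex();
  float getProgress();
}

// Returns a cached value and calls the fetcher (e.g. an AM RPC) at most once
// per minRefreshMillis, however often get() is called.
final class ThrottledCache<T> {
  private final Supplier<T> fetcher;
  private final LongSupplier clockMillis;
  private final long minRefreshMillis;
  private T value;
  private long fetchedAtMillis;
  private boolean hasValue;

  ThrottledCache(Supplier<T> fetcher, LongSupplier clockMillis, long minRefreshMillis) {
    this.fetcher = fetcher;
    this.clockMillis = clockMillis;
    this.minRefreshMillis = minRefreshMillis;
  }

  synchronized T get() {
    long now = clockMillis.getAsLong();
    // Only hit the backend when nothing is cached yet or the entry is stale.
    if (!hasValue || now - fetchedAtMillis >= minRefreshMillis) {
      value = fetcher.get();
      fetchedAtMillis = now;
      hasValue = true;
    }
    return value;
  }
}
```

In a real client the clock would simply be System::currentTimeMillis; injecting it keeps the sketch testable. The same pattern could also sit on the AM side, which is the other option raised in the question above.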


> Fixing Tez's DAGClients to work with Cascading
> ----------------------------------------------
>
>                 Key: TEZ-3369
>                 URL: https://issues.apache.org/jira/browse/TEZ-3369
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Piyush Narang
>
> Hi,
> We seem to be running into issues when we try to use the newest version of 
> Tez (0.9.0-SNAPSHOT) with Cascading. The issue seems to be:
> {code}
> java.lang.ClassCastException: cascading.stats.tez.util.TezTimelineClient 
> cannot be cast to org.apache.tez.dag.api.client.DAGClient
>       at cascading.stats.tez.util.TezStatsUtil.createTimelineClient(TezStatsUtil.java:142)
> {code}
> (Full stack trace at the end)
> Relevant code:
> 1) [Cascading tries to create a TezTimelineClient and cast it to a DAGClient|https://github.com/Cascading/cascading/blob/3.1/cascading-hadoop2-tez-stats/src/main/java/cascading/stats/tez/util/TezStatsUtil.java#L142]
> 2) [TezTimelineClient extends DAGClientTimelineImpl|https://github.com/Cascading/cascading/blob/3.1/cascading-hadoop2-tez-stats/src/main/java/cascading/stats/tez/util/TezTimelineClient.java#L53]
> 3) [DAGClientTimelineImpl extends DAGClientInternal|https://github.com/apache/tez/blob/dacd0191b684208d71ea457ca849f2d01212bb7e/tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClientTimelineImpl.java#L68]
> 4) [DAGClientInternal only implements Closeable, not DAGClient, which is why things break|https://github.com/apache/tez/blob/dacd0191b684208d71ea457ca849f2d01212bb7e/tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClientInternal.java#L38].
> This behavior was 'broken' in this [commit|https://github.com/apache/tez/commit/2af886b509015200e1c04527275474cbc771c667] (release 0.8.3).
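Here is a minimal, self-contained reproduction of the failure. It uses simplified stand-in classes that only mirror the hierarchy described above. They are not the real Tez or Cascading classes, and DAGClient is modeled as an abstract class implementing Closeable. Because DAGClientInternal never becomes a DAGClient, any subclass of DAGClientTimelineImpl fails the cast at runtime, even though a cast from Object compiles:

```java
import java.io.Closeable;

// Simplified stand-ins that mirror the type hierarchy described above.
abstract class DAGClient implements Closeable {}

// Implements Closeable only; it is not a DAGClient.
abstract class DAGClientInternal implements Closeable {}

class DAGClientTimelineImpl extends DAGClientInternal {
  @Override
  public void close() {}
}

// Cascading-style subclass, as in item 2 above.
class TezTimelineClient extends DAGClientTimelineImpl {}

final class CastDemo {
  // Roughly the cast that fails in TezStatsUtil.createTimelineClient.
  static String castToDagClient(Object client) {
    try {
      DAGClient dagClient = (DAGClient) client;
      return "cast ok: " + dagClient.getClass().getSimpleName();
    } catch (ClassCastException e) {
      return "ClassCastException";
    }
  }
}
```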
> The TezTimelineClient in Cascading seems to do two things:
> 1) DAGClient functionality: it delegates to the inner DAGClient object.
> 2) It retrieves things like vertexID, vertexChildren and vertexChild (from this [interface|https://github.com/Cascading/cascading/blob/3.1/cascading-hadoop2-tez-stats/src/main/java/cascading/stats/tez/util/TimelineClient.java#L31]).
>  
> As there's no good way to get the vertexID / vertexChildren / vertexChild
> (correct me if I'm wrong), it ends up extending DAGClientTimelineImpl, which
> has the HTTP client and JSON parsing code, to allow [things like this|https://github.com/Cascading/cascading/blob/3.1/cascading-hadoop2-tez-stats/src/main/java/cascading/stats/tez/util/TezTimelineClient.java#L93]:
> {code}
> @Override
>   public String getVertexID( String vertexName ) throws IOException, TezException
>     {
>     // the filter 'vertexName' is in the 'otherinfo' field, so it must be requested, otherwise timeline server throws
>     // an NPE. to be safe, we include both fields in the result
>     String format = "%s/%s?primaryFilter=%s:%s&secondaryFilter=vertexName:%s&fields=%s";
>     String url = String.format( format, baseUri, TEZ_VERTEX_ID, TEZ_DAG_ID, dagId, vertexName, FILTER_BY_FIELDS );
>     JSONObject jsonRoot = getJsonRootEntity( url );
>     JSONArray entitiesNode = jsonRoot.optJSONArray( ENTITIES );
> ...
> {code}
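For reference, here is a sketch of the URL that the getVertexID snippet builds. The three constant values are assumptions, modeled on the Tez timeline entity names and on the comment's "both fields"; check DAGClientTimelineImpl for the actual values. The base URI and DAG id in the usage example are made up.

```java
// Sketch of the timeline query used by getVertexID. Constant values are
// assumptions, not copied from Tez or Cascading.
final class VertexIdQuery {
  static final String TEZ_VERTEX_ID = "TEZ_VERTEX_ID";               // assumed entity type
  static final String TEZ_DAG_ID = "TEZ_DAG_ID";                     // assumed primary filter name
  static final String FILTER_BY_FIELDS = "primaryfilters,otherinfo"; // assumed field list

  static String url(String baseUri, String dagId, String vertexName) {
    // Same format string as the snippet above: select vertex entities of one
    // DAG, filter on vertexName, and request the fields holding that filter.
    String format = "%s/%s?primaryFilter=%s:%s&secondaryFilter=vertexName:%s&fields=%s";
    return String.format(format, baseUri, TEZ_VERTEX_ID, TEZ_DAG_ID, dagId, vertexName, FILTER_BY_FIELDS);
  }
}
```

For example, VertexIdQuery.url("http://timeline.example.com:8188/ws/v1/timeline", "dag_1_0001_1", "v1") returns http://timeline.example.com:8188/ws/v1/timeline/TEZ_VERTEX_ID?primaryFilter=TEZ_DAG_ID:dag_1_0001_1&secondaryFilter=vertexName:v1&fields=primaryfilters,otherinfo. Note that vertexName is inserted unencoded; a name containing spaces or other reserved characters would need URL-encoding (e.g. with java.net.URLEncoder). This coupling to the URL and JSON layout is what option 1 below aims to move into Tez.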
> Some options I can think of:
> 1) Ideally, these methods (getVertexID / getVertexChildren / getVertexChild)
> would be part of DAGClient, or at least of DAGClientTimelineImpl. That way the
> Cascading code wouldn't need updating if the URI or JSON format changed; the
> change would be made in these clients instead. I suspect adding this to
> DAGClient would require more work, as it would also need to be supported by
> the RPCClient, and I don't think the relevant protos are available.
> 2) A simpler fix would be to have DAGClientInternal extend DAGClient
> (currently it only implements Closeable). This would not require any changes
> on the Cascading side, as DAGClientTimelineImpl would then be a DAGClient.
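Here is a sketch of option 2 using simplified stand-ins. These are not the real Tez classes. DAGClient is again modeled as an abstract class implementing Closeable, and describe() is a hypothetical placeholder for its status methods. Once DAGClientInternal extends DAGClient, every DAGClientTimelineImpl subclass is a DAGClient, so the existing cast succeeds without touching the subclass:

```java
import java.io.Closeable;

// Simplified stand-ins modeling the hierarchy proposed in option 2.
abstract class DAGClient implements Closeable {
  // Hypothetical placeholder for DAGClient's status methods.
  abstract String describe();
}

// Option 2: extend DAGClient instead of only implementing Closeable.
abstract class DAGClientInternal extends DAGClient {}

class DAGClientTimelineImpl extends DAGClientInternal {
  @Override
  String describe() {
    return "timeline-backed client";
  }

  @Override
  public void close() {}
}

// Unchanged Cascading-style subclass.
class TezTimelineClient extends DAGClientTimelineImpl {}
```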
> Full stack trace:
> {code}
> Exception in thread "flow com.twitter.data_platform.e2e_testing.jobs.parquet.E2ETestConvertThriftToParquet" java.lang.ClassCastException: cascading.stats.tez.util.TezTimelineClient cannot be cast to org.apache.tez.dag.api.client.DAGClient
>       at cascading.stats.tez.util.TezStatsUtil.createTimelineClient(TezStatsUtil.java:142)
>       at cascading.flow.tez.planner.Hadoop2TezFlowStepJob$1.getJobStatusClient(Hadoop2TezFlowStepJob.java:117)
>       at cascading.flow.tez.planner.Hadoop2TezFlowStepJob$1.getJobStatusClient(Hadoop2TezFlowStepJob.java:105)
>       at cascading.stats.tez.TezStepStats$1.getJobStatusClient(TezStepStats.java:60)
>       at cascading.stats.tez.TezStepStats$1.getJobStatusClient(TezStepStats.java:56)
>       at cascading.stats.CounterCache.cachedCounters(CounterCache.java:229)
>       at cascading.stats.CounterCache.cachedCounters(CounterCache.java:187)
>       at cascading.stats.CounterCache.getCounterValue(CounterCache.java:167)
>       at cascading.stats.BaseCachedStepStats.getCounterValue(BaseCachedStepStats.java:105)
>       at cascading.stats.FlowStats.getCounterValue(FlowStats.java:170)
>       at cascading.flow.tez.Hadoop2TezFlow.getTotalSliceCPUMilliSeconds(Hadoop2TezFlow.java:303)
>       at cascading.flow.BaseFlow.run(BaseFlow.java:1287)
>       at cascading.flow.BaseFlow.access$100(BaseFlow.java:82)
>       at cascading.flow.BaseFlow$1.run(BaseFlow.java:928)
>       at java.lang.Thread.run(Thread.java:745)
> Exception in thread "main" java.lang.Throwable: If you know what exactly caused this error, please consider contributing to GitHub via following link.
> https://github.com/twitter/scalding/wiki/Common-Exceptions-and-possible-reasons#javalangclasscastexception
>       at com.twitter.scalding.Tool$.main(Tool.scala:152)
>       at com.twitter.scalding.Tool.main(Tool.scala)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.lang.ClassCastException: cascading.stats.tez.util.TezTimelineClient cannot be cast to org.apache.tez.dag.api.client.DAGClient
>       at cascading.stats.tez.util.TezStatsUtil.createTimelineClient(TezStatsUtil.java:142)
>       at cascading.flow.tez.planner.Hadoop2TezFlowStepJob$1.getJobStatusClient(Hadoop2TezFlowStepJob.java:117)
>       at cascading.flow.tez.planner.Hadoop2TezFlowStepJob$1.getJobStatusClient(Hadoop2TezFlowStepJob.java:105)
>       at cascading.stats.tez.TezStepStats$1.getJobStatusClient(TezStepStats.java:60)
>       at cascading.stats.tez.TezStepStats$1.getJobStatusClient(TezStepStats.java:56)
>       at cascading.stats.CounterCache.cachedCounters(CounterCache.java:229)
>       at cascading.stats.CounterCache.cachedCounters(CounterCache.java:187)
>       at cascading.stats.CounterCache.getCountersFor(CounterCache.java:155)
>       at cascading.stats.BaseCachedStepStats.getCountersFor(BaseCachedStepStats.java:93)
>       at cascading.stats.FlowStats.getCountersFor(FlowStats.java:159)
>       at com.twitter.scalding.Stats$.getAllCustomCounters(Stats.scala:93)
>       at com.twitter.scalding.Job.handleStats(Job.scala:269)
>       at com.twitter.scalding.Job.run(Job.scala:298)
>       at com.twitter.scalding.Tool.start$1(Tool.scala:124)
>       at com.twitter.scalding.Tool.run(Tool.scala:140)
>       at com.twitter.scalding.Tool.run(Tool.scala:68)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>       at com.twitter.scalding.Tool$.main(Tool.scala:148)
>       ... 7 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
