[ https://issues.apache.org/jira/browse/TEZ-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383038#comment-15383038 ]
Hitesh Shah commented on TEZ-3351:
----------------------------------
General questions:
- Is ATSHistoryLogLevel sufficient to control the amount of data pushed to
ATS? What if dags have 5000 counters each? Would that be handled easily by ATS?
Do we need a knob to stop publishing counters at certain levels? Additionally,
why do we need so many logging levels?
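To make the counters question concrete, here is a rough sketch of what a plugin-agnostic level plus a separate counters knob could look like. The level names, their ordering, and shouldPublishCounters are assumptions for illustration only; they are not taken from the patch, and only the shouldLog name mirrors the call that appears in it.

```java
final class HistoryLogLevelSketch {

    // Hypothetical, plugin-agnostic levels, ordered from least to most verbose.
    enum HistoryLogLevel {
        NONE, AM, DAG, VERTEX, TASK, ALL;

        // True if an event tagged with eventLevel passes a logger configured at this level.
        boolean shouldLog(HistoryLogLevel eventLevel) {
            return this != NONE && eventLevel.ordinal() <= ordinal();
        }
    }

    // Hypothetical knob: counters are published only when the configured level
    // is at least minLevelForCounters (and logging is not disabled outright).
    static boolean shouldPublishCounters(HistoryLogLevel configured,
                                         HistoryLogLevel minLevelForCounters) {
        return configured != HistoryLogLevel.NONE
            && configured.ordinal() >= minLevelForCounters.ordinal();
    }

    public static void main(String[] args) {
        HistoryLogLevel configured = HistoryLogLevel.DAG;
        System.out.println(configured.shouldLog(HistoryLogLevel.AM));    // true
        System.out.println(configured.shouldLog(HistoryLogLevel.TASK));  // false
        System.out.println(shouldPublishCounters(configured, HistoryLogLevel.VERTEX)); // false
    }
}
```

With something like this, a DAG with thousands of counters could still log its lifecycle events at DAG level while leaving counters out unless the level is raised.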
Comments on the patch:
- ATSHistoryLogLevel sounds ATS specific. ATS is actually a plugin and
therefore it seems wrong to add a plugin specific API to the generic APIs of
TezClient and DAGClient. A generic API to set logging level makes sense though.
- "private static final int DAGS_PER_GROUP = 1000;" - why 1000? What kind
of perf impact will we see if we scale this value upwards or downwards? Can
this be changed dynamically per app?
- Why do the TIMELINE_GROUPID related vars belong to TezDAGId?
{code}
} else {
  // dagId does not exist, lets check at AM level.
  if (!amAtsHistoryLogLevel.shouldLog(eventType.getAtsHistoryLogLevel())) {
    return false;
  }
}
{code}
- this approach is incorrect for recovery cases. If the dag object exists, it
should be easy enough to retrieve the log level and add it back to the map.
This does raise an issue for cache misses though.
- testATSLogLevelNone() - not sure how this is actually testing that there is
no data being generated by the history logger?
- there needs to be additional testing for a level that does not match
ALL/NONE
- ATSVHistoryLoggingService has not been changed?
- "Find better way to wait for the events to be drained." - timing based
tests have a tendency to be flaky. It would be good to change this to
something more definitive.
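As an illustration of the last point, one option is to have the test block on an explicit completion signal instead of sleeping. The sketch below uses a stand-in logger (FakeLogger, awaitDrained and runOnce are made-up names, not Tez classes or methods) that counts down a CountDownLatch for every event it has processed; the real logging service would need to expose a comparable hook for this to apply.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class DrainWaitSketch {

    // Stand-in for a history logger that drains its queue on a background thread.
    static final class FakeLogger {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        private final AtomicInteger handled = new AtomicInteger();
        private final CountDownLatch drained;
        private final Thread worker;

        FakeLogger(int expectedEvents) {
            drained = new CountDownLatch(expectedEvents);
            worker = new Thread(() -> {
                try {
                    while (true) {
                        queue.take();               // blocks until an event arrives
                        handled.incrementAndGet();  // "publish" the event
                        drained.countDown();        // signal that one more event is done
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.setDaemon(true);
            worker.start();
        }

        void handle(String event) { queue.add(event); }

        // Returns true once every expected event has been processed, false on timeout.
        boolean awaitDrained(long timeout, TimeUnit unit) throws InterruptedException {
            return drained.await(timeout, unit);
        }

        int handledCount() { return handled.get(); }

        void stop() { worker.interrupt(); }
    }

    // Sends the given number of events and waits on the latch rather than sleeping.
    static int runOnce(int events) {
        FakeLogger logger = new FakeLogger(events);
        try {
            for (int i = 0; i < events; i++) {
                logger.handle("event-" + i);
            }
            // The timeout is only an upper bound for failure; a passing run returns immediately.
            if (!logger.awaitDrained(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("events were not drained in time");
            }
            return logger.handledCount();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for drain", e);
        } finally {
            logger.stop();
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnce(5));
    }
}
```

The test then fails only if the events are never processed, not because a fixed sleep happened to be too short on a slow machine.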
> Handle ATS performance issues for Hive-LLAP.
> --------------------------------------------
>
> Key: TEZ-3351
> URL: https://issues.apache.org/jira/browse/TEZ-3351
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Harish Jaiprakash
> Assignee: Harish Jaiprakash
> Attachments: TEZ-3351.01.patch, TEZ-3351.WIP.01.patch
>
>
> With Hive-LLAP, we have subsecond queries and hence can run lots of queries
> in a small interval. This can overload ATS with a lot of events and create a
> large number of files in HDFS. This jira is to reduce the performance impact
> of ATS logging by Tez.