[
https://issues.apache.org/jira/browse/TEZ-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738508#comment-14738508
]
Rajesh Balamohan commented on TEZ-2775:
---------------------------------------
Fetcher/FetcherOrderedGrouped - should we retain the decompressed size,
compressed size, and type information? That has helped in a couple of
scenarios when debugging.
MergeManager - Can we retain the logging in closeInMemoryFile? It has helped in
a couple of memory-related debugging sessions.
MRInput/MROutput/MultiMRInput - is a space needed in "newmapreduce", or is that
intentional?
PipelinedSorter:
- in flush(), the indexcache empty check is due to TEZ-2440. This is completely
harmless and can be removed. If we do not have this check, and if the task
gets killed in the middle, it can throw an NPE, leading to distraction when
debugging.
ShuffleInputEventHandlerImpl:
- numObsoletionEvenets - spelling?
- Moving "DME srcIdx" would help a lot in drastically reducing the task log
size. But the following code can cause confusion:
{noformat}
if (numDmeEvents.get() + numObsoletionEvenets.get() >
    nextToLogEventCount.get()) {
  LOG.info(inputContext.getSourceVertexName() + ": "
      + "numDmeEventsSeen=" + numDmeEvents.get()
      + ", numDmeEventsSeenWithNoData=" + numDmeEventsNoData.get()
      + ", numObsoletionEventsSeen=" + numObsoletionEvenets.get());
  // Log every 50 events seen.
  nextToLogEventCount.addAndGet(50);
}
{noformat}
If there are only 10 or 30 events, it might only print the first time. Would it
help to have a method like getProgress() in ShuffleEventHandler that logs this
statement, so that the reported numbers are more accurate? It could be called
periodically as well as from UnorderedKVInput.close().
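
To illustrate the concern, here is a minimal, hypothetical sketch (not the
actual Tez code). The class name EventProgressSketch, the methods
onDataMovementEvent(), onObsoletionEvent(), logProgress() and close(), and the
in-memory list standing in for LOG.info are all assumptions made for the
example. It keeps the "log every 50 events" threshold from the snippet above
and adds a final summary on close(), so the last logged counts match what was
actually seen. It is single-threaded for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an event handler; not a real Tez class.
class EventProgressSketch {
  private static final int LOG_INTERVAL = 50;

  private final String sourceVertexName;
  private final AtomicInteger numDmeEvents = new AtomicInteger();
  private final AtomicInteger numDmeEventsNoData = new AtomicInteger();
  private final AtomicInteger numObsoletionEvents = new AtomicInteger();
  private final AtomicInteger nextToLogEventCount = new AtomicInteger();
  // Collected lines stand in for LOG.info so the sketch needs no logger.
  private final List<String> logLines = new ArrayList<>();

  EventProgressSketch(String sourceVertexName) {
    this.sourceVertexName = sourceVertexName;
  }

  void onDataMovementEvent(boolean hasData) {
    numDmeEvents.incrementAndGet();
    if (!hasData) {
      numDmeEventsNoData.incrementAndGet();
    }
    maybeLogProgress();
  }

  void onObsoletionEvent() {
    numObsoletionEvents.incrementAndGet();
    maybeLogProgress();
  }

  // Same threshold logic as the quoted snippet: log, then move the
  // threshold forward by LOG_INTERVAL events.
  private void maybeLogProgress() {
    if (numDmeEvents.get() + numObsoletionEvents.get()
        > nextToLogEventCount.get()) {
      logProgress();
      nextToLogEventCount.addAndGet(LOG_INTERVAL);
    }
  }

  // Can be called periodically, and unconditionally from close().
  void logProgress() {
    logLines.add(sourceVertexName + ": "
        + "numDmeEventsSeen=" + numDmeEvents.get()
        + ", numDmeEventsSeenWithNoData=" + numDmeEventsNoData.get()
        + ", numObsoletionEventsSeen=" + numObsoletionEvents.get());
  }

  // Final summary so the last line reflects the true totals.
  void close() {
    logProgress();
  }

  List<String> getLogLines() {
    return logLines;
  }
}
```

With 30 events, the threshold path logs only once (after the first event, with
numDmeEventsSeen=1); the call from close() adds a second line carrying the
final count of 30.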
ShuffleInputEventHandlerOrderedGrouped:
- Similar comments as ShuffleInputEventHandlerImpl for log statement in
handleEvent().
ShuffleManager:
- Can "Created Fetcher for host:" be retained?
- logProgress() can be used for mining later. Would printing it only 50 times
make it hard to interpret?
ShuffleScheduler:
- Same as ShuffleManager for logProgress().
ShuffleUtils:logIndividualFetchComplete() - Could the changes break the perf
analysis tool across different versions?
> Reduce logging in runtime components
> ------------------------------------
>
> Key: TEZ-2775
> URL: https://issues.apache.org/jira/browse/TEZ-2775
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
> Attachments: TEZ-2775.1.txt
>
>
> Specifically Shuffle, which logs a lot for each event being processed and
> data being fetched.
> Also PipelinedShuffle is fairly noisy - some of the information from here
> could be consolidated.