[jira] [Commented] (TEZ-3650) Improve performance of FetchStatsLogger#logIndividualFetchComplete

Rajesh Balamohan (JIRA) Sun, 12 Mar 2017 16:51:25 -0700

    [ 
https://issues.apache.org/jira/browse/TEZ-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15906742#comment-15906742
 ]


Rajesh Balamohan commented on TEZ-3650:
---------------------------------------

LGTM. +1. Thanks [~jeagles].

Though InputAttemptIdentifier#toString would internally use StringBuilder 
(the JVM bytecode represents the concatenation as StringBuilder calls), it 
could still show up in the profiler due to concatenation/toString. I agree 
that the InputAttemptIdentifier#toString call could be delayed, as done in the 
current patch.
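
To illustrate what delaying toString buys, here is a minimal sketch. It is not 
the code from the patch: AttemptId is a hypothetical stand-in for 
InputAttemptIdentifier, and java.util.logging is assumed only to keep the 
example self-contained.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LazyToStringSketch {
    /** Hypothetical stand-in for an identifier whose toString() concatenates fields. */
    static final class AttemptId {
        static int toStringCalls = 0; // counts how often toString() actually runs
        private final int index;
        private final String path;

        AttemptId(int index, String path) {
            this.index = index;
            this.path = path;
        }

        @Override
        public String toString() {
            toStringCalls++;
            return "{" + index + ", " + path + "}";
        }
    }

    /** Eager: the message (and id.toString()) is built before the level check. */
    static void logEager(Logger log, AttemptId id, long csize) {
        log.info("Completed fetch for attempt: " + id + ", csize=" + csize);
    }

    /** Deferred: the supplier only runs if INFO is enabled for this logger. */
    static void logDeferred(Logger log, AttemptId id, long csize) {
        log.info(() -> "Completed fetch for attempt: " + id + ", csize=" + csize);
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("LazyToStringSketch");
        log.setLevel(Level.OFF); // simulate INFO being disabled
        AttemptId id = new AttemptId(0, "attempt_example");
        logEager(log, id, 10884L);
        logDeferred(log, id, 10884L);
        System.out.println("toString() calls: " + AttemptId.toStringCalls);
    }
}
```

With logging disabled, only the eager variant pays for the concatenation and 
the toString call.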

For reporting fetch rate (MB/s), we may need to consider nanoTime instead of 
millis. That can be a separate JIRA. Pasting an example log here.

{noformat}
2017-03-12 19:24:52,917 [INFO] [Fetcher_B {Reducer_16} #0] 
|ShuffleManager.fetch|: Completed fetch for attempt: {0, 0, 
attempt_1488231257387_2078_1_10_000000_0_10009} to MEMORY, csize=10884, 
dsize=10867, EndTime=1489361092917, TimeTaken=1, Rate=10.37 MB/s

2017-03-12 19:25:09,833 [INFO] [Fetcher_B {Reducer_16} #0] 
|ShuffleManager.fetch|: Completed fetch for attempt: {0, 0, 
attempt_1488231257387_2078_1_10_000000_0_10009} to MEMORY, csize=10884, 
dsize=10867, EndTime=1489361109833, TimeTaken=0, Rate=0.00 MB/s
{noformat}

Though the second fetch completed a lot faster (both have csize=10884), it is 
reported as 0.00 MB/s because the elapsed time is measured in milliseconds and 
rounds down to 0. Had it been nanoTime, we would get better accuracy.
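
A rough sketch of the difference, under stated assumptions: the method names 
and the 0.4 ms duration are hypothetical, MB is taken as 1024 * 1024 bytes, and 
this is not the FetchStatsLogger code itself.

```java
import java.util.concurrent.TimeUnit;

final class FetchRateSketch {
    private static final double BYTES_PER_MB = 1024.0 * 1024.0;

    /** Rate in MB/s from a millisecond duration; a 0 ms duration yields 0. */
    static double rateFromMillis(long bytes, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0;
        }
        return (bytes / BYTES_PER_MB) / (elapsedMillis / 1000.0);
    }

    /** Rate in MB/s from a nanosecond duration, e.g. a System.nanoTime() delta. */
    static double rateFromNanos(long bytes, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            return 0.0;
        }
        return (bytes / BYTES_PER_MB) / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) {
        long csize = 10884L;
        long elapsedNanos = 400_000L; // hypothetical sub-millisecond fetch (0.4 ms)
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(elapsedNanos); // truncates to 0

        System.out.printf("millis: TimeTaken=%d, Rate=%.2f MB/s%n",
            elapsedMillis, rateFromMillis(csize, elapsedMillis));
        System.out.printf("nanos:  TimeTaken=%d ns, Rate=%.2f MB/s%n",
            elapsedNanos, rateFromNanos(csize, elapsedNanos));
    }
}
```

System.nanoTime() is also monotonic, so the measured interval is not affected 
by wall-clock adjustments.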

> Improve performance of FetchStatsLogger#logIndividualFetchComplete
> ------------------------------------------------------------------
>
>                 Key: TEZ-3650
>                 URL: https://issues.apache.org/jira/browse/TEZ-3650
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: TEZ-3650.1.patch, TEZ-3650.2.patch
>
>
> The cost of logging the fetch-completed statement is dominated by two main 
> factors: 1) formatting the download rate and 2) minor String concatenation 
> that isn't getting optimized.
> In this jira I propose a new Formatter that is optimized by implementing the 
> StringBuilder#append(long) algorithm, but allows for formatting and reuse of 
> StringBuilder.
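
As an illustration of the idea in the description, a minimal sketch follows. 
It is a simpler stand-in, not the Formatter from the attached patches: it 
calls StringBuilder#append(long) directly rather than reimplementing its 
algorithm, truncates to two decimals, assumes non-negative values, and is not 
thread-safe. All names are hypothetical.

```java
final class RateFormatSketch {
    // Reused across calls to avoid allocating a new builder per log line.
    private final StringBuilder sb = new StringBuilder(64);

    /** Appends a non-negative value with exactly two decimals, truncating extra digits. */
    static void appendTwoDecimals(StringBuilder out, double value) {
        long scaled = (long) (value * 100.0); // truncates toward zero
        out.append(scaled / 100).append('.');
        long frac = scaled % 100;
        if (frac < 10) {
            out.append('0'); // keep a leading zero, e.g. ".05"
        }
        out.append(frac);
    }

    /** Builds one log fragment without String.format or intermediate Strings. */
    String format(long csize, double rateMBps) {
        sb.setLength(0); // reset the reused builder
        sb.append("csize=").append(csize).append(", Rate=");
        appendTwoDecimals(sb, rateMBps);
        sb.append(" MB/s");
        return sb.toString();
    }

    public static void main(String[] args) {
        RateFormatSketch f = new RateFormatSketch();
        System.out.println(f.format(10884L, 10.5));
        System.out.println(f.format(10884L, 0.0));
    }
}
```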



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
