[jira] [Commented] (HIVE-20512) Improve record and memory usage logging in SparkRecordHandler

Sahil Takiar (JIRA) Mon, 05 Nov 2018 15:41:29 -0800


    [ 
https://issues.apache.org/jira/browse/HIVE-20512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675889#comment-16675889
 ]


Sahil Takiar commented on HIVE-20512:
-------------------------------------

I'm not sure why {{awaitTermination}} would cause the tests to time out. Do 
they hang locally? If not, it could have just been a temporary test infra 
issue. The problem with calling {{shutdownNow}} directly is that it cancels any 
in-progress tasks by interrupting the threads running them. This can lead to 
spurious errors in the task logs, which can be confusing. It's generally 
recommended to follow the shutdown pattern outlined in 
[https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html]
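
For reference, the two-phase pattern in those docs looks roughly like the sketch below: call {{shutdown}} first so running tasks can finish, and only fall back to {{shutdownNow}} if they don't complete in time. The class name {{ExecutorShutdown}}, the boolean return value, and the timeout parameters are illustrative assumptions on my part, not code from the patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, modeled on the usage example in the ExecutorService Javadoc.
final class ExecutorShutdown {
    private ExecutorShutdown() {}

    /** Returns true if the pool terminated within the allotted time. */
    static boolean shutdownAndAwaitTermination(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown(); // Reject new tasks; already-submitted tasks keep running.
        try {
            // Phase 1: give in-progress tasks a chance to finish on their own.
            if (!pool.awaitTermination(timeout, unit)) {
                // Phase 2: cancel what is still running (typically via interruption).
                pool.shutdownNow();
                return pool.awaitTermination(timeout, unit);
            }
            return true;
        } catch (InterruptedException ie) {
            // The calling thread was interrupted while waiting: cancel and
            // preserve the interrupt status for the caller.
            pool.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this ordering, tasks are only interrupted when they outlive the first timeout, so the spurious interrupt errors should show up only in the cases where something really is stuck.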

> Improve record and memory usage logging in SparkRecordHandler
> -------------------------------------------------------------
>
>                 Key: HIVE-20512
>                 URL: https://issues.apache.org/jira/browse/HIVE-20512
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Bharathkrishna Guruvayoor Murali
>            Priority: Major
>         Attachments: HIVE-20512.1.patch, HIVE-20512.2.patch, 
> HIVE-20512.3.patch, HIVE-20512.4.patch, HIVE-20512.5.patch, HIVE-20512.6.patch
>
>
> We currently log memory usage and # of records processed in Spark tasks, but 
> we should improve the methodology for how frequently we log this info. 
> Currently we use the following code:
> {code:java}
> private long getNextLogThreshold(long currentThreshold) {
>   // A very simple counter to keep track of the number of rows processed by
>   // the reducer. It logs every 1 million rows, and more frequently before that.
>   if (currentThreshold >= 1000000) {
>     return currentThreshold + 1000000;
>   }
>   return 10 * currentThreshold;
> }
> {code}
> The issue is that, after a while, the 10x growth factor means that you 
> have to process a huge # of records before the next log is triggered.
> A better approach would be to log this info at a fixed time interval. This 
> would help in debugging tasks that are seemingly hung.
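
As a rough illustration of the interval-based approach described above, the sketch below gates logging on elapsed time rather than on a record-count threshold. The class {{IntervalLogGate}} and its method names are hypothetical and are not taken from any of the attached patches; the caller is assumed to pass in {{System.nanoTime()}} values.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: decides whether enough time has passed to log again.
final class IntervalLogGate {
    private final long intervalNanos;
    private long nextLogNanos;

    IntervalLogGate(long interval, TimeUnit unit, long startNanos) {
        this.intervalNanos = unit.toNanos(interval);
        this.nextLogNanos = startNanos + intervalNanos;
    }

    /** Returns true at most once per interval; pass System.nanoTime() as nowNanos. */
    boolean shouldLog(long nowNanos) {
        // Compare via subtraction so the check stays correct if nanoTime wraps.
        if (nowNanos - nextLogNanos < 0) {
            return false;
        }
        nextLogNanos = nowNanos + intervalNanos;
        return true;
    }
}
```

A check like this only fires while records are still flowing through the handler. To get output from a task that has stopped processing records entirely, the logging would need to run on its own thread, for example via a scheduled executor.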



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
