[ 
https://issues.apache.org/jira/browse/HDFS-8771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902285#comment-14902285
 ] 

Vinayakumar B commented on HDFS-8771:
-------------------------------------

I think it's a good idea to make the purge asynchronous so that it does not block write requests.

Some comments about the patch.
1. {{void purgeDataOlderThan(final long minTxIdToKeep) throws IOException {}}
This method no longer throws any exception, so the {{throws}} clause can be 
removed.
2. {{setUncaughtExceptionHandler(UncaughtExceptionHandlers.systemExit())}}
I think shutting down the entire JN on an IOException during purge may not be 
good. During purge, the only call that can result in an IOE is 
{{FileUtil.listFiles(dir)}}, which might fail because of a disk error. Since 
this exception cannot be propagated back to the NN, I feel it would be better 
to handle it inside {{call()}} and log a WARN. Let subsequent synchronous write 
requests handle the IOE as required. For any other exception, letting the JN 
shut down is okay.
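
To illustrate point 2, here is a minimal, self-contained sketch. It is not the code from the patch: {{PurgeSketch}}, {{Purger}}, {{purgeQuietly}} and {{purgeTask}} are hypothetical names, and {{java.util.logging}} stands in for the JN's real logger. The idea is only that an IOException is caught and logged as a warning inside {{call()}}, while any other exception still propagates.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.logging.Logger;

/** Hypothetical stand-in for the JN purge logic; not the real Hadoop class. */
class PurgeSketch {
    private static final Logger LOG = Logger.getLogger("PurgeSketch");

    /** Hypothetical purge step that may fail with an IOException (e.g. a disk error). */
    interface Purger {
        void purge(long minTxIdToKeep) throws IOException;
    }

    /**
     * Runs the purge and swallows only IOException, logging it as a warning.
     * Unchecked exceptions are deliberately not caught and propagate to the caller.
     *
     * @return true if the purge completed, false if it failed with an IOException
     */
    static boolean purgeQuietly(Purger purger, long minTxIdToKeep) {
        try {
            purger.purge(minTxIdToKeep);
            return true;
        } catch (IOException e) {
            LOG.warning("Failed to purge data older than txid "
                + minTxIdToKeep + ": " + e);
            return false;
        }
    }

    /** Wraps the purge as a task that could be handed to an executor. */
    static Callable<Boolean> purgeTask(final Purger purger, final long minTxIdToKeep) {
        return new Callable<Boolean>() {
            @Override
            public Boolean call() {
                return purgeQuietly(purger, minTxIdToKeep);
            }
        };
    }
}
```

With this shape, a failing directory listing only produces a log line, and the next synchronous write request is the one that surfaces the disk problem to the NN.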

[~andrew.wang] / [~jingzhao], do you want to take a look here?

> If IPCLoggerChannel#purgeLogsOlderThan takes too long, Namenode could not 
> send another RPC calls to Journalnodes
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8771
>                 URL: https://issues.apache.org/jira/browse/HDFS-8771
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Takuya Fukudome
>            Assignee: Kanaka Kumar Avvaru
>         Attachments: HDFS-8771-01.patch, HDFS-8771-02.patch, 
> HDFS-8771-03.patch
>
>
> In our cluster, the edits accidentally became huge (about 50GB) and our 
> Journalnodes' disks were busy, so {{purgeLogsOlderThan}} took more than 
> 30 secs. If {{IPCLoggerChannel#purgeLogsOlderThan}} takes too much time, the 
> Namenode cannot send other RPC calls to the Journalnodes because 
> {{o.a.h.hdfs.qjournal.client.IPCLoggerChannel}}'s executor is single-threaded. 
> This causes the Namenode to shut down.
> I think IPCLoggerChannel#purgeLogsOlderThan should not block other RPC calls 
> like sendEdits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
