[ 
https://issues.apache.org/jira/browse/HDFS-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Bota updated HDFS-13864:
------------------------------
    Description: 
In HDFS-13672 (clearCorruptLazyPersistFiles could crash NameNode) we agreed 
that the current implementation could be changed, but not in the way it was 
addressed in that issue.
This jira is a follow-up for HDFS-13672.

As a workaround, the scrubber can be disabled through its interval setting 
when debugging. In real-world customer environments, there are no cases with 
so many corrupted lazy persist files.

We agreed that:
* holding the lock for a long time is an anti-pattern
* the common case here is that there are zero lazy persist files, so a better 
(though different) change would be to skip running this scrubber entirely if 
there aren't any lazy persist files

We had the following ideas:
* Create a service that iterates through a list of elements while holding the 
writeLock, running each element through a lambda function.
* What we need here is a tail iterator that starts at the last processed 
element.
* Open question: should we disable {{clearCorruptLazyPersistFiles}} by 
default? 
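
The first two ideas above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed patch: the class name {{BatchedWriteLockIterator}}, the batch size parameter, and the choice to release the lock between batches are all hypothetical, and a plain {{ReentrantReadWriteLock}} stands in for the FSNamesystem lock. It also assumes a single scrubber thread drives the iterator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: applies an action to each element of a list while
 * holding a write lock, but only for a bounded batch at a time. The position
 * of the last processed element is remembered, so the next call resumes from
 * there (the "tail iterator" idea).
 */
final class BatchedWriteLockIterator<T> {
  private final List<T> elements;
  private final ReentrantReadWriteLock lock;
  private final int batchSize;
  // Index of the next unprocessed element; touched by one thread only.
  private int nextIndex = 0;

  BatchedWriteLockIterator(List<T> elements, ReentrantReadWriteLock lock,
      int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    // Work on a private snapshot so the caller's list can change freely.
    this.elements = new ArrayList<>(elements);
    this.lock = lock;
    this.batchSize = batchSize;
  }

  /**
   * Processes at most one batch under the write lock.
   *
   * @return true if unprocessed elements remain after this batch
   */
  boolean processNextBatch(Consumer<T> action) {
    // Common case: nothing to do, so the lock is never taken.
    if (nextIndex >= elements.size()) {
      return false;
    }
    lock.writeLock().lock();
    try {
      int end = Math.min(nextIndex + batchSize, elements.size());
      while (nextIndex < end) {
        action.accept(elements.get(nextIndex));
        nextIndex++;
      }
      return nextIndex < elements.size();
    } finally {
      // Always release, so other lock users can run between batches.
      lock.writeLock().unlock();
    }
  }

  /** Processes every remaining element, one batch per lock acquisition. */
  void processAll(Consumer<T> action) {
    while (processNextBatch(action)) {
      Thread.yield(); // hint that waiting threads may take the lock
    }
  }
}
```

With an empty list, {{processNextBatch}} returns without acquiring the lock at all, which matches the point that the common case of zero lazy persist files should cost nothing. Whether the action may safely observe state changes made by other threads between batches is something a real implementation would have to decide.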


  was:
In HDFS-13672 we agreed that the current implementation could be changed, but 
not in a way that it was addressed in that issue.
This jira is a follow-up for HDFS-13672.

As a workaround, we can disable the scrubber interval when debugging. In the 
real world/customer environments, there are no cases when there are so many 
corrupted lazy persist files.

We agreed that 
* holding the lock for a long time is an anti-pattern
* the common case here is that there are zero lazy persist files, a better 
(though different) change would be to skip running this scrubber entirely if 
there aren't any lazy persist files

We had the following ideas:
* create a service where you can iterate through a list of elements with a 
gained writeLock - and each element can be run through a lambda function.
* What we need here is a tail iterator that starts at the last processed 
element.
* Open question: should we disable the {{clearCorruptLazyPersistFiles}} by 
default? 



> Service for FSNamesystem#clearCorruptLazyPersistFiles to iterate with 
> writeLock
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-13864
>                 URL: https://issues.apache.org/jira/browse/HDFS-13864
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Gabor Bota
>            Priority: Minor



