[
https://issues.apache.org/jira/browse/HDFS-13864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabor Bota updated HDFS-13864:
------------------------------
Description:
In HDFS-13672 (clearCorruptLazyPersistFiles could crash NameNode) we agreed
that the current implementation could be changed, but not in the way it was
addressed in that issue.
This JIRA is a follow-up to HDFS-13672.
As a workaround, we can disable the scrubber via its scrub interval when
debugging. In real-world customer environments, there are no cases with this
many corrupted lazy persist files.
We agreed that:
* holding the lock for a long time is an anti-pattern
* the common case here is that there are zero lazy persist files; a better
(though different) change would be to skip running this scrubber entirely if
there aren't any lazy persist files
We had the following ideas:
* create a service that iterates through a list of elements while holding the
writeLock, running each element through a lambda function.
* What we need here is a tail iterator that starts at the last processed
element.
* Open question: should we disable {{clearCorruptLazyPersistFiles}} by
default?
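
The ideas above could be sketched roughly as follows. This is only an
illustration, not the FSNamesystem implementation: the class name
LockedBatchIterator, the batchSize parameter, and the use of a plain
NavigableSet as a stand-in for the NameNode's internal structures are all
assumptions, and no Hadoop API is used. The sketch takes the write lock for
one small batch at a time, releases it between batches so other operations can
proceed, and resumes through a tail view that starts after the last processed
element.

```java
import java.util.Iterator;
import java.util.NavigableSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

/** Hypothetical helper: runs an action on each element in lock-bounded batches. */
class LockedBatchIterator<T> {
  private final ReentrantReadWriteLock lock;
  private final int batchSize; // assumed tuning knob, not an HDFS setting

  LockedBatchIterator(ReentrantReadWriteLock lock, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.lock = lock;
    this.batchSize = batchSize;
  }

  /**
   * Applies the action to every element and returns how many were processed.
   * The set should tolerate modification during iteration (for example a
   * ConcurrentSkipListSet) if the action removes elements.
   */
  int forEach(NavigableSet<T> elements, Consumer<? super T> action) {
    int processed = 0;
    T last = null; // last processed element; such sets reject null elements
    boolean more = true;
    while (more) {
      lock.writeLock().lock();
      try {
        // Common case: nothing to do, so leave right away.
        if (elements.isEmpty()) {
          return processed;
        }
        // Tail view: resume strictly after the last processed element.
        // This works even if that element has since been removed.
        Iterator<T> it = (last == null
            ? elements
            : elements.tailSet(last, false)).iterator();
        int inBatch = 0;
        while (inBatch < batchSize && it.hasNext()) {
          last = it.next();
          action.accept(last);
          inBatch++;
        }
        processed += inBatch;
        more = it.hasNext();
      } finally {
        // Release between batches so the lock is never held for long.
        lock.writeLock().unlock();
      }
    }
    return processed;
  }
}
```

A caller would pass the lambda that handles one element, for example
new LockedBatchIterator<String>(lock, 100).forEach(paths, p -> delete(p)),
where paths and delete are placeholders. Elements added behind the resume
point while the lock is released are not visited in the same pass, which seems
acceptable for a periodic scrubber that runs again later.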
was:
In HDFS-13672 we agreed that the current implementation could be changed, but
not in the way it was addressed in that issue.
This JIRA is a follow-up to HDFS-13672.
As a workaround, we can disable the scrubber via its scrub interval when
debugging. In real-world customer environments, there are no cases with this
many corrupted lazy persist files.
We agreed that:
* holding the lock for a long time is an anti-pattern
* the common case here is that there are zero lazy persist files; a better
(though different) change would be to skip running this scrubber entirely if
there aren't any lazy persist files
We had the following ideas:
* create a service that iterates through a list of elements while holding the
writeLock, running each element through a lambda function.
* What we need here is a tail iterator that starts at the last processed
element.
* Open question: should we disable {{clearCorruptLazyPersistFiles}} by
default?
> Service for FSNamesystem#clearCorruptLazyPersistFiles to iterate with
> writeLock
> -------------------------------------------------------------------------------
>
> Key: HDFS-13864
> URL: https://issues.apache.org/jira/browse/HDFS-13864
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Gabor Bota
> Priority: Minor