[
https://issues.apache.org/jira/browse/HBASE-11322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
churro morales updated HBASE-11322:
-----------------------------------
Description:
The SnapshotHFileCleaner asks the SnapshotFileCache whether a particular HFile
is part of a snapshot.
If the HFile is not in the cache, we then refresh the cache and check again.
The cache refresh first checks whether anything has been modified since the
last cache refresh, but this logic is incorrect in certain scenarios.
The last modified time is computed via this operation:
{code}
this.lastModifiedTime = Math.min(dirStatus.getModificationTime(),
    tempStatus.getModificationTime());
{code}
and the check to see if the snapshot directories have been modified:
{code}
// if the snapshot directory wasn't modified since we last check, we are done
if (dirStatus.getModificationTime() <= lastModifiedTime &&
    tempStatus.getModificationTime() <= lastModifiedTime) {
  return;
}
{code}
Suppose the following happens:
dirStatus modified 6-1-2014
tempStatus modified 6-2-2014
lastModifiedTime = 6-1-2014
Provided these two directories don't get modified again, no subsequent check
will exit early like it should, because tempStatus (6-2-2014) is always newer
than lastModifiedTime (6-1-2014).
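The scenario above can be reproduced in isolation. This is only a minimal
sketch, not the actual SnapshotFileCache code: the class name, the
canSkipRefresh helper, and the numeric stand-ins for the two dates are all
made up for illustration. It evaluates the same early-exit condition once
with lastModifiedTime taken via Math.min and once via Math.max.

```java
public class SnapshotCacheCheckSketch {

    /**
     * Hypothetical helper mirroring the early-exit condition quoted above:
     * the refresh can be skipped only if neither directory is newer than
     * the recorded lastModifiedTime.
     */
    static boolean canSkipRefresh(long dirMtime, long tempMtime, long lastModifiedTime) {
        return dirMtime <= lastModifiedTime && tempMtime <= lastModifiedTime;
    }

    public static void main(String[] args) {
        // Illustrative stand-ins for the two modification times (not real timestamps).
        long dirMtime = 20140601L;   // dirStatus modified 6-1-2014
        long tempMtime = 20140602L;  // tempStatus modified 6-2-2014

        // Current behavior: remember the older of the two times.
        long withMin = Math.min(dirMtime, tempMtime);
        // Alternative: remember the newer of the two times.
        long withMax = Math.max(dirMtime, tempMtime);

        // Neither directory changes again, yet with Math.min the check never passes.
        System.out.println("min: skip refresh = " + canSkipRefresh(dirMtime, tempMtime, withMin));
        System.out.println("max: skip refresh = " + canSkipRefresh(dirMtime, tempMtime, withMax));
    }
}
```

With Math.min the first line prints false (a refresh happens on every call),
while with Math.max the second line prints true.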
In our cluster, this was a huge performance hit. The cleaner chain fell
behind, thus almost filling up dfs and our namenode heap.
It's a simple fix: instead of Math.min we use Math.max for lastModifiedTime. I
believe that will be correct.
I'll attach a patch for you guys.
was:
In the SnapshotFileCache:
The last modified time is done via this operation:
{code}
this.lastModifiedTime = Math.min(dirStatus.getModificationTime(),
    tempStatus.getModificationTime());
{code}
and the check to see if the snapshot directories have been modified:
{code}
// if the snapshot directory wasn't modified since we last check, we are done
if (dirStatus.getModificationTime() <= lastModifiedTime &&
    tempStatus.getModificationTime() <= lastModifiedTime) {
  return;
}
{code}
so if the dirStatus and tmpStatus are modified at different times, we will
always assume they have been modified and refresh the cache.
In our cluster, this was a huge performance hit. The cleaner chain fell
behind, thus almost filling up dfs and our namenode heap.
It's a simple fix: instead of Math.min we use Math.max for lastModifiedTime. I
believe that will be correct.
I'll attach a patch for you guys.
> SnapshotHFileCleaner makes the wrong check for lastModified time thus causing
> too many cache refreshes
> ------------------------------------------------------------------------------------------------------
>
> Key: HBASE-11322
> URL: https://issues.apache.org/jira/browse/HBASE-11322
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.19
> Reporter: churro morales
> Assignee: churro morales
> Priority: Critical
> Attachments: HBASE-11322.patch
>
>