[ 
https://issues.apache.org/jira/browse/HDFS-1597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12986762#action_12986762
 ] 

Todd Lipcon commented on HDFS-1597:
-----------------------------------

The race is the following:

||Thread A||Thread B||
|mkdirs() | - |
| take FSN lock | - |
| ..logEdit() | - |
| drop FSN lock | - |
| - | enterSafeMode() |
| - | saveNamespace() |
| - | ..logSyncAll() |
| - | ..editLog.close() |
| logSync() | - |

In this case, Thread A's transaction has already been synced by Thread B's
logSyncAll, so Thread A's own logSync() has no work left to do, i.e., it got
batched. Accordingly, it's fine that the edit log is closed by then. But the
assertion comes before the check that the sync was already batched, so it fires.
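
To make the ordering concrete, here is a minimal sketch of the idea. It is a simplified model, not the actual FSEditLog code: the class EditLogSketch, the fields lastWrittenTxId and syncedTxId, and the boolean return value are all hypothetical names chosen for illustration, and an explicit exception stands in for the assert so the behaviour does not depend on the -ea flag. The only point is that the "already batched" check runs before the "no streams" check.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the sync path; NOT the real FSEditLog. */
class EditLogSketch {
    // Stand-in for the open edit streams; emptied by close().
    private final List<String> editStreams = new ArrayList<>();
    private long lastWrittenTxId = 0; // last transaction id handed out by logEdit()
    private long syncedTxId = 0;      // highest transaction id already flushed

    EditLogSketch() {
        editStreams.add("edits");
    }

    /** Records an edit and returns its transaction id (done under the FSN lock in the race). */
    synchronized long logEdit() {
        return ++lastWrittenTxId;
    }

    /** Flushes everything written so far (what saveNamespace() triggers in the race). */
    synchronized void logSyncAll() {
        syncedTxId = lastWrittenTxId;
    }

    /** Closes all streams. */
    synchronized void close() {
        editStreams.clear();
    }

    /**
     * Syncs up to myTxId. Returns false if another thread already synced it.
     * The stream check comes AFTER the batched check, which is the proposed ordering.
     */
    synchronized boolean logSync(long myTxId) {
        if (myTxId <= syncedTxId) {
            return false; // already batched: nothing to do, closed streams are fine
        }
        // Stand-in for: assert editStreams.size() > 0 : "no editlog streams";
        if (editStreams.isEmpty()) {
            throw new IllegalStateException("no editlog streams");
        }
        syncedTxId = lastWrittenTxId;
        return true;
    }

    public static void main(String[] args) {
        EditLogSketch log = new EditLogSketch();
        long txId = log.logEdit();   // Thread A: logEdit() under the FSN lock
        log.logSyncAll();            // Thread B: saveNamespace() -> logSyncAll()
        log.close();                 // Thread B: editLog.close()
        // Thread A: logSync() arrives late but finds its transaction already synced.
        System.out.println("synced by this call: " + log.logSync(txId));
    }
}
```

With the checks in this order, the interleaving from the table above returns quietly, while a logSync() for a transaction that really is unsynced still trips the check if the streams are gone.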

This causes occasional failures of TestEditLog on one of our Hudson builds now
that assertions are enabled.

> Misplaced assertion in FSEditLog.logSync
> ----------------------------------------
>
>                 Key: HDFS-1597
>                 URL: https://issues.apache.org/jira/browse/HDFS-1597
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.22.0
>
>
> The top of FSEditLog.logSync has the following assertion:
> {code}
>         assert editStreams.size() > 0 : "no editlog streams";
> {code}
> which should actually come after checking to see if the sync was already 
> batched in by another thread.
> Will describe the race in a comment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
