[jira] [Updated] (HDFS-3950) QJM: misc TODO cleanup, improved log messages, etc

Todd Lipcon (JIRA) Tue, 18 Sep 2012 14:31:13 -0700

     [ https://issues.apache.org/jira/browse/HDFS-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Todd Lipcon updated HDFS-3950:
------------------------------

    Attachment: hdfs-3950.txt

- Removes the hardcoded timeout for attaining a quorum when writing 
transactions. It is now configurable (the default is still 20 seconds).
- Change stringification of QuorumJournalManager so that the web UI readout 
doesn't end up so wide. We used to print the URI, which was very wide. Now 
we print a ", "-separated list of addresses, so it can wrap across multiple 
lines and displays more cleanly. Had to update a unit test or two for this.
- Change the buffer capacity for the QuorumOutputStream to match the behavior 
of EditLogFileOutputStream (i.e., it respects 
FSEditLog.setOutputBufferCapacity())
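
To illustrate the stringification change above, here is a minimal sketch of 
rendering a set of journal addresses as a ", "-separated list. The class and 
method names are hypothetical and are not taken from the patch; the actual 
QuorumJournalManager output may differ in its details.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the QuorumJournalManager code.
class JournalAddressFormatter {
  /** Renders each address as host:port and joins them with ", ". */
  static String describe(List<InetSocketAddress> addrs) {
    List<String> parts = new ArrayList<>();
    for (InetSocketAddress addr : addrs) {
      // getHostString() avoids a reverse DNS lookup.
      parts.add(addr.getHostString() + ":" + addr.getPort());
    }
    // The space after each comma gives the web UI a place to wrap the line.
    return String.join(", ", parts);
  }
}
```

Because each separator contains a space, a browser can break the readout 
between addresses, which it cannot do inside a single long URI.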

- Removed TODO:
{code}
-    // TODO: check that md5s match up between any "tied" logs
{code}

We removed the md5sum field in HDFS-3943. When we add it back, we can add a 
sanity check like this.

- Removed a couple of TODOs, which I replaced with comments explaining why 
the current behavior does in fact work.

- Reduced verbose logging during newEpoch(). The verbose logging of newEpoch() 
responses is now at DEBUG level, with a terser message at INFO level.

- Removed a bunch of unused imports in various files.

- Replace use of deprecated RPC.getServer with the new Builder interface from 
Common.

- Address some TODOs in {{Journal.checkRequest}}. These are the most 
interesting non-trivial changes from this patch:
-- Maintains the current IPC serial number and performs a sanity check that 
serial numbers only increase within a given epoch. This is defensive against 
bugs in the IPC layer, and would also defend against a potential bug where 
multiple writers got assigned the same epoch.
-- Whenever we get an RPC from a new epoch (higher than lastPromisedEpoch), we 
treat that as an explicit "promise" not to accept lower ones. This helps 
tighten our sanity checks - we used to only assign lastPromisedEpoch as part of 
{{newEpoch()}}, and strictly speaking that's all that's necessary. But 
re-assigning it on any higher-epoched RPC is extra-defensive.
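
The two checks described above can be sketched roughly as follows. This is a 
simplified, in-memory model rather than the code from the patch: the class 
name, field names, and exception type are assumptions, and resetting the 
serial number when a new epoch is seen is also an assumption of this sketch. 
A real journal would additionally need to persist the promised epoch.

```java
// Hypothetical, simplified model of the checks; names are illustrative.
class RequestChecker {
  private long lastPromisedEpoch = 0;
  private long currentEpochIpcSerial = -1;

  /** Validates one incoming RPC, identified by (epoch, ipcSerial). */
  synchronized void checkRequest(long epoch, long ipcSerial) {
    if (epoch < lastPromisedEpoch) {
      throw new IllegalStateException("Epoch " + epoch
          + " is less than the last promised epoch " + lastPromisedEpoch);
    }
    if (epoch > lastPromisedEpoch) {
      // Any RPC from a higher epoch is treated as a promise
      // not to accept lower epochs from now on.
      lastPromisedEpoch = epoch;
      // Assumption: serial numbers start over for each new epoch.
      currentEpochIpcSerial = -1;
    }
    if (ipcSerial <= currentEpochIpcSerial) {
      throw new IllegalStateException("IPC serial " + ipcSerial
          + " is not greater than the previous serial "
          + currentEpochIpcSerial + " in epoch " + epoch);
    }
    currentEpochIpcSerial = ipcSerial;
  }

  synchronized long getLastPromisedEpoch() {
    return lastPromisedEpoch;
  }
}
```

In this model, a second writer that was mistakenly handed the same epoch 
would most likely trip the serial-number check, since its serial numbers 
would not continue from those of the first writer.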

- Include the client IP address in some of the more important INFO messages.

- Remove stale TODO:
{code}
-    // TODO: right now, a recovery of a segment when the log is
-    // completely emtpy (ie startLogSegment() but no txns)
-    // will fail this assertion here, since endTxId < startTxId
{code}
There are lots of tests for this circumstance now - it has long since been fixed.

- Adds a few new sanity checks that I thought of while reviewing the code.

- Adds a fault injection point between where a logger downloads a log segment 
and then persists the metadata about that log segment. I had a hunch there 
might be a bug here, but it is successfully passing the tests, so I think it 
turned out to not be a problem. The new fault injection point uses the same 
strategy as CheckpointFaultInjector.
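
For readers unfamiliar with this style of fault injection, the general 
pattern can be sketched as below, assuming the strategy is a replaceable 
singleton whose hook methods do nothing in production. All names here are 
hypothetical; neither class corresponds to the actual code in the patch.

```java
import java.io.IOException;

// Hypothetical injector: production code calls the hook, which is a no-op
// unless a test swaps in an instance that throws.
class SegmentSyncFaultInjector {
  static SegmentSyncFaultInjector instance = new SegmentSyncFaultInjector();

  static SegmentSyncFaultInjector get() {
    return instance;
  }

  /** Hook between downloading a segment and persisting its metadata. */
  void beforePersistMetadata() throws IOException {
    // Intentionally empty in production.
  }
}

// Hypothetical stand-in for the logger-side code that syncs a segment.
class SegmentSyncer {
  boolean downloaded = false;
  boolean metadataPersisted = false;

  void syncSegment() throws IOException {
    downloaded = true;          // stand-in for downloading the log segment
    SegmentSyncFaultInjector.get().beforePersistMetadata();
    metadataPersisted = true;   // stand-in for persisting the metadata
  }
}
```

A test can replace the static instance with a subclass (or a mock) whose 
hook throws, then verify that recovery copes with a segment that was 
downloaded but whose metadata was never persisted.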

- Improves {{PersistentLongFile}} to not re-write the file when the value has 
not changed.
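
The idea behind the {{PersistentLongFile}} improvement can be sketched as 
follows. This is a simplified stand-in and not Hadoop's class: the name 
CachedLongFile, the write counter, and the plain (non-atomic) file write are 
all assumptions made to keep the example short.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical simplified stand-in; a real implementation would also want
// an atomic, durable write.
class CachedLongFile {
  private final Path file;
  private long value;
  private boolean loaded = false;
  private int writeCount = 0;

  CachedLongFile(Path file) {
    this.file = file;
  }

  void set(long newValue) throws IOException {
    if (loaded && newValue == value) {
      return; // value unchanged: skip the disk write entirely
    }
    Files.write(file, Long.toString(newValue).getBytes(StandardCharsets.UTF_8));
    value = newValue;
    loaded = true;
    writeCount++;
  }

  long get(long defaultValue) throws IOException {
    if (!loaded) {
      if (!Files.exists(file)) {
        return defaultValue;
      }
      String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
      value = Long.parseLong(text.trim());
      loaded = true;
    }
    return value;
  }

  /** Number of times the file was actually written; for illustration. */
  int getWriteCount() {
    return writeCount;
  }
}
```

With this shape, calling set() repeatedly with the same value touches the 
disk only once, which matters when the value is re-assigned defensively on 
many requests.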

I ran this through my cluster fault injection test and it passed. I also ran 
findbugs and there are no issues found. Ran the full unit test suite for 
qjournal and it passed.
                
> QJM: misc TODO cleanup, improved log messages, etc
> --------------------------------------------------
>
>                 Key: HDFS-3950
>                 URL: https://issues.apache.org/jira/browse/HDFS-3950
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Minor
>         Attachments: hdfs-3950.txt
>
>
> General JIRA for a bunch of miscellaneous clean-up in the QJM branch:
> - fix most remaining TODOs
> - improve some log/error messages
> - add some more sanity checks where appropriate
> - address any findbugs that might have crept into branch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
