Hi Todd,
Thank you, this is tremendously valuable input! I'll have to look at each of
these ten JIRAs in detail and will get back to the list with more info shortly.
--Matt

On Fri, Sep 2, 2011 at 1:03 PM, Todd Lipcon <[email protected]> wrote:

> The following other JIRAs have been committed in CDH for 18 months or
> so to support HBase. You may want to consider backporting them as
> well; many were never committed to 0.20-append due to a lack of
> reviews by HDFS committers at the time.
>
>    HDFS-1056. Fix possible multinode deadlocks during block recovery when using ephemeral dataxceiv
>
>    Description: Fixes the logic by which datanodes identify local RPC
>                 targets during block recovery for the case when the
>                 datanode is configured with an ephemeral data transceiver
>                 port.
>    Reason: Potential internode deadlock for clusters using ephemeral ports
>
>
>    HADOOP-6722. Workaround a TCP spec quirk by not allowing NetUtils.connect to connect to itself
>
>    Description: TCP's ephemeral port assignment makes it possible for a
>                 client to connect back to its own outgoing socket,
>                 causing failed RPCs or datanode transfers.
>    Reason: Fixes intermittent errors in cluster testing with ephemeral
>            IPC/transceiver ports on datanodes.
>
>    HDFS-1122. Don't allow client verification to prematurely add in-progress blocks to DataBlockScanner
>
>    Description: When a client reads a block that is also open for writing,
>                 it should not add it to the datanode block scanner.
>                 If it does, the block scanner can incorrectly mark the
>                 block as corrupt, causing data loss.
>    Reason: Potential data loss in the concurrent writer-reader case.
>
>    HDFS-1248. Miscellaneous cleanup and improvements on 0.20 append branch
>
>    Description: Miscellaneous code cleanup and logging changes, including:
>     - Slight cleanup to recoverFile() function in TestFileAppend4
>     - Improve error messages on OP_READ_BLOCK
>     - Some comment cleanup in FSNamesystem
>     - Remove toInodeUnderConstruction (was not used)
>     - Add some checks for null blocks in FSNamesystem to avoid a possible NPE
>     - Only log "inconsistent size" warnings at WARN level for non-under-construction blocks
>     - Redundant addStoredBlock calls also do not warrant WARN level
>     - Add some extra information to a warning in ReplicationTargetChooser
>    Reason: Improves diagnosis of error cases and clarity of code
>
>
>    HDFS-1242. Add unit test for the appendFile race condition / synchronization bug fixed in HDFS-142
>
>    Reason: Test coverage for previously applied patch.
>
>    HDFS-1218. Replicas that are recovered during DN startup should not be allowed to truncate better replicas.
>
>    Description: If a datanode loses power and then recovers, its replicas
>                 may be truncated due to the recovery of the local FS
>                 journal. This patch ensures that a replica truncated by
>                 a power loss does not truncate the block on HDFS.
>    Reason: Potential data loss bug uncovered by power failure simulation
>
>    HDFS-915. Write pipeline hangs for too long when ResponseProcessor hits timeout
>
>    Description: Previously, the write pipeline would hang for the entire
>                 write timeout when it encountered a read timeout (e.g.,
>                 due to a network connectivity issue). This patch
>                 interrupts the writing thread when a read error occurs.
>    Reason: Faster recovery from pipeline failure for HBase and other
>            interactive applications.
>
>
>    HDFS-1186. Writers should be interrupted when recovery is started, not when it's completed.
>
>    Description: When the write pipeline recovery process is initiated,
>                 this patch interrupts any concurrent writers to the block
>                 under recovery. This prevents a case where some edits may
>                 be lost if the writer has lost its lease but continues to
>                 write (e.g., due to a garbage collection pause).
>    Reason: Fixes a potential data loss bug
>
>
> commit a960eea40dbd6a4e87072bdf73ac3b62e772f70a
> Author: Todd Lipcon <[email protected]>
> Date:   Sun Jun 13 23:02:38 2010 -0700
>
>    HDFS-1197. Received blocks should not be added to block map prematurely for under construction files
>
>    Description: Fixes a possible dataloss scenario when using append() on
>                 real-life clusters. Also augments unit tests to uncover
>                 similar bugs in the future by simulating latency when
>                 reporting blocks received by datanodes.
>    Reason: Data loss bug in append support
>    Author: Todd Lipcon
>
>
>    HDFS-1260. tryUpdateBlock should do validation before renaming meta file
>
>    Description: Solves a bug where a block became inaccessible in certain
>                 failure conditions (particularly network partitions).
>                 Observed under an HBase workload at a user site.
>    Reason: Potential loss of synced data when the write pipeline fails
>
>
> On Fri, Sep 2, 2011 at 11:20 AM, Suresh Srinivas <[email protected]> wrote:
> > I also propose the following JIRAs, which are non-append-related bug
> > fixes from the 0.20-append branch:
> >
> >   - HDFS-1164. TestHdfsProxy is failing.
> >   - HDFS-1211. Block receiver should not log "rewind" packets at INFO level.
> >   - HDFS-1118. Fix socket leak on DFSClient.
> >   - HDFS-1210. DFSClient should log exception when block recovery fails.
> >   - HDFS-606. Fix ConcurrentModificationException in invalidateCorruptReplicas.
> >   - HDFS-561. Fix write pipeline READ_TIMEOUT.
> >   - HDFS-1202. DataBlockScanner throws NPE when updated before initialized.
> >
> > Risk Level:
> > These are useful bug fixes from the append branch and are not big
> > changes to the code base.
> >
> > These JIRAs have already been merged into the 0.20-security branch.
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
