Hi Matt,
My colleagues with experience running 0.20.203 (as well as many previous
releases of Hadoop) in Yahoo!'s production environment are requesting that the
following items be included as high priority sustaining improvements in
0.20.205. Rather than contributors sending in several separate requests to
this mailing list, this email aggregates contributions from the following
individuals: Daryn Sharp, Jeffrey Naisbitt, Kihwal Lee, Sherry Chen, Thomas
Graves, Bharath Mundlapudi, Robert Joseph Evans, Anupam Seth, Eric Payne, John
George.
Recommendations and suggestions for this list of jiras came from folks with
significant experience working with large scale Hadoop clusters within Yahoo!
production environments, including Service Engineering teams, Quality
Engineering teams, Solutions Engineering teams, and Development teams.
Notes on the items listed below:
* All the Jiras listed with the exception of HADOOP-7510, MAPREDUCE-2764,
MAPREDUCE-2915, and HDFS-2257 have been committed to 0.20-security. These
remaining four jiras are in-progress and should wrap up over the next few days.
* All of the jiras listed have been fixed in trunk with the following
exceptions:
  * The four jiras listed above, which are still being worked on
  * MAPREDUCE-2780 - Like the four above, in progress now.
  * MAPREDUCE-2324 - Has a strong interaction with MR279, so filed
    MAPREDUCE-2723 to make sure this is handled correctly in yarn
  * MAPREDUCE-2729, MAPREDUCE-2621 - Don't make sense after integration of
    MR279
Thank you for considering this list of Jiras for inclusion in 0.20.205.
Nathan Roberts
====
MAPREDUCE-2489 - Jobsplits with random hostnames can make the queue unusable
Justification: A broken job that is issuing random hostnames to the job tracker
can hang up a queue and severely impact the performance of the job tracker.
Risk: Low. Change involves a simple check for obviously malformed hostnames.
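To give a feel for what "a simple check for obviously malformed hostnames" can look like, here is a minimal Python sketch. It is not the code from the patch: the helper name is_plausible_hostname and the choice of RFC 1123 label rules are assumptions made for illustration.

```python
import re

# Hypothetical helper, not the Hadoop implementation. One DNS label:
# 1-63 letters, digits or hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

def is_plausible_hostname(name: str) -> bool:
    """Return True if `name` is syntactically a valid hostname."""
    if not name or len(name) > 253:
        return False
    # A single trailing dot (fully qualified form) is allowed.
    if name.endswith("."):
        name = name[:-1]
    # Every dot-separated label must match in full.
    return all(_LABEL.fullmatch(label) for label in name.split("."))
```

A check like this only rejects names that cannot be valid; it says nothing about whether the name actually resolves.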
MAPREDUCE-2852 - Remove YDH Bug 2854624 from code comments
Justification: Comment change only
Risk: Low
HADOOP-7472 - RPC client should deal with the IP address changes
Justification: If the IP address of a namenode is changed, all clients must be
restarted. This can be very expensive and difficult to execute when many of the
clients are not within the cluster proper (e.g., distcp).
Risk: Low. If an address change is suspected, the code now performs an
additional lookup and updates the address. Does not affect normal path.
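The "additional lookup" idea can be sketched roughly as follows. This is an illustration in Python, not the RPC client code; ServerAddress, connect_with_refresh and the injectable resolve/connect callables are all hypothetical names.

```python
import socket

class ServerAddress:
    """Remembers a hostname and the IP it last resolved to (illustrative)."""

    def __init__(self, host, port, resolve=socket.gethostbyname):
        self.host = host
        self.port = port
        self._resolve = resolve      # injectable so the sketch can be tested
        self.ip = resolve(host)

    def refresh(self):
        """Re-resolve the hostname; return True if the IP address changed."""
        new_ip = self._resolve(self.host)
        if new_ip != self.ip:
            self.ip = new_ip
            return True
        return False


def connect_with_refresh(addr, connect):
    """Connect; on failure, re-resolve once and retry only if the IP changed."""
    try:
        return connect(addr.ip, addr.port)
    except OSError:
        if addr.refresh():
            return connect(addr.ip, addr.port)
        raise
```

The successful path does no extra lookup, which matches the claim that the normal path is unaffected.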
MAPREDUCE-2729 - Reducers are always counted having pending tasks even if they
can't be scheduled yet because not enough of their mappers have completed
Justification: Reducer slots are not being properly allocated when reducers are
waiting on map tasks to finish, causing situations where a queue can be
significantly underutilized. In grids where queues are configured with
relatively tight constraints, this can result in substantial throughput
degradation when this condition arises.
Risk: Medium/Low. No change to the scheduler can be taken lightly, so in those
terms it's medium. However, the change itself is straightforward and the experts
agree it was a bug.
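As a rough illustration of the accounting issue (not the scheduler code), the sketch below counts reducers as pending only once a given fraction of the job's maps has finished. The function name and the threshold parameter are assumptions for the example.

```python
def schedulable_reduces(total_maps, finished_maps, pending_reduces, threshold):
    """Count reducers as pending only once enough maps have completed.

    `threshold` is the fraction of maps (0.0 to 1.0) that must finish before
    the job's reducers may be scheduled. Hypothetical parameter name.
    """
    if total_maps == 0:
        return pending_reduces       # no maps to wait for
    if finished_maps / total_maps < threshold:
        return 0                     # reducers cannot run yet: do not count them
    return pending_reduces
```

Counting such reducers as zero keeps a queue's demand from being inflated by work that cannot start yet.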
MAPREDUCE-2705 - tasks localized and launched serially by TaskLauncher -
causing other tasks to be delayed
Justification: Large localization processes lock up the task launcher for
potentially very long periods of time. This can result in significant delays
for other tasks assigned to the same compute node.
Risk: Low. Localization is performed in a separate thread but overall flow for
a particular task remains unchanged.
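The effect of moving localization off the launcher's own thread can be sketched like this. It is a Python illustration only; localize, launch_all and the simulated durations are hypothetical, and the real change lives in the TaskTracker.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def localize(task_id, seconds):
    """Stand-in for downloading a task's files; `seconds` simulates the cost."""
    time.sleep(seconds)
    return task_id

def launch_all(tasks, workers=4):
    """Localize each task on a worker thread; return the completion order."""
    finished = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for task_id, seconds in tasks:
            # Each task still does localize-then-launch; only the thread differs.
            future = pool.submit(localize, task_id, seconds)
            future.add_done_callback(lambda f: finished.append(f.result()))
    return finished
```

With a slow task submitted first and a quick one second, the quick one should complete first instead of waiting behind the slow localization.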
MAPREDUCE-2651 - Race condition in Linux Task Controller for job log directory
creation
Justification: Tasks can fail because of a race to create the job log directory.
Risk: Low. Deals with EEXIST more consistently.
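The general pattern of treating EEXIST as success looks like the following. The task controller itself is native code, so this Python version is only a sketch of the idea, and ensure_dir is a hypothetical name.

```python
import os

def ensure_dir(path, mode=0o700):
    """Create `path` if needed; losing the creation race is not an error."""
    try:
        os.mkdir(path, mode)
    except FileExistsError:
        # EEXIST: another process created it first. Accept that only if
        # the existing entry really is a directory.
        if not os.path.isdir(path):
            raise
    return path
```

The alternative "check, then create" sequence is what races: two tasks can both see the directory missing and both try to create it.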
MAPREDUCE-2650 - back-port MAPREDUCE-2238 to 0.20-security
Justification: Permission handling within localization causes races and can
leave directories with broken permissions. Adversely affects test
reproducibility.
Risk: Low. Fix has been in trunk and 22 for several months.
MAPREDUCE-2621 - TestCapacityScheduler fails with Queue q1 does not exist
Justification: Hudson unit test failures
Risk: Low. Changes just create an explicit association between the QueueManager
and JT.
MAPREDUCE-2494 - Make the distributed cache delete entries using LRU priority
Justification: Some regularly recurring jobs require large distributed cache
contents. The current scheme deletes these contents when the distributed cache
fills up. The penalty for localizing this type of job is a recurring penalty.
Risk: Low/Medium - Currently eviction is all or nothing. This change just
orders the eviction and doesn't do it all at once. All of the races dealing
with eviction were already dealt with in the code so no additional risk from
that standpoint.
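For readers less familiar with LRU ordering, a minimal sketch of ordered, incremental eviction follows. It is not the distributed cache code; LruCache, touch and the byte budget are assumptions for illustration.

```python
from collections import OrderedDict

class LruCache:
    """Size-bounded cache that evicts least recently used entries first."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()   # name -> size, least recent first

    def touch(self, name, size):
        """Record a use of `name`; return names evicted to stay in budget."""
        if name in self.entries:
            self.entries.move_to_end(name)
        else:
            self.entries[name] = size
            self.used += size
        evicted = []
        # Evict from the cold end only until the cache fits again, and
        # never the entry that was just used.
        while self.used > self.capacity and len(self.entries) > 1:
            old, old_size = self.entries.popitem(last=False)
            self.used -= old_size
            evicted.append(old)
        return evicted
```

A regularly recurring job keeps moving its entry to the warm end, so it is the last candidate for eviction rather than being dropped with everything else.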
MAPREDUCE-2324 - Job should fail if a reduce task can't be scheduled anywhere
Justification: Jobs can get stuck in limbo emitting tons of messages to the
logs about not being able to schedule the reduce. It's best to either just
attempt the reduce and let it fail, or put in more sophisticated logic to
attempt to fail these jobs before attempting the reduce at all.
Risk: Low. Change now just removes the check which tried to prevent this. So,
the job will be attempted and will just fail through the normal course.
MAPREDUCE-2187 - map tasks timeout during sorting
Justification: No progress is reported during merge sort, so if this phase
takes too long, the tasks can time out and fail.
Risk: Low. Adds new progress report point during merge sort.
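A progress report point inside a merge can be sketched as below. This is a generic Python illustration, not the map-side sort code; merge_with_progress, the report callback and the `every` interval are hypothetical.

```python
import heapq

def merge_with_progress(sorted_runs, report, every=1000):
    """Merge sorted runs, calling `report(n)` after every `every` records."""
    merged = []
    for count, record in enumerate(heapq.merge(*sorted_runs), start=1):
        merged.append(record)
        if count % every == 0:
            report(count)   # heartbeat: tells the supervisor the task is alive
    return merged
```

The callback does no work on the data; it only signals liveness so that a long merge is not mistaken for a hung task.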
HDFS-2202 - Changes to balancer bandwidth should not require datanode restart.
Justification: There are times when operations needs to either speed up or slow
down the balancer bandwidth. The system should support doing so without
restarting the datanodes.
Risk: Low/Medium - If feature is not used, code paths are the same.
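Conceptually, a limit that can change without a restart is just shared state that the transfer loop reads on each use. The sketch below shows that idea in Python; BandwidthLimit and its methods are hypothetical and do not reflect the datanode's actual throttler.

```python
import threading

class BandwidthLimit:
    """Holds a bytes-per-second limit that can be changed while in use."""

    def __init__(self, bytes_per_sec):
        self._lock = threading.Lock()
        self._rate = bytes_per_sec

    def set_rate(self, bytes_per_sec):
        """Apply a new limit; takes effect on the next delay_for() call."""
        if bytes_per_sec <= 0:
            raise ValueError("rate must be positive")
        with self._lock:
            self._rate = bytes_per_sec

    def delay_for(self, nbytes):
        """Seconds a sender should spend on `nbytes` at the current limit."""
        with self._lock:
            return nbytes / self._rate
```

If set_rate is never called, the sender sees the same limit it started with, which is the sense in which unused code paths stay the same.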
HDFS-1836 - Thousand of CLOSE_WAIT socket
Justification: Clients can chew up socket connections by not closing down
correctly.
Risk: Low.
HADOOP-7432 - Back-port HADOOP-7110 to 0.20-security
Justification: Fixes build/UT failures due to racy chmod and improves
performance by using JNI chmod rather than forking.
Risk: Low. Backport of fix for 22 from Todd Lipcon.
HADOOP-7314 - Add support for throwing UnknownHostException when a host doesn't
resolve
Justification: Tied to MAPREDUCE-2489. Same justification.
Risk: Same risk as MAPREDUCE-2489
MAPREDUCE-2764 - Fix renewal of dfs delegation tokens
Justification: Long running jobs like distcp may repeatedly fail to renew
delegation tokens even after an intermittent error has been corrected. The
repeated failures can overwhelm the job tracker causing the entire grid to have
difficulty.
Risk: Medium risk. Requires a low-level change to the tokens to include enough
information so that the token can be renewed later.
HDFS-2257 - HftpFileSystem should implement GetDelegationTokens
Justification: Required for MAPREDUCE-2764
Risk: Medium - See MAPREDUCE-2764
MAPREDUCE-2780 - MAPREDUCE-2764 Standardize the value of token service
Justification: Required for MAPREDUCE-2764
Risk: Low. Creates a setService method rather than having all token producers
do this themselves. No change to actual tokens.
HADOOP-7510 - Tokens should use original hostname provided instead of ip
Justification: Needed in order to support a namenode changing its IP address.
Otherwise, as soon as the next task tries to look something up in the token
cache using the ip, it will fail to find the proper token and then fail to
execute.
Risk: Medium risk. Requires a change to the information maintained in the token.
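To show why the lookup key matters, here is a small Python sketch of a token cache keyed on the hostname the caller supplied. The TokenCache class and the "host:port" key format are assumptions for illustration, not the actual token layout.

```python
class TokenCache:
    """Maps a service key of the form "host:port" to an opaque token."""

    def __init__(self):
        self._tokens = {}

    @staticmethod
    def service_key(host, port):
        # Key on the name the caller supplied, not on a resolved IP, so the
        # key stays stable when the server's address changes.
        return f"{host.lower()}:{port}"

    def put(self, host, port, token):
        self._tokens[self.service_key(host, port)] = token

    def get(self, host, port):
        return self._tokens.get(self.service_key(host, port))
```

A cache keyed on the resolved IP would miss as soon as the namenode moved to a new address, which is the failure described above.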
HADOOP-7539 - merge hadoop archive goodness from trunk to 0.20
Justification: HAR support regressed somewhat when merging to 0.20.203. This
jira brings HAR support back to what's in trunk.
Risk: Low risk. Doesn't affect mainline HDFS/MAPREDUCE.
HADOOP-6889 - Make RPC to have an option to timeout
Justification: Clients can hang when issuing RPCs to troubled datanodes because
there is no RPC timeout. Has been pulled into 0.22, 0.20.append.
Risk: Low. Running in 0.20.append, trunk and 22. Fixed 12 months ago.
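The hang being described is the usual blocking-read problem. A minimal Python sketch of a bounded wait follows; recv_reply is a hypothetical helper over a plain socket and is not the Hadoop RPC API.

```python
import socket

def recv_reply(sock, timeout_s):
    """Wait at most `timeout_s` seconds for a reply on a connected socket."""
    sock.settimeout(timeout_s)
    try:
        return sock.recv(4096)
    except socket.timeout:
        # Without the timeout, this recv() would block for as long as the
        # peer stays silent.
        raise TimeoutError(f"no reply within {timeout_s}s") from None
```

Making the timeout an option, as the jira title suggests, lets callers that prefer to wait indefinitely keep the old behavior.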
MAPREDUCE-2915 - LinuxTaskController does not work when
JniBasedUnixGroupsNetgroupMapping or JniBasedUnixGroupsMapping is enabled
Justification: If one does not use the JNI versions of these methods, the
namenode and jobtracker frequently have to fork. Especially in the case of the
namenode, this can cause many seconds of unavailability due to the time it
takes the linux kernel to copy hundreds of MB of page tables and exec a new
process.
Risk: Low. Fix adds a missing argument (the path to the native libraries) when
launching the linuxtaskcontroller.