[ 
https://issues.apache.org/jira/browse/HDFS-9016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254255#comment-15254255
 ] 

Allen Wittenauer commented on HDFS-9016:
----------------------------------------

This is basically Hadoop's operability problems coming to the forefront:

* The compatibility guidelines don't offer any real out for CLI output that 
actually needs to change based upon the implementation.  So no, technically, a 
special flag like '-replicadetails' would not be magically immune.  Once the 
output is in a released version, it's fixed. If the output changes based upon 
how the system is configured, there are no visible hints anywhere that this is 
going to happen. The compatibility guidelines are the ONLY thread that 
operations teams are holding on to, and every time we ignore them, all hell 
breaks loose.  (Of course, a lot of the people who work on the code don't 
realize this, because they have no direct lines of communication with ops, or 
don't pay much attention when an ops person does point out that the world 
broke.  "Feature expediency" overrides common sense way too often.  HDFS 
rolling upgrade is a great example: it actually caused data loss in certain 
instances because someone thought it was a great idea to turn a heavily 
depended-upon NN flag into a no-op that returns a success exit code.) 

* We don't build that many interfaces that can actually be used by the 
scripting languages (perl, python, ruby, etc) leaving stdout as the only way 
the vast majority of ops people are going to be able to process information.  
While the JMX->REST hook was a great help, it's read-only and still doesn't 
expose vital information (fsck being the worst offender because, frankly, it's 
doing way too much.  Why does it have to be literally the only source for 
block-level information?).  

To me, the people behind things like the storagepolicy code should have gone to 
the PMC and tried to revamp the compatibility guidelines to spell out that 
command line arguments that generate output must also state their stability in 
the accompanying documentation.  Buried in a javadoc is useless: unless people 
are writing code, users never see that information. See: metrics, rack 
awareness, and a host of other bits that have finally had real documentation 
written over the past 2 years. All of that information was previously passed 
along only by word of mouth.

That said, I know what the outcome of this JIRA will be: another cranny where 
the rules don't apply, waiting to come back and bite someone hard in the future.

> Display upgrade domain information in fsck
> ------------------------------------------
>
>                 Key: HDFS-9016
>                 URL: https://issues.apache.org/jira/browse/HDFS-9016
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HDFS-9016.patch
>
>
> This will make it easy for people to use fsck to check block placement when 
> upgrade domain is enabled.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
