Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "Compatibility" page has been changed by ArpitAgarwal:
https://wiki.apache.org/hadoop/Compatibility?action=diff&rev1=5&rev2=6

  #format wiki
  #language en
  
- = This page is out of date. Please refer to 
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html
 henceforth. Thanks =
+ '''Contents moved to [[http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html]]'''
  
- == Apache Hadoop Compatibility ==
- 
- The goal of this page is to describe the issues that affect compatibility 
between Hadoop releases for Hadoop developers, downstream projects and end 
users.
- 
- Here are some existing JIRAs and pages relevant to the topic:
-  1. Describe the annotations an interface should have as per our existing 
interface classification scheme (see 
[[https://issues.apache.org/jira/browse/HADOOP-5073|HADOOP-5073]])
-  2. Cover compatibility items that are beyond the scope of API classification, along the lines of those discussed in [[https://issues.apache.org/jira/browse/HADOOP-5071|HADOOP-5071]], focused on Hadoop v1.
-  3. The [[Roadmap]] captures release policies, though some of its content is out of date.
- 
- ''Note to downstream projects/users'': If you are concerned about compatibility at any level, we strongly encourage you to follow the Hadoop developer mailing lists and to track JIRA issues that may concern you. You are also strongly advised to verify that your code works against beta releases of forthcoming Hadoop versions, as that is the time when identified regressions can be corrected rapidly; if you only test when a new final release ships, the time to fix is likely to be at least three months.
- 
- === Compatibility types ===
- This section describes the various types of compatibility.
- ==== Java API ====
- Hadoop interfaces and classes are annotated to describe the intended audience 
and stability in order to maintain compatibility with previous releases. See 
HADOOP-5073 for more details.
-  * InterfaceAudience: captures the intended audience. Possible values are Public (for outside users), LimitedPrivate (for other Hadoop components and closely related projects such as HBase), and Private (for use within a component).
-  * InterfaceStability: describes what types of interface changes are 
expected. Possible values are Stable, Evolving, Unstable, and Deprecated. See 
HADOOP-5073 for details.
- ===== Use cases =====
-  * Public-Stable API compatibility is required to ensure end-user programs 
and downstream projects continue to work without any changes.
-  * LimitedPrivate-Stable API compatibility is required to allow upgrade of 
individual components across minor releases.
-  * Private-Stable API compatibility is required for rolling upgrades.
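- 
- For illustration, a class intended for public, stable consumption would carry the HADOOP-5073 annotations as follows. This is a minimal sketch; the class itself is hypothetical, while the annotation types are the real ones from hadoop-common:
- {{{#!java
- import org.apache.hadoop.classification.InterfaceAudience;
- import org.apache.hadoop.classification.InterfaceStability;
- 
- // Public + Stable: end-user programs may depend on this API
- // continuing to work without changes across releases.
- @InterfaceAudience.Public
- @InterfaceStability.Stable
- public class ExampleRecordReader {
-   // Hypothetical example class; real Hadoop classes carry these
-   // same annotations (see HADOOP-5073).
- }
- }}}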
- 
- ==== Semantics compatibility ====
- Apache Hadoop strives to ensure that the behavior of APIs remains consistent across releases, though changes made for correctness may result in changes in behavior. That is, if you relied on behavior that we consider a bug, it may get fixed.
- 
- We are in the process of specifying some APIs more rigorously, enhancing
- our test suites to verify compliance with the specification, effectively
- creating a formal specification for the subset of behaviors that can be
- easily tested. We welcome involvement in this process, from both users and
- implementors of our APIs.
- 
- ==== Wire compatibility ====
- Wire compatibility concerns the data being transmitted over the wire between 
components. Hadoop uses protocol buffers for most RPC communication. Preserving 
compatibility requires prohibiting modification to the required fields of the 
corresponding protocol buffer. Optional fields may be added without breaking 
backwards compatibility. The protocols can be categorized as follows:
-  * Client-Server: communication between Hadoop clients and servers (e.g. the 
HDFS client to NameNode protocol, or the YARN client to ResourceManager 
protocol).
-  * Client-Server (Admin): it is worth distinguishing a subset of the Client-Server protocols used solely by administrative commands (e.g. the HA admin protocol), as these protocols may be changed with less impact than general Client-Server protocols.
-  * Server-Server: communication between servers (e.g. the protocol between the DataNode and NameNode, or the NodeManager and ResourceManager).
- 
- Non-RPC communication should be considered as well, for example using HTTP to 
transfer an HDFS image as part of snapshotting or transferring MapTask output.
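- 
- To sketch why optional fields preserve wire compatibility, a newer client can probe for a field that an older server never sets. The message and field names below are hypothetical, not an actual Hadoop protocol:
- {{{#!java
- // Assume a protobuf-generated message whose .proto gained
- // `optional bool supportsFeatureX = 5;` in a later release.
- HeartbeatResponseProto resp = proxy.heartbeat(request);
- if (resp.hasSupportsFeatureX() && resp.getSupportsFeatureX()) {
-   // The server is new enough to have actually sent the field.
-   useNewCodePath();
- } else {
-   // Older server: the optional field is simply absent, and
-   // nothing on the wire breaks.
-   useLegacyCodePath();
- }
- }}}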
- 
- ==== Metrics/JMX ====
- While Metrics API compatibility is governed by Java API compatibility, the actual metrics exposed by Hadoop also need to be compatible so that users can automate against them (with scripts, etc.). Adding additional metrics is compatible; modifying existing metrics (e.g. changing the unit or what is measured) or removing them breaks compatibility. Likewise, changes to JMX MBean object names also break compatibility.
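- 
- For example, a monitoring tool might read a metric through JMX; renaming the MBean object name or the attribute silently breaks every such consumer. The object name and attribute below are illustrative, and the probe assumes it runs inside the daemon's JVM:
- {{{#!java
- import java.lang.management.ManagementFactory;
- import javax.management.MBeanServer;
- import javax.management.ObjectName;
- 
- public class JmxProbe {
-   public static void main(String[] args) throws Exception {
-     MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
-     // Illustrative name; for a remote daemon, use a JMXConnector instead.
-     ObjectName name =
-         new ObjectName("Hadoop:service=NameNode,name=FSNamesystemState");
-     Object value = mbs.getAttribute(name, "CapacityRemaining");
-     System.out.println("CapacityRemaining = " + value);
-   }
- }
- }}}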
- 
- ==== REST APIs ====
- REST API compatibility covers both the request (URLs) and the response to each request (content, which may contain other URLs). Hadoop REST APIs are specifically meant for stable use by clients. The following are the exposed REST APIs (a WebHDFS sketch follows the list):
-  * WebHDFS (as supported by HttpFs) - Stable 
-  * WebHDFS (as supported by HDFS) - Stable
-  * NodeManager
-  * ResourceManager
-  * MR JobHistoryServer
-  * Servlets - JMX, conf
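- 
- WebHDFS illustrates the stability contract: clients hard-code the request URLs and parse the JSON responses, so both must remain stable. A minimal sketch, with placeholder host and port:
- {{{#!java
- import java.io.BufferedReader;
- import java.io.InputStreamReader;
- import java.net.HttpURLConnection;
- import java.net.URL;
- 
- public class WebHdfsStatus {
-   public static void main(String[] args) throws Exception {
-     // Stable URL shape: /webhdfs/v1/<path>?op=... (placeholder host/port).
-     URL url = new URL(
-         "http://namenode.example.com:50070/webhdfs/v1/tmp?op=GETFILESTATUS");
-     HttpURLConnection conn = (HttpURLConnection) url.openConnection();
-     conn.setRequestMethod("GET");
-     try (BufferedReader in = new BufferedReader(
-         new InputStreamReader(conn.getInputStream()))) {
-       String line;
-       while ((line = in.readLine()) != null) {
-         // The JSON FileStatus fields are part of the contract, too.
-         System.out.println(line);
-       }
-     }
-   }
- }
- }}}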
- 
- ==== CLI Commands ====
- Users and admins use command line interface (CLI) commands, either directly or via scripts, to access and modify data and to run jobs and applications. Changing the path of a command, removing or renaming command-line options, changing the order of arguments, or changing a command's return code or output may break compatibility and adversely affect users.
- 
- ==== Directory Structure ====
- User logs, job history, and job output are stored on disk, either locally or on HDFS. Changing the directory structure of these user-accessible files breaks compatibility, even in cases where the original path is preserved via symbolic links (if, for example, the path is accessed by a servlet that is configured not to follow symbolic links).
- 
- ==== Classpath ====
- User applications (e.g. Java programs which are not MR jobs) built against 
Hadoop might add all Hadoop jars (including Hadoop’s dependencies) to the 
application’s classpath. Adding new dependencies or updating the version of 
existing dependencies may break user programs.
- 
- ==== Environment Variables ====
- Users and related projects often utilize the environment variables exported by Hadoop (e.g. HADOOP_CONF_DIR); therefore, removing or renaming environment variables is an incompatible change.
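- 
- A minimal sketch of why this matters: user code resolves configuration through the documented variable name, so renaming it breaks every such lookup (the fallback path below is illustrative only):
- {{{#!java
- public class ConfDirLookup {
-   public static void main(String[] args) {
-     // Renaming or removing HADOOP_CONF_DIR breaks consumers like this.
-     String confDir = System.getenv("HADOOP_CONF_DIR");
-     System.out.println(confDir != null ? confDir : "/etc/hadoop/conf");
-   }
- }
- }}}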
- 
- ==== Hadoop Configuration Files ====
- Modifications to Hadoop configuration properties, both key names and the units of values, can break compatibility. We assume that users who use Hadoop configuration objects to pass information to jobs ensure their properties do not conflict with the key prefixes defined by Hadoop. The following key prefixes are used by Hadoop daemons and should be avoided (a sketch follows the list):
-  * hadoop.*
-  * io.*
-  * ipc.*
-  * fs.*
-  * net.*
-  * file.*
-  * ftp.*
-  * s3.*
-  * kfs.*
-  * ha.*
-  * dfs.*
-  * mapred.*
-  * mapreduce.*
-  * yarn.*
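- 
- For instance, an application passing its own settings through a job's Configuration should pick a key prefix outside the reserved list above ("myapp." below is illustrative):
- {{{#!java
- import org.apache.hadoop.conf.Configuration;
- 
- public class JobSettings {
-   public static void main(String[] args) {
-     Configuration conf = new Configuration();
-     // Safe: "myapp." does not collide with any Hadoop-reserved prefix.
-     conf.set("myapp.input.validation", "strict");
-     // Risky: keys under mapreduce.* may collide with framework keys.
-     System.out.println(conf.get("myapp.input.validation"));
-   }
- }
- }}}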
- 
- ==== Data Formats ====
- Hadoop uses particular formats to store data and metadata. Modifying these formats can interfere with rolling upgrades, so the formats require compatibility guarantees. For instance, modifying the IFile format would require re-execution of jobs in flight during a rolling upgrade. Preserving certain formats, such as the HDFS metadata format, allows access to and modification of data across releases.
- 
