[ https://issues.apache.org/jira/browse/HADOOP-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664717#action_12664717 ]

Doug Cutting commented on HADOOP-5071:
--------------------------------------

Job persistence would be a great feature to add at some point, but I don't see 
why it is essential for 1.0.  Hadoop 1.0 is a batch-oriented system, not a 
high-availability system.  We'll force clients to stop accessing HDFS while it 
is upgraded, and I think a 1.0.1 release that forces folks to empty their job 
queues before they upgrade would be acceptable.  Why is this critical to 1.0, 
and what does it have to do with compatibility?


> Hadoop 1.0 Compatibility Requirements
> -------------------------------------
>
>                 Key: HADOOP-5071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5071
>             Project: Hadoop Core
>          Issue Type: Sub-task
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>
> The purpose of this Jira is to decide on Hadoop 1.0 compatibility
> requirements.
> A proposal is described below that was discussed on the email alias
> [email protected].
> Release terminology used below:
> *Standard release numbering: major, minor, dot releases*
> * Only bug fixes in dot releases: m.x.y
> ** No changes to APIs, disk formats, protocols, configs, etc. in a dot
> release
> * New features in major (m.0) and minor (m.x.0) releases
> *Hadoop Compatibility Proposal*
> - *1 API Compatibility*
> No need for client recompilation when upgrading across minor releases
> (i.e., from m.x to m.y, where x <= y).
> Classes or methods deprecated in m.x can be removed in (m+1).0 (see the
> deprecation sketch below).
> Note that this is stronger than what we have been doing in Hadoop 0.x
> releases.
> These are fairly standard compatibility rules for major and minor
> releases.
> - *2 Data Compatibility*
> -- Motivation: Users expect file systems to preserve data transparently
> across releases.
> -- 2.a HDFS metadata and data can change across minor or major releases,
> but such changes must be transparent to user applications. That is, a
> release upgrade must automatically convert the metadata and data as
> needed. Further, a release upgrade must allow a cluster to roll back to
> the older version and its older disk format. (Rollback needs to restore
> the original data, not any updated data.)
> -- 2.a-WeakerAutomaticConversion:
> Automatic conversion is supported across a small number of releases. If a
> user wants to jump across multiple releases, he may be forced to go
> through a few intermediate releases to get to the final desired release.
> - *3 Wire Protocol Compatibility*
> We offer no wire compatibility in our 0.x releases today.
> -- Motivation: The motivation *isn't* to make the Hadoop protocols
> public. Applications will not call the protocols directly but through a
> library (in our case the FileSystem class and its implementations).
> Instead, the motivation is that customers run multiple clusters and have
> apps that access data across clusters. Customers cannot be expected to
> update all clusters simultaneously.
> -- 3.a Old m.x clients can connect to new m.y servers, where x <= y, but
> the old clients might get reduced functionality or performance. m.x
> clients might not be able to connect to (m+1).z servers.
> -- 3.b New m.y clients must be able to connect to old m.x servers, where
> x < y, but only for old m.x functionality.
> Comment: Generally, old API methods continue to use old RPC methods.
> However, it is legal to have new implementations of old API methods call
> new RPC methods, as long as the library transparently handles the
> fallback case for old servers (see the fallback sketch below).
> -- 3.c At any major release transition [i.e., from a release m.x to a
> release (m+1).0], a user should be able to read data from the cluster
> running the old version.
> --- Motivation: Data copying across clusters is a common operation for
> many customers. For example, this is routinely done at Yahoo; another use
> case is HADOOP-4058. Today, http (or hftp) provides a guaranteed
> compatible way of copying data across versions (see the hftp sketch
> below). Clearly one cannot force a customer to simultaneously update all
> of its Hadoop clusters to a new major release. We can satisfy this
> requirement via the http/hftp mechanism or some other mechanism.
> -- 3.c-Stronger
> Shall we add a stronger requirement for 1.0: wire compatibility across
> major versions? That is, not just for reading but for all operations.
> This can be supported by class loading or other games.
> Note that we can wait and provide this when 2.0 happens. If Hadoop
> provided this guarantee, it would allow customers to partition their data
> across clusters without risking apps breaking across major releases due
> to wire incompatibility issues.
> --- Motivation: Data copying is a compromise. Customers really want to
> run apps across clusters running different versions.
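
A minimal sketch of the deprecation rule in item 1 above, in Java. The
JobSubmitter class and its methods are hypothetical, invented only for this
sketch; just the deprecation lifecycle across m.x and (m+1).0 reflects the
proposal.

    import org.apache.hadoop.mapred.JobConf;

    // Hypothetical library class; not part of any actual Hadoop API.
    public class JobSubmitter {

        /**
         * @deprecated Deprecated in m.x; use {@link #submit(JobConf)} instead.
         *             Must remain in every later m.* release so that clients
         *             compiled against m.x keep running without recompilation;
         *             it may only be removed in (m+1).0.
         */
        @Deprecated
        public void submitJob(JobConf conf) {
            submit(conf);   // old method forwards to its replacement
        }

        /** Replacement API introduced in m.x. */
        public void submit(JobConf conf) {
            // ... implementation ...
        }
    }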
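
The comment under 3.b describes new implementations of old API methods
calling new RPCs with a transparent fallback for old servers. Below is a
minimal sketch of that pattern, assuming a hypothetical ClusterProtocol
proxy with an old listFiles RPC and a newer listFilesBatched RPC; neither
is a real Hadoop RPC, and a real library would detect an unsupported
method more precisely than by catching IOException.

    import java.io.IOException;

    // Hypothetical client-side library: the public API method keeps its old
    // signature, but its implementation prefers a newer RPC when the server
    // supports it and silently falls back otherwise.
    public class ClusterClient {

        /** Hypothetical RPC interface standing in for the wire protocol. */
        public interface ClusterProtocol {
            String[] listFiles(String path) throws IOException;        // old RPC (m.x)
            String[] listFilesBatched(String path) throws IOException; // new RPC (m.y)
        }

        private final ClusterProtocol proxy;

        public ClusterClient(ClusterProtocol proxy) {
            this.proxy = proxy;
        }

        /** Old public API method: its signature is unchanged across the m series. */
        public String[] listFiles(String path) throws IOException {
            try {
                return proxy.listFilesBatched(path);  // new servers: new RPC
            } catch (IOException serverTooOld) {
                // Old m.x server does not know the new RPC: fall back so the
                // caller sees only old m.x functionality, never a failure.
                return proxy.listFiles(path);
            }
        }
    }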
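
The http/hftp mechanism mentioned under 3.c can be reached through the
ordinary FileSystem API. The sketch below is illustrative only: the host,
port, and path are made up, and it simply relies on the point made in the
motivation above that the read-only hftp:// scheme stays compatible across
versions because it travels over HTTP rather than the version-specific RPC
protocol.

    import java.io.InputStream;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CrossVersionRead {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical namenode of a cluster still on the older release;
            // 50070 is the namenode's default HTTP port.
            FileSystem oldCluster = FileSystem.get(
                    URI.create("hftp://old-namenode.example.com:50070/"), conf);
            // Read a file from the old cluster; distcp relies on the same
            // mechanism when given an hftp:// source URI.
            InputStream in = oldCluster.open(new Path("/data/part-00000"));
            try {
                // ... stream the bytes into the new cluster or process them ...
            } finally {
                in.close();
            }
        }
    }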

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
