[jira] Commented: (HADOOP-5071) Hadoop 1.0 Compatibility Requirements

Doug Cutting (JIRA) Fri, 16 Jan 2009 11:03:21 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664618#action_12664618
 ]


Doug Cutting commented on HADOOP-5071:
--------------------------------------

Some comments:
 - Please keep proposed solutions out of the problem description.  The solution 
should be developed in the comments.  If you have a proposed solution in mind 
when you file the issue, please add it as the first comment rather than include 
it in the description.
 - Should we split this into separate issues for HDFS, Core and MapReduce?  The 
HDFS and MapReduce issues can depend on the Core issue.  As we're planning to 
split the project, we should avoid detailed issues that span the three 
sub-projects.  Spanning issues should mostly just be a collection of 
per-project issues, with the details in those, no?
 - For HDFS metadata we could be clearer about when this is permitted.  I think 
it's supported between minor releases (x.y.* and x.(y-1)) and also between x.0 
and (x-1).n, where n is the current x-1 minor release when x.0 is first 
released, or some such.
 - As for 3c, let's start with the weaker version for now.  If we decide to 
make major releases more frequently than every couple of years, we might 
then consider the stronger version.
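
To make the metadata point above concrete, here is a minimal sketch of one
possible reading of that rule. It is not anything Hadoop implements: the
function name, the (major, minor) tuple representation, and the last_minor_of
lookup table are all assumptions made for illustration, and treating a
same-minor (dot release) upgrade as permitted is my own assumption as well.

```python
from typing import Dict, Tuple

# (major, minor); dot releases are ignored in this sketch.
Version = Tuple[int, int]


def metadata_upgrade_supported(
    old: Version, new: Version, last_minor_of: Dict[int, int]
) -> bool:
    """Return True if a direct metadata upgrade from old to new is allowed.

    last_minor_of is a hypothetical table mapping a major version x-1 to the
    minor release n that was current when x.0 was first released.
    """
    old_major, old_minor = old
    new_major, new_minor = new
    if new_major == old_major:
        # Same minor line, or the immediately following minor release.
        return new_minor - old_minor in (0, 1)
    if new_major == old_major + 1 and new_minor == 0:
        # x.0 is reachable only from (x-1).n.
        return last_minor_of.get(old_major) == old_minor
    return False


# Usage: assume 1.4 was the current 1.x minor release when 2.0 shipped.
table = {1: 4}
print(metadata_upgrade_supported((1, 2), (1, 3), table))
print(metadata_upgrade_supported((1, 4), (2, 0), table))
print(metadata_upgrade_supported((1, 3), (2, 0), table))
```

Under this reading a cluster on 1.3 would first have to upgrade to 1.4 before
moving to 2.0, which matches the weaker automatic conversion described in the
issue.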


> Hadoop 1.0 Compatibility Requirements
> -------------------------------------
>
>                 Key: HADOOP-5071
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5071
>             Project: Hadoop Core
>          Issue Type: Sub-task
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>
> The purpose of this Jira is to decide on Hadoop 1.0 Compatibility 
> requirements.
> A proposal is described below that was discussed on the email alias 
> [email protected]
> Release terminology used below:
> *Standard release numbering: major, minor, dot releases*
> * Only bug fixes in dot releases: m.x.y
> ** no changes to API, disk format, protocols or config etc. in a dot release
> * new features in major (m.0) and minor (m.x.0) releases
> *Hadoop Compatibility Proposal*
> - *1 API Compatibility*
> No need for client recompilation when upgrading across minor releases (i.e., 
> from m.x to m.y, where x <= y).
> Classes or methods deprecated in m.x can be removed in (m+1).0.
> Note that this is stronger than what we have been doing in Hadoop 0.x 
> releases.
>       These are fairly standard compatibility rules for major and minor 
> releases.
> - *2 Data Compatibility*
> -- Motivation: Users expect file systems to preserve data transparently across 
> releases.
> -- 2.a HDFS metadata and data can change across minor or major releases, but 
> such changes are transparent to user applications. That is, a release upgrade 
> must automatically convert the metadata and data as needed. Further, a 
> release upgrade must allow a cluster to roll back to the older version and 
> its older disk format. (Rollback needs to restore the original data, not any 
> updated data.)
> -- 2.a-WeakerAutomaticConversion:
> Automatic conversion is supported across a small number of releases. If a user 
> wants to jump across multiple releases, he may be forced to go through a few 
> intermediate releases to get to the final desired release.
> - *3 Wire Protocol Compatibility*
> We offer no wire compatibility in our 0.x releases today.
> -- Motivation: The motivation *isn't* to make the hadoop protocols public. 
> Applications will not call the protocol directly but through a library (in 
> our case FileSystem class and its implementations). Instead the motivation is 
> that customers run multiple clusters and have apps that access data across 
> clusters. Customers cannot be expected to update all clusters simultaneously.
> -- 3.a Old m.x clients can connect to new m.y servers, where x <= y, but the 
> old clients might get reduced functionality or performance. m.x clients might 
> not be able to connect to (m+1).z servers.
> -- 3.b. New m.y clients must be able to connect to old m.x servers, where 
> x < y, but only for old m.x functionality.
> Comment: Generally old API methods continue to use old rpc methods. However, 
> it is legal to have new implementations of old API methods call new 
> rpc methods, as long as the library transparently handles the fallback case 
> for old servers.
> -- 3.c. At any major release transition [i.e., from a release m.x to a release 
> (m+1).0], a user should be able to read data from the cluster running the old 
> version.
> --- Motivation: data copying across clusters is a common operation for many 
> customers. For example, this is routinely done at Yahoo; another use case 
> is HADOOP-4058. Today, http (or hftp) provides a guaranteed compatible way of 
> copying data across versions. Clearly one cannot force a customer to 
> simultaneously update all its Hadoop clusters to a new major release.  We 
> can satisfy this requirement via the http/hftp mechanism or some other 
> mechanism.
> -- 3.c-Stronger
> Shall we add a stronger requirement for 1.0: wire compatibility across 
> major versions? That is, not just for reading but for all operations. This can 
> be supported by class loading or other games.
> Note we can wait to provide this when 2.0 happens. If Hadoop provided this 
> guarantee then it would allow customers to partition their data across 
> clusters without risking apps breaking across major releases due to wire 
> incompatibility issues.
> --- Motivation: Data copying is a compromise. Customers really want to run 
> apps across clusters running different versions.
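
As an illustration of requirements 3.a and 3.b in the quoted proposal, the
following is a rough sketch of how a client library might decide which
feature level it can rely on. It is not Hadoop code: the function name and
the (major, minor) tuple representation are assumptions made for this
example, and real negotiation would happen inside the rpc layer.

```python
from typing import Optional, Tuple

# (major, minor); dot releases do not change protocols under the proposal.
Version = Tuple[int, int]


def negotiated_feature_level(client: Version, server: Version) -> Optional[int]:
    """Return the minor feature level both sides can rely on.

    Returns None when the proposal gives no connection guarantee.
    """
    client_major, client_minor = client
    server_major, server_minor = server
    if client_major != server_major:
        # 3.a: m.x clients might not be able to connect to (m+1).z servers.
        return None
    # 3.a: an old client on a new server is limited to what the client knows.
    # 3.b: a new client on an old server is limited to old m.x functionality.
    return min(client_minor, server_minor)


# Usage: a 1.2 client and a 1.5 server, in either direction, share level 2.
print(negotiated_feature_level((1, 2), (1, 5)))
print(negotiated_feature_level((1, 5), (1, 2)))
print(negotiated_feature_level((1, 5), (2, 0)))
```

The None case is where 3.c applies: reading data from the old cluster would
then go through http/hftp or some other mechanism rather than the native
protocol.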

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
