Hadoop 1.0 Interface Classification - scope (visibility - public/private) and stability
---------------------------------------------------------------------------------------

                 Key: HADOOP-5073
                 URL: https://issues.apache.org/jira/browse/HADOOP-5073
             Project: Hadoop Core
          Issue Type: Sub-task
            Reporter: Sanjay Radia
            Assignee: Sanjay Radia


This JIRA proposes an interface classification for Hadoop interfaces.
The discussion was started on the project mailing list in Nov 2008.

h2. Interface Taxonomy - Scope & Stability Classification

The interface classification provided here is meant as guidance for 
developers and users of interfaces.
The classification guides a developer to declare the scope (the targeted 
audience or users) of an interface as well as its stability.
* *Benefits to the user of an interface*: knows which interfaces to use or not 
use, and how stable they are.
* *Benefits to the developer*: prevents accidental changes to interfaces and 
hence accidental impact on users or on other components or systems. This is 
particularly useful in large systems with many developers who may not all share 
the state and history of the project.

This classification was derived from a taxonomy used inside Yahoo and 
from the OpenSolaris taxonomy 
(http://www.opensolaris.org/os/community/arc/policies/interface-taxonomy/#Advice).

Interfaces have two main attributes: *Scope* and *Stability*.
* *Scope* - _denotes the potential customers of the interface_.
   For example, many interfaces are merely internal or private interfaces of the 
implementation, while others are public or external interfaces that applications 
or clients are expected to use. In POSIX, libc is an external or public 
interface, while large parts of the kernel are internal or private interfaces. 
In addition, some interfaces are targeted at specific other subsystems. 
Identifying the scope helps define the customers or users of an interface and 
the impact of breaking it. For example, we may be willing to break the 
compatibility of an interface whose scope is a small number of specific 
subsystems. On the other hand, one is unlikely to break a protocol interface 
that millions of Internet users depend on.
  The following are useful scopes, in order of increasing (wider) visibility:
**  *project-private*
***  the interface is for internal use _within_ the project and should not be 
used by applications. It is subject to change at any time without notice. Most 
interfaces of a project are project-private.
**  *limited-private*
***  the interface is used by a specified set of projects or systems (typically 
closely related projects). Other projects or systems should not use the 
interface. Changes to the interface will be communicated/negotiated with the 
specified projects. For example, in the hadoop project, some interfaces are 
*hdfs-mapReduce-private* in that they are private to the hdfs and mapReduce 
projects.
**  *company-private* (*_This is not applicable to open-source projects such as 
Hadoop._* It is mentioned here for completeness.)
***  the interface can be used by other projects within a company.
**  *public*
***  the interface is for general use by any application.

* *Stability* - _denotes when changes that break compatibility can be made to 
the interface_.
**  *Stable*
***  Can evolve while retaining compatibility across minor release boundaries; 
can break compatibility only at a major release (i.e. at m.0).
**  *Evolving*
***  Evolving, but can break compatibility at a minor release (i.e. at m.x).
**  *Unstable*
***  This label usually makes sense only for private interfaces.
***  However, one may call this out for a _supposedly_ public interface to 
highlight that it should not be used as an interface; for public interfaces, 
labeling it as *Not-an-interface* is probably more appropriate than "unstable".
**** Examples of publicly visible interfaces that are unstable (i.e. 
not-an-interface): the GUI, and CLIs whose output format will change.
**  *Deprecated* - should not be used; will be removed in the future.
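The two attributes can be made concrete with a small Java sketch. It encodes the scopes in the order listed above and checks (a) whether a given project may use an interface of a given scope, and (b) whether a release of a given kind may break an interface of a given stability. Every name here (`InterfaceTaxonomy`, `mayUse`, `mayBreakAt`, `isWiderThan`, `Release`, and the project names) is hypothetical, not a Hadoop API. Company-private and Deprecated are left out, and two rules are assumptions rather than part of the taxonomy: the `PATCH` (m.x.y) release kind, and Unstable interfaces being breakable at any release.

```java
import java.util.Set;

/** Hypothetical encoding of the scope and stability taxonomy (a sketch, not a Hadoop API). */
public class InterfaceTaxonomy {

    /** Scopes in order of increasing visibility; company-private omitted (not applicable to Hadoop). */
    enum Scope { PROJECT_PRIVATE, LIMITED_PRIVATE, PUBLIC }

    /** Stability levels; Deprecated omitted because the taxonomy does not say when removal happens. */
    enum Stability { STABLE, EVOLVING, UNSTABLE }

    /** Release kinds: MAJOR is m.0, MINOR is m.x; PATCH (m.x.y) is an assumption. */
    enum Release { MAJOR, MINOR, PATCH }

    /** True if scope a is visible to a wider audience than scope b (relies on declaration order). */
    static boolean isWiderThan(Scope a, Scope b) {
        return a.compareTo(b) > 0;
    }

    /**
     * May project `caller` use an interface owned by project `owner`?
     * `limitedTo` lists the projects named by a limited-private interface.
     */
    static boolean mayUse(Scope scope, String owner, Set<String> limitedTo, String caller) {
        if (caller.equals(owner)) {
            return true;                          // a project can always use its own interfaces
        }
        switch (scope) {
            case PROJECT_PRIVATE:
                return false;                     // internal to the owning project
            case LIMITED_PRIVATE:
                return limitedTo.contains(caller); // only the specified projects
            case PUBLIC:
                return true;                      // any application
            default:
                throw new AssertionError(scope);
        }
    }

    /** May a release of the given kind break compatibility of an interface with this stability? */
    static boolean mayBreakAt(Stability stability, Release release) {
        switch (stability) {
            case STABLE:
                return release == Release.MAJOR;  // only at m.0
            case EVOLVING:
                return release != Release.PATCH;  // at m.x, and therefore also at m.0
            case UNSTABLE:
                return true;                      // assumption: no release boundary is promised
            default:
                throw new AssertionError(stability);
        }
    }

    public static void main(String[] args) {
        // Hypothetical owner "core"; the HDFS/MapReduce audience mirrors the
        // limited-private RPC entry of the proposed classification.
        Set<String> rpcAudience = Set.of("hdfs", "mapreduce");
        System.out.println(mayUse(Scope.LIMITED_PRIVATE, "core", rpcAudience, "hdfs"));
        System.out.println(mayUse(Scope.LIMITED_PRIVATE, "core", rpcAudience, "other"));
        System.out.println(mayBreakAt(Stability.STABLE, Release.MINOR));
        System.out.println(mayBreakAt(Stability.EVOLVING, Release.MINOR));
        System.out.println(isWiderThan(Scope.PUBLIC, Scope.LIMITED_PRIVATE));
    }
}
```

Using enum declaration order for `isWiderThan` keeps the scope ordering in one place, but it also means reordering the constants silently changes the ordering.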


h2. FAQ
# What is the harm in applications using a private interface that is stable?
How is it different from a public stable interface?
   While a private interface marked as stable is targeted to change only at 
major releases, it may break at other times if the providers of that interface 
are willing to change its internal users. Further, a public stable interface is 
less likely to break even at major releases (even though it is allowed to break 
compatibility there) because the impact of the change is larger.
*If you use a private interface (regardless of its stability) you run the risk 
of incompatibility*.
# Why bother declaring the stability of a private interface? 
**  To communicate the intent to its internal users.
**  To provide guidelines to developers of the interface.
**  The stability may capture other internal properties of the system:
***  e.g. in HDFS, stability of the NameNode-DataNode (NN-DN) protocol can help 
implement rolling upgrades.
***  e.g. in HDFS, FSImage stability can help provide more flexible rollbacks.
# How will the classification be recorded for Hadoop APIs?
** Each interface or class will have its scope and stability recorded using 
javadoc tags, annotations, or some other mechanism. Whatever mechanism we 
choose, the classification must be visible in the generated javadoc.
** APIs of private scope will not be part of the public javadoc generated by 
ant (i.e. by the _ant target_ "javadoc"); they will only be generated in the 
developer javadoc (generated by the _ant target_ "javadoc-dev").
** One can derive the scope of Java classes and Java interfaces from the scope 
of the package in which they are contained. Hence it is useful to declare the 
scope of each Java package as public or private (along with the private scope 
variations).
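If annotations turn out to be the chosen mechanism, recording and reading the classification might look like the following sketch. `@Documented` makes the annotation appear in the generated javadoc of annotated classes, and runtime retention lets tools read it back via reflection. The annotation `Classified`, the example classes, and the helper methods are all hypothetical. A real package-level declaration would live in a separate `package-info.java`, which this single-file sketch does not include, so the package lookup finds nothing here and the fallback applies; that fallback (project-private unstable) follows the Project-Private Unstable entry of the proposed classification.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Hypothetical annotation-based recording of scope and stability (a sketch). */
public class ClassificationDemo {

    enum Audience { PROJECT_PRIVATE, LIMITED_PRIVATE, PUBLIC }

    enum Stability { STABLE, EVOLVING, UNSTABLE, DEPRECATED }

    @Documented                            // shown in the javadoc of annotated elements
    @Retention(RetentionPolicy.RUNTIME)    // readable through reflection at run time
    @Target({ElementType.TYPE, ElementType.PACKAGE})
    @interface Classified {
        Audience audience();
        Stability stability();
    }

    /** Explicitly classified public API (hypothetical example). */
    @Classified(audience = Audience.PUBLIC, stability = Stability.STABLE)
    static class ExampleFileSystem { }

    /** Limited-private, evolving, like the RPC entry in the proposal (hypothetical example). */
    @Classified(audience = Audience.LIMITED_PRIVATE, stability = Stability.EVOLVING)
    static class ExampleRpc { }

    /** Implementation class with no classification of its own. */
    static class ExampleImplDetail { }

    /** The class's own annotation, else its package's (from package-info.java), else null. */
    static Classified lookup(Class<?> c) {
        Classified tag = c.getAnnotation(Classified.class);
        Package pkg = c.getPackage();      // may be null for the unnamed package on older JDKs
        if (tag == null && pkg != null) {
            tag = pkg.getAnnotation(Classified.class);
        }
        return tag;
    }

    static Audience audienceOf(Class<?> c) {
        Classified tag = lookup(c);
        return tag == null ? Audience.PROJECT_PRIVATE : tag.audience();
    }

    static Stability stabilityOf(Class<?> c) {
        Classified tag = lookup(c);
        return tag == null ? Stability.UNSTABLE : tag.stability();
    }

    /** Only public-scope APIs belong in the public javadoc; the developer javadoc covers all. */
    static boolean inPublicJavadoc(Class<?> c) {
        return audienceOf(c) == Audience.PUBLIC;
    }

    public static void main(String[] args) {
        for (Class<?> c : new Class<?>[] {ExampleFileSystem.class, ExampleRpc.class, ExampleImplDetail.class}) {
            System.out.println(c.getSimpleName() + ": " + audienceOf(c) + " " + stabilityOf(c)
                    + ", public javadoc: " + inPublicJavadoc(c));
        }
    }
}
```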


h2. Proposed Classification for Hadoop Interfaces

* Scope Public
**  Stable
***  FileSystem, MapReduce, Config, CLI (including output), parts of 
Mapred.lib, Job Logs API, instrumentation metrics, audit logs
**  Evolving
***  TFile, parts of Mapred.lib, some instrumentation metrics, JMX interface 
(until it becomes stable)
***  Job logs and job history (some tools, scripts and Chukwa use these to 
analyze job processing)
**  Not-an-interface
***  Web GUI
* Scope Private
**  Limited-Private Evolving
***  RPC, Metrics (HDFS-MapReduce Private) - once stable, we can consider 
making these public-stable.
**  Project-Private Stable
***  Intra-HDFS and MR protocols (facilitates rolling upgrades down the road)
***  FSImage 
**** Note: this will enable old versions of HDFS to read a newer FSImage and 
hence enable more flexible rollbacks.
**** Q. Should this be Project-Private Evolving instead?
**** Regardless of the stability of FSImage, new versions of HDFS have to be 
able to transparently convert older versions and provide roll-back.
**  Project-Private Evolving
***  DFSClient (Q. should this be "project-private unstable"?)
**  Project-Private Unstable 
***  System logs
***  All implementation classes and interfaces not otherwise classified are 
considered to be project-private unstable.


