[ https://issues.apache.org/jira/browse/HADOOP-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Milind Bhandarkar updated HADOOP-941:
-------------------------------------

    Description: 
Hadoop record I/O can be used effectively outside of Hadoop. It would increase 
its utility if developers could use it without having to import Hadoop classes 
or depend on Hadoop jars. The following changes to the current translator and 
runtime are proposed.

Proposed Changes:

1. Use java.lang.String as a native type for ustring (instead of Text).
2. Provide a Buffer class as the native Java type for buffer (instead of 
BytesWritable), so that BytesWritable could later be implemented with the 
following DDL:
module org.apache.hadoop.io {
  record BytesWritable {
    buffer value;
  }
}
3. Member names in generated classes should not be prefixed with 'm'. In the 
above example, the private member would be named 'value', not 'mvalue' as it 
is now.
4. Convert getters and setters to CamelCase, e.g. in the above example the 
getter would be:
  public Buffer getValue();
5. Generate clone() methods for records in Java, i.e. the generated classes 
should implement Cloneable.
6. Make the generated Java code for maps and vectors use Java generics.

These are the proposed user-visible changes. Internally, the translator will be 
restructured so that it is easier to plug in translators for different targets.
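
For illustration, under proposals 1-6 the class generated by rcc for the 
BytesWritable DDL above might look roughly like the hand-written sketch below. 
This is not actual rcc output; serialization methods and comparators are 
omitted, the package for Buffer is assumed, and the TreeMap/ArrayList mapping 
mentioned in the comments is only an assumption about how proposal 6 could 
surface.

package org.apache.hadoop.io;

import org.apache.hadoop.record.Buffer;  // assumed home of the new Buffer type

public class BytesWritable implements Cloneable {
  // Proposal 3: member is named 'value', with no 'm' prefix.
  // Proposal 2: the field uses the proposed Buffer runtime type.
  private Buffer value;

  public BytesWritable() { }

  // Proposal 4: CamelCase getter and setter.
  public Buffer getValue() { return value; }
  public void setValue(Buffer value) { this.value = value; }

  // Proposal 5: records implement Cloneable and expose clone().
  // (A real implementation would presumably deep-copy the Buffer;
  // super.clone() alone is a shallow copy.)
  public Object clone() throws CloneNotSupportedException {
    return super.clone();
  }

  // Proposal 1: a ustring field would be generated as java.lang.String.
  // Proposal 6: a map<ustring, buffer> field would be generated with
  // generics, e.g. java.util.TreeMap<String, Buffer> (assumed mapping).

  // Serialization methods (serialize/deserialize via archives) omitted.
}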


  was:
Hadoop record I/O can be used effectively outside of Hadoop. It would increase 
its utility if developers could use it without having to import Hadoop classes 
or depend on Hadoop jars. The following changes to the current translator and 
runtime are proposed.

Proposed Changes:

1. Use java.lang.String as a native type for ustring (instead of Text).
2. Provide a Buffer class as the native Java type for buffer (instead of 
BytesWritable), so that BytesWritable could later be implemented with the 
following DDL:
module org.apache.hadoop.io {
  record BytesWritable {
    buffer value;
  }
}
3. Member names in generated classes should not be prefixed with 'm'. In the 
above example, the private member would be named 'value', not 'mvalue' as it 
is now.
4. Convert getters and setters to CamelCase, e.g. in the above example the 
getter would be:
  public Buffer getValue();
5. Provide a 'swiggable' C binding, so that processing the generated C code 
with SWIG allows it to be used in scripting languages such as Python and Perl.
6. The default --language="java" target would generate class code for records 
that would not have a Hadoop dependency on the WritableComparable interface, 
but would instead declare "implements Record, Comparable" (i.e. it would not 
have write() and readFields() methods). An additional option, "--writable", 
would need to be specified on the rcc command line to generate classes that 
declare "implements Record, WritableComparable".
7. Optimize the generated write() and readFields() methods so that they do not 
have to create a BinaryOutputArchive or BinaryInputArchive every time these 
methods are called on a record (see the sketch below this list).
8. Implement ByteInStream and ByteOutStream for the C++ runtime, as they will 
be needed for using Hadoop Record I/O with the forthcoming C++ MapReduce 
framework (currently, only FileStreams are provided).
9. Generate clone() methods for records in Java, i.e. the generated classes 
should implement Cloneable.
10. As part of the Hadoop build process, produce a tar bundle for Record I/O 
alone. This bundle will contain the translator classes and Ant task 
(lib/rcc.jar), the translator script (bin/rcc), the Java runtime (recordio.jar) 
including org.apache.hadoop.record.*, the sources for the Java runtime 
(src/java), and the C/C++ runtime sources with Makefiles (src/c++, src/c).
11. Make the generated Java code for maps and vectors use Java generics.

These are the proposed user-visible changes. Internally, the translator will be 
restructured so that it is easier to plug in translators for different targets.
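
To illustrate the caching idea in item 7 of the list above, a minimal sketch 
follows. The OutputArchive and OutputArchiveFactory types and their methods 
are hypothetical stand-ins, not the real org.apache.hadoop.record archive API; 
only the reuse-one-archive-per-thread pattern is the point.

import java.io.DataOutput;
import java.io.IOException;

// Hypothetical sketch: cache one output archive per thread and re-point it
// at the current stream, instead of constructing a new archive inside every
// generated write() call. "OutputArchive" is a stand-in type, not the real
// BinaryOutputArchive API.
final class CachedOutputArchive {

  interface OutputArchive {
    void setStream(DataOutput out);               // assumed re-pointing hook
    void writeInt(int i, String tag) throws IOException;
    // ... other write methods elided ...
  }

  interface OutputArchiveFactory {
    OutputArchive create(DataOutput out);
  }

  private static final ThreadLocal<OutputArchive> CACHE =
      new ThreadLocal<OutputArchive>();

  static OutputArchive get(DataOutput out, OutputArchiveFactory factory) {
    OutputArchive archive = CACHE.get();
    if (archive == null) {
      archive = factory.create(out);    // first use on this thread
      CACHE.set(archive);
    } else {
      archive.setStream(out);           // reuse the cached archive afterwards
    }
    return archive;
  }

  private CachedOutputArchive() { }
}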


        Summary: Enhancements to Hadoop record I/O - Part 1  (was: Make Hadoop 
Record I/O Easier to use outside Hadoop)

Split the issue of making record I/O usable outside Hadoop into a separate 
issue. Will pull the current patch, and upload a new patch.

> Enhancements to Hadoop record I/O - Part 1
> ------------------------------------------
>
>                 Key: HADOOP-941
>                 URL: https://issues.apache.org/jira/browse/HADOOP-941
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.10.1
>         Environment: All
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>
> Hadoop record I/O can be used effectively outside of Hadoop. It would 
> increase its utility if developers could use it without having to import 
> Hadoop classes or depend on Hadoop jars. The following changes to the current 
> translator and runtime are proposed.
> Proposed Changes:
> 1. Use java.lang.String as a native type for ustring (instead of Text).
> 2. Provide a Buffer class as the native Java type for buffer (instead of 
> BytesWritable), so that BytesWritable could later be implemented with the 
> following DDL:
> module org.apache.hadoop.io {
>   record BytesWritable {
>     buffer value;
>   }
> }
> 3. Member names in generated classes should not be prefixed with 'm'. In the 
> above example, the private member would be named 'value', not 'mvalue' as it 
> is now.
> 4. Convert getters and setters to CamelCase, e.g. in the above example the 
> getter would be:
>   public Buffer getValue();
> 5. Generate clone() methods for records in Java, i.e. the generated classes 
> should implement Cloneable.
> 6. Make the generated Java code for maps and vectors use Java generics.
> These are the proposed user-visible changes. Internally, the translator will 
> be restructured so that it is easier to plug in translators for different 
> targets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
