[ https://issues.apache.org/jira/browse/HADOOP-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467949 ]
Milind Bhandarkar commented on HADOOP-941:
------------------------------------------

Comments inline:

> I don't understand the motivation for this. The Writable and
> WritableComparable interfaces are small and standalone. Having
> record-defined classes that cannot be easily used with the rest of Hadoop
> seems like it could cause more confusion than including these.

Writable is used in the context of Record I/O only for the binary serialization format. Record I/O, however, also supports serialization to two additional formats, CSV and XML. Since there is no way to express these through Writable, an additional interface defined by Record I/O is needed, namely Record. Because Record handles all three serialization formats, Writable support is needed only when one wants to use SequenceFile or IPC within Hadoop; outside of Hadoop, neither is useful.
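As a rough illustration, here is a minimal sketch of writing one generated record in all three formats. MyRecord stands in for any rcc-generated class, and the CsvOutputArchive / XmlOutputArchive class names, the constructor arguments, and the serialize(archive, tag) signature are assumptions mirroring the BinaryOutputArchive mentioned later in this issue, not verified against the current runtime:

    import java.io.DataOutputStream;
    import java.io.FileOutputStream;

    import org.apache.hadoop.record.BinaryOutputArchive;  // binary: the format Writable can carry
    import org.apache.hadoop.record.CsvOutputArchive;     // assumed CSV archive
    import org.apache.hadoop.record.XmlOutputArchive;     // assumed XML archive

    public class ThreeFormats {
        public static void main(String[] args) throws Exception {
            // Hypothetical rcc-generated record with a single ustring field 'name'.
            MyRecord r = new MyRecord();
            r.setName("example");

            // Binary is the only format that Writable's write()/readFields() can express.
            r.serialize(new BinaryOutputArchive(
                    new DataOutputStream(new FileOutputStream("r.bin"))), "r");

            // CSV and XML go through the same Record interface, with no Writable involved.
            r.serialize(new CsvOutputArchive(new FileOutputStream("r.csv")), "r");
            r.serialize(new XmlOutputArchive(new FileOutputStream("r.xml")), "r");
        }
    }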
> Are you proposing a separate release cycle or just separate release
> artifacts? If a separate release cycle, then this should be placed in a
> separate sub-project. Each sub-project requires a diverse developer
> community, which I'm not sure that record alone has.

I am proposing neither. A single Hadoop build should produce a single artifact, the Hadoop core jar. I am proposing that there be a way (i.e. an ant target) to produce a stand-alone Record I/O jar that does not include anything from outside org.apache.hadoop.record.*. This target would not be invoked by default at build time.

> I suspect that many users of records might find SequenceFile also useful.

A particular usage of Record I/O that I am considering is as a wire protocol for records; Record I/O clearly facilitates a common wire protocol across multiple language targets. I do not believe the problem lies with the size of the Hadoop jar file, but with the perception among Record I/O users that their application somehow depends on Hadoop.

> Make Hadoop Record I/O Easier to use outside Hadoop
> ---------------------------------------------------
>
>                 Key: HADOOP-941
>                 URL: https://issues.apache.org/jira/browse/HADOOP-941
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.10.1
>         Environment: All
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.11.0
>
>
> Hadoop Record I/O can be used effectively outside of Hadoop. Its utility
> would increase if developers could use it without having to import Hadoop
> classes or depend on Hadoop jars. The following changes to the current
> translator and runtime are proposed:
> 1. Use java.lang.String as the native type for ustring (instead of Text).
> 2. Provide a Buffer class as the native Java type for buffer (instead of
> BytesWritable), so that BytesWritable could later be implemented with the
> following DDL:
>        module org.apache.hadoop.io {
>            record BytesWritable {
>                buffer value;
>            }
>        }
> 3. Member names in generated classes should not carry the 'm' prefix. In
> the above example, the private member would be named 'value', not 'mvalue'
> as it is today.
> 4. Make getters and setters CamelCase, e.g. in the above example the getter
> would be: public Buffer getValue(); (a sketch of the resulting generated
> class appears after this list).
> 5. Provide a 'swiggable' C binding, so that processing the generated C code
> with SWIG allows it to be used from scripting languages such as Python and
> Perl.
> 6. The default --language="java" target would generate record classes that
> do not depend on Hadoop's WritableComparable interface, but instead declare
> "implements Record, Comparable" (i.e. they would have no write() and
> readFields() methods). An additional option, "--writable", would need to be
> specified on the rcc command line to generate classes that declare
> "implements Record, WritableComparable".
> 7. Optimize the generated write() and readFields() methods so that they do
> not have to create a BinaryOutputArchive or BinaryInputArchive every time
> they are called on a record.
> 8. Implement ByteInStream and ByteOutStream for the C++ runtime, as they
> will be needed for using Hadoop Record I/O with the forthcoming C++
> MapReduce framework (currently, only file streams are provided).
> 9. Generate clone() methods for records in Java, i.e. the generated classes
> should implement Cloneable.
> 10. As part of the Hadoop build process, produce a tar bundle for Record
> I/O alone. This bundle will contain the translator classes and ant task
> (lib/rcc.jar), the translator script (bin/rcc), the Java runtime
> (recordio.jar) containing org.apache.hadoop.record.*, the sources for the
> Java runtime (src/java), and the C/C++ runtime sources with Makefiles
> (src/c++, src/c).
> 11. Make the generated Java code for maps and vectors use Java generics.
> These are the proposed user-visible changes. Internally, the translator
> will be restructured to make it easier to plug in translators for different
> targets.
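To make items 2-4, 6, and 9 concrete, here is a hedged sketch of what rcc might generate for the BytesWritable DDL above under this proposal. The accessor names, the Buffer methods used, and the elision of the Record interface's serialization methods are assumptions for illustration, not actual generator output:

    package org.apache.hadoop.io;

    import org.apache.hadoop.record.Buffer;   // item 2: proposed native Java type for 'buffer'

    // Per item 6 the class would also declare "implements Record"; that
    // interface's serialize/deserialize methods are elided here because their
    // exact signatures are not spelled out in this issue.
    public class BytesWritable implements Comparable<BytesWritable>, Cloneable {

        private Buffer value;                        // item 3: no 'm' prefix on members

        public Buffer getValue() { return value; }   // item 4: CamelCase getter
        public void setValue(Buffer value) { this.value = value; }

        // item 6: no write()/readFields(); passing --writable to rcc would
        // instead emit "implements Record, WritableComparable" and add them.

        public int compareTo(BytesWritable other) {  // Comparable replaces WritableComparable
            return value.compareTo(other.value);     // assumes Buffer is itself Comparable
        }

        public Object clone() throws CloneNotSupportedException {   // item 9: Cloneable
            BytesWritable copy = (BytesWritable) super.clone();
            copy.value = new Buffer(value.get());    // assumes Buffer can be copied from a byte[]
            return copy;
        }
    }

Under item 11, a DDL field declared as map<ustring, int> would similarly surface as a generic Java collection such as java.util.TreeMap<String, Integer>; the exact collection type is again an assumption.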