[ https://issues.apache.org/jira/browse/AVRO-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125213#comment-13125213 ]

Doug Cutting commented on AVRO-923:
-----------------------------------

It's slightly riskier to get the schema from the runtime than from the job, in 
particular the map output schema.  If different versions of code are somehow 
run on different nodes, then different map output schemas could be used, which 
would create havoc, since the schema does not travel with the map output data.  
When the schema is in the job.xml, there's very little chance of a lack of 
coordination, since the framework distributes the same job.xml to every task.  
If the schema comes from the runtime, there's some chance that different 
versions of classes could be installed on different nodes.
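To make this risk concrete, here is a minimal sketch of a hypothetical guard a task could run. It compares the map output schema read from the job configuration with the schema its locally loaded class was compiled against, and fails fast on a mismatch. Assumptions: java.util.Properties stands in for Hadoop's JobConf, the class and method names are made up for illustration, and plain string comparison stands in for a proper schema comparison.

```java
import java.util.Properties;

/** Hypothetical guard: job-configuration schema vs. locally compiled schema. */
class SchemaConsistencyCheck {

    /** Key name taken from the issue discussion. */
    static final String MAP_OUTPUT_SCHEMA = "avro.map.output.schema";

    /**
     * Returns the schema from the job configuration, throwing if it is
     * missing or if the local class was compiled against different text.
     */
    static String checkedMapOutputSchema(Properties jobConf, String localSchemaJson) {
        String fromJob = jobConf.getProperty(MAP_OUTPUT_SCHEMA);
        if (fromJob == null) {
            throw new IllegalStateException("no " + MAP_OUTPUT_SCHEMA + " in job configuration");
        }
        // Naive textual comparison; a real check would compare parsed schemas.
        if (!fromJob.equals(localSchemaJson)) {
            throw new IllegalStateException("local class schema differs from job schema");
        }
        return fromJob;
    }

    public static void main(String[] args) {
        Properties jobConf = new Properties();
        jobConf.setProperty(MAP_OUTPUT_SCHEMA, "\"string\"");
        // Same code version on this node: the check passes.
        System.out.println(checkedMapOutputSchema(jobConf, "\"string\""));
        // Stale class on this node: the mismatch is caught before any data is written.
        try {
            checkedMapOutputSchema(jobConf, "\"bytes\"");
        } catch (IllegalStateException e) {
            System.out.println("mismatch detected: " + e.getMessage());
        }
    }
}
```

Because the job configuration is the single copy every task receives, it is the natural reference point; a class loaded from the local classpath can only be checked against it, not trusted in its place.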

Another concern is that not all schemas have a class that defines them.  For 
example, one might have jobs whose inputs or outputs are "bytes" or "string" or 
Pair<"string","bytes">, etc.

These are the reasons that schema-in-job.xml is the required and preferred 
means of specification.  However, there may be cases where it's preferable to 
additionally support specifying schemas via a specific class, as 
suggested in this issue.

A JobConf can be programmatically constructed.  Why is it so painful to insert 
the schema there as a part of your job creation/submission pipeline?  I'd like 
to better understand why that's so difficult before we add a new mechanism, 
since any added mechanism has the potential to create bugs and user confusion.
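A minimal sketch of what inserting the schema during programmatic job construction might look like, assuming a hypothetical UserRecord bean and using java.util.Properties in place of JobConf so that it runs without Avro or Hadoop on the classpath. In real Avro-generated classes SCHEMA$ is a Schema object, and one would typically pass it to AvroJob's setters (setInputSchema, setMapOutputSchema, setOutputSchema); the String constant here is only a stand-in.

```java
import java.util.Properties;

class SubmitPipeline {

    /** Hypothetical stand-in for an Avro-generated bean (real ones expose a Schema-typed SCHEMA$). */
    static class UserRecord {
        public static final String SCHEMA$ =
            "{\"type\":\"record\",\"name\":\"UserRecord\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"}]}";
    }

    /** Writes all three schemas into the job configuration once, at submission time. */
    static Properties buildJobConf(String inputSchema, String mapOutputSchema, String outputSchema) {
        Properties conf = new Properties();
        conf.setProperty("avro.input.schema", inputSchema);
        conf.setProperty("avro.map.output.schema", mapOutputSchema);
        conf.setProperty("avro.output.schema", outputSchema);
        return conf;
    }

    public static void main(String[] args) {
        // The input schema text comes from the compiled class, so a recompile picks up
        // bean changes with no copy/paste; the other two are classless primitive schemas.
        Properties conf = buildJobConf(UserRecord.SCHEMA$, "\"string\"", "\"long\"");
        System.out.println(conf.getProperty("avro.input.schema"));
    }
}
```

The schema is still derived from the bean, as the issue wants, but it is resolved once on the submitting machine and then travels with the job, so every task sees the same text. Classless schemas such as "string" fit the same path.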
                
> Avro-MapRed: Provide a fallback using avro beans instead of schema in job 
> configuration
> ---------------------------------------------------------------------------------------
>
>                 Key: AVRO-923
>                 URL: https://issues.apache.org/jira/browse/AVRO-923
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.4
>         Environment: any
>            Reporter: Julien Muller
>             Fix For: 1.6.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The current implementation of Avro MapRed is designed around JobConf. While 
> it is possible to use a job.xml file, it is painful, since you have to 
> copy and paste all the schemas for input and output. This is error-prone and 
> time-consuming. Also, any update to a bean requires copying and pasting the 
> schema again (when building the JobConf in code, a simple recompile would be enough).
> A proposal that would improve this while staying backward compatible is to 
> introduce new keys in AvroJob that reference the actual Avro bean class. This 
> can be implemented as a fallback.
> New keys would be created, each serving as a fallback for an existing key:
> - avro.input.class (fallback for avro.input.schema)
> - avro.map.output.class (fallback for avro.map.output.schema)
> - avro.output.class (fallback for avro.output.schema)
> Only three methods in AvroJob would be affected:
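> - getInputSchema(Configuration job), implementing a fallback like:
>       public static Schema getInputSchema(Configuration job) {
>         String s = job.get(INPUT_SCHEMA);
>         if (s != null) {
>           return Schema.parse(s);  // an explicit schema always wins
>         }
>         try {  // fall back to the bean class's static SCHEMA$ field
>           Class<?> c = Class.forName(job.get(INPUT_CLASS));
>           return (Schema) c.getDeclaredField("SCHEMA$").get(null);  // SCHEMA$ is a Schema, not a String
>         } catch (Exception e) {
>           throw new RuntimeException("cannot read SCHEMA$ from " + job.get(INPUT_CLASS), e);
>         }
>       }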
> - getMapOutputSchema()
> - getOutputSchema()
> For consistency, it would also make sense to add new setters. This is not 
> strictly necessary, since in this use case the new keys are set directly in 
> the job configuration rather than through AvroJob.
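A dependency-free sketch of the proposed fallback, generalized so that all three getters share one helper. Assumptions: java.util.Properties stands in for Hadoop's Configuration, schemas are returned as JSON text rather than parsed Schema objects, and DemoBean is a hypothetical bean whose String SCHEMA$ field stands in for the Schema-typed field of Avro-generated classes.

```java
import java.util.Properties;

class SchemaFallback {

    // Existing keys and the proposed class-based fallback keys from the issue.
    static final String INPUT_SCHEMA = "avro.input.schema";
    static final String INPUT_CLASS = "avro.input.class";
    static final String MAP_OUTPUT_SCHEMA = "avro.map.output.schema";
    static final String MAP_OUTPUT_CLASS = "avro.map.output.class";
    static final String OUTPUT_SCHEMA = "avro.output.schema";
    static final String OUTPUT_CLASS = "avro.output.class";

    /** Hypothetical bean; a String SCHEMA$ stands in for Avro's Schema-typed field. */
    static class DemoBean {
        public static final String SCHEMA$ = "\"string\"";
    }

    /**
     * Returns the schema text under schemaKey; if absent, reads the static
     * SCHEMA$ field of the class named under classKey; returns null if
     * neither key is set.
     */
    static String resolve(Properties conf, String schemaKey, String classKey) {
        String s = conf.getProperty(schemaKey);
        if (s != null) {
            return s; // the explicit schema always wins
        }
        String className = conf.getProperty(classKey);
        if (className == null) {
            return null;
        }
        try {
            Object field = Class.forName(className).getDeclaredField("SCHEMA$").get(null);
            // toString() would also yield JSON for a real Avro Schema object.
            return field.toString();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot read SCHEMA$ from " + className, e);
        }
    }

    static String getInputSchema(Properties conf) { return resolve(conf, INPUT_SCHEMA, INPUT_CLASS); }
    static String getMapOutputSchema(Properties conf) { return resolve(conf, MAP_OUTPUT_SCHEMA, MAP_OUTPUT_CLASS); }
    static String getOutputSchema(Properties conf) { return resolve(conf, OUTPUT_SCHEMA, OUTPUT_CLASS); }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(INPUT_SCHEMA, "\"bytes\"");
        conf.setProperty(MAP_OUTPUT_CLASS, DemoBean.class.getName());
        System.out.println(getInputSchema(conf));     // prints: "bytes"
        System.out.println(getMapOutputSchema(conf)); // prints: "string"
        System.out.println(getOutputSchema(conf));    // prints: null
    }
}
```

Keeping the explicit schema key authoritative is what makes the change backward compatible: jobs that already set the avro.*.schema keys behave exactly as before, and the class keys are consulted only when the schema key is absent.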
