[jira] [Commented] (AVRO-923) Avro-MapRed: Provide a fallback using avro beans instead of schema in job configuration

Doug Cutting (Commented) (JIRA) Wed, 12 Oct 2011 09:45:38 -0700

    [ https://issues.apache.org/jira/browse/AVRO-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125964#comment-13125964 ]


Doug Cutting commented on AVRO-923:
-----------------------------------

> it seems to me this risk is already taken for other parameters such as 
> "avro.mapper". For the case of schemas though there is a second check that 
> occurs when the input file schema does not match the compiled schema.

My main concern is not the input schema but the map output schema.  If 
different tasks somehow got different map output schemas, the result would be 
strange, hard-to-debug I/O exceptions.  The map output schema must be constant 
across all tasks in a job for things to work correctly.  Of course it's not 
always possible to prevent folks from creating erroneous situations; we should 
try to discourage them without overly limiting functionality in the process.

> It can also be described with xml files

What I meant was that the XML files can be programmatically constructed.  They 
should ideally not be constructed with cut and paste, but should use the same 
source for schemas as the Java code that is regenerated to build the new 
version of the jar file.  Perhaps you can refer to the schemas with an 
external entity definition in the XML that fetches the appropriate version?

{code}
<!DOCTYPE job [
<!ENTITY schemaX SYSTEM "http://svn.foo.com/project/trunk/schemas/x.avsc">
]>
<job>
 ... &schemaX; ...
</job>
{code}
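
Another way to construct the XML programmatically, as a rough sketch only: a small build-time helper that writes Hadoop-style configuration XML from schema text, so the job file is regenerated from the same source as the Java classes.  The class name JobXmlWriter and the inline schema string are hypothetical; a real build would read the schema text from the .avsc file instead.  Only the JDK's XML APIs are used.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical build-time helper: renders job configuration XML from schema text. */
class JobXmlWriter {

    /** Emits one <property><name/><value/></property> element per map entry. */
    static String render(Map<String, String> properties) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("configuration");
        doc.appendChild(root);
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            Element property = doc.createElement("property");
            Element name = doc.createElement("name");
            name.setTextContent(entry.getKey());
            Element value = doc.createElement("value");
            // The serializer escapes '<' and '&', so the schema JSON can be stored as-is.
            value.setTextContent(entry.getValue());
            property.appendChild(name);
            property.appendChild(value);
            root.appendChild(property);
        }
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: in a real build this text comes from the shared .avsc file.
        String schemaJson = "{\"type\":\"record\",\"name\":\"X\",\"fields\":[]}";
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("avro.input.schema", schemaJson);
        properties.put("avro.map.output.schema", schemaJson);
        System.out.println(render(properties));
    }
}
```

Running this as a step of the same build that regenerates the Java classes keeps the job file and the jar on one schema version, without any cut and paste.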

                
> Avro-MapRed: Provide a fallback using avro beans instead of schema in job 
> configuration
> ---------------------------------------------------------------------------------------
>
>                 Key: AVRO-923
>                 URL: https://issues.apache.org/jira/browse/AVRO-923
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.4
>         Environment: any
>            Reporter: Julien Muller
>             Fix For: 1.6.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The current implementation of Avro MapRed is designed to use JobConf. While 
> it is possible to use a job.xml file, it is pretty painful since you have to 
> copy and paste all the schemas for input and output. This is error-prone and 
> time-consuming. Also, any update to a bean requires copying and pasting the 
> schema again (with JobConf a simple recompile would be enough).
> A proposal to improve this while staying backward compatible would be to 
> introduce new keys in AvroJob that reference the actual avro bean used. This 
> can be implemented as a fallback.
> New keys would be created:
> - avro.input.schema > avro.input.class
> - avro.map.output.schema > avro.map.output.class
> - avro.output.schema > avro.output.class
> Only 3 methods would be impacted in AvroJob:
> - getInputSchema(Configuration job) {
>       // Fall back to the class's static SCHEMA$ field when no schema is set
>       String s = job.get(INPUT_SCHEMA);
>       if (s == null) {
>           s = Class.forName(job.get(INPUT_CLASS))
>                   .getDeclaredField("SCHEMA$").get(null).toString();
>       }
>       return Schema.parse(s);
>   }
> - getMapOutputSchema()
> - getOutputSchema()
> Also, it would be more consistent to add new setters. This is not mandatory 
> since in that use case the new keys are set directly in the job 
> configuration, not through AvroJob.
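
The fallback proposed in the description can be sketched without Avro or Hadoop on the classpath, purely as an illustration.  Everything here is a stand-in: ExampleRecord plays the role of a generated class with a static SCHEMA$ field, a plain Map replaces the job Configuration, and SchemaFallback is a hypothetical name.  The method returns the schema as JSON text and leaves the parsing step out.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for a generated Avro class; assumed to expose a static SCHEMA$ field. */
class ExampleRecord {
    public static final Object SCHEMA$ =
        "{\"type\":\"record\",\"name\":\"ExampleRecord\",\"fields\":[]}";
}

/** Hypothetical helper showing the schema-key-first, class-key-second lookup. */
class SchemaFallback {
    static final String INPUT_SCHEMA = "avro.input.schema";
    static final String INPUT_CLASS = "avro.input.class";

    /** Returns the schema JSON from the config, else from the named class's SCHEMA$. */
    static String inputSchemaJson(Map<String, String> conf)
            throws ReflectiveOperationException {
        String json = conf.get(INPUT_SCHEMA);
        if (json != null) {
            return json; // the explicit schema always wins, for backward compatibility
        }
        String className = conf.get(INPUT_CLASS);
        if (className == null) {
            throw new IllegalStateException(
                "neither " + INPUT_SCHEMA + " nor " + INPUT_CLASS + " is set");
        }
        // Static field, so the instance argument to get() is null.
        Object schema = Class.forName(className).getDeclaredField("SCHEMA$").get(null);
        return String.valueOf(schema);
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new HashMap<>();
        conf.put(INPUT_CLASS, ExampleRecord.class.getName());
        System.out.println(inputSchemaJson(conf));
    }
}
```

Because the class is resolved by name on each task, this lookup only yields a constant schema if every task loads the same version of the jar.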

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
