[ 
https://issues.apache.org/jira/browse/AVRO-1356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Paulsen updated AVRO-1356:
-------------------------------

    Description: 
AvroMultipleOutputs sets the MapOutputKeySchema and MapOutputValueSchema 
(instead of the OutputKeySchema and OutputValueSchema) when running a 
map-only job, as follows:


{code:java}
boolean isMaponly = job.getNumReduceTasks() == 0;
if (keySchema != null) {
  if (isMaponly)
    AvroJob.setMapOutputKeySchema(job, keySchema);
  else
    AvroJob.setOutputKeySchema(job, keySchema);
}
if (valSchema != null) {
  if (isMaponly)
    AvroJob.setMapOutputValueSchema(job, valSchema);
  else
    AvroJob.setOutputValueSchema(job, valSchema);
}
{code}

Unfortunately, AvroKeyOutputFormat and AvroKeyValueOutputFormat never check 
whether the job is map-only, and use the OutputKeySchema and OutputValueSchema 
regardless. As a result, the schemas set for a named output are not used in 
map-only jobs.

We can fix this by either:
* Changing AvroKeyOutputFormat and AvroKeyValueOutputFormat to check whether 
the job is map-only and use the appropriate schema.  (Seems right)
* Changing AvroMultipleOutputs to always use the OutputKeySchema and 
OutputValueSchema.
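
To make the mismatch and the first option concrete, here is a minimal sketch. It does not use the real Avro or Hadoop classes: the job configuration is modeled as a plain Map, and the property names and method names are hypothetical, chosen only for illustration. It assumes the writer side behaves as in the snippet above and the reader side behaves as described for the output formats.

```java
import java.util.HashMap;
import java.util.Map;

public class SchemaSelectionSketch {
    // Hypothetical property names for illustration; not the real Avro keys.
    static final String MAP_OUTPUT_KEY_SCHEMA = "sketch.schema.mapoutput.key";
    static final String OUTPUT_KEY_SCHEMA = "sketch.schema.output.key";

    // Writer side, modeled on the AvroMultipleOutputs snippet:
    // a map-only job stores the schema under the map output property.
    static void setKeySchema(Map<String, String> conf, int numReduceTasks,
                             String keySchema) {
        boolean isMapOnly = numReduceTasks == 0;
        conf.put(isMapOnly ? MAP_OUTPUT_KEY_SCHEMA : OUTPUT_KEY_SCHEMA, keySchema);
    }

    // Reader side as described in this issue: the output format always
    // reads the output key schema, whatever the job type.
    static String readKeySchemaIgnoringJobType(Map<String, String> conf) {
        return conf.get(OUTPUT_KEY_SCHEMA);
    }

    // Reader side under the first option: check whether the job is
    // map-only and read the property the writer side actually set.
    static String readKeySchemaForJobType(Map<String, String> conf,
                                          int numReduceTasks) {
        boolean isMapOnly = numReduceTasks == 0;
        return conf.get(isMapOnly ? MAP_OUTPUT_KEY_SCHEMA : OUTPUT_KEY_SCHEMA);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        setKeySchema(conf, 0, "\"string\""); // map-only job: zero reducers

        // The schema was stored under the map output property, so the
        // job-type-agnostic reader finds nothing.
        System.out.println(readKeySchemaIgnoringJobType(conf)); // prints null

        // The job-type-aware reader finds the schema that was set.
        System.out.println(readKeySchemaForJobType(conf, 0)); // prints "string"
    }
}
```

The second option would instead change setKeySchema in this sketch to always write the output property, leaving the reader side untouched.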



    
> AvroMultipleOutputs map only jobs do not use NamedOutput schemas
> ----------------------------------------------------------------
>
>                 Key: AVRO-1356
>                 URL: https://issues.apache.org/jira/browse/AVRO-1356
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.4
>            Reporter: Alan Paulsen
>             Fix For: 1.7.5
>
