[ https://issues.apache.org/jira/browse/AVRO-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doug Cutting updated AVRO-581:
------------------------------
Attachment: AVRO-581.patch
            weather-sorted.avro
            weather.avro
> It would be nice to have a unit test for map-only jobs
Good idea. I added one that also tests the default mapper over weather data.
> For a map-only job I get an NPE if I don't call AvroJob.setOutputSchema() -
> calling AvroJob.setMapOutputSchema() was not sufficient.
The javadoc for setMapOutputSchema says you should only call it if it differs
from the final output schema and that it must always be a Pair schema. Is that
not clear enough?
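The fallback that javadoc describes can be modeled in plain Java. This is a minimal sketch, assuming a job configuration that behaves like a string map keyed by the property names `avro.output.schema` and `avro.map.output.schema` (both are assumptions here, not taken from the patch), and assuming the map output schema defaults to the output schema but not the reverse. Under that model, setting only the map output schema leaves the output schema unset, which is one plausible route to a null in a map-only job.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of schema resolution in a job configuration.
// The property names are assumptions for illustration, not taken from the patch.
final class SchemaConfSketch {
  static final String OUTPUT_SCHEMA = "avro.output.schema";
  static final String MAP_OUTPUT_SCHEMA = "avro.map.output.schema";

  private final Map<String, String> conf = new HashMap<>();

  void setOutputSchema(String json) { conf.put(OUTPUT_SCHEMA, json); }

  void setMapOutputSchema(String json) { conf.put(MAP_OUTPUT_SCHEMA, json); }

  // Falls back to the final output schema when no separate map output schema is set.
  String getMapOutputSchema() {
    return conf.getOrDefault(MAP_OUTPUT_SCHEMA, conf.get(OUTPUT_SCHEMA));
  }

  // No fallback in this direction: returns null if only the map output schema was set.
  String getOutputSchema() {
    return conf.get(OUTPUT_SCHEMA);
  }
}
```

In a map-only job the map output is the final output, so under this model setOutputSchema is the call that matters; setMapOutputSchema only adds information when the intermediate Pair schema differs from the final one.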
> A test using Specific types would be good
Good idea. I added one that sorts weather data.
> Should configureAvroJob() only set the input format if it hasn't already been
> set?
Good idea. Done.
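The "only if not already set" behavior amounts to applying a default when a key is absent. A minimal sketch, assuming a plain string map in place of a Hadoop JobConf; the property name `mapred.input.format.class`, the default class name, and the method name `applyDefaults` are assumptions for illustration, not the patch's configureAvroJob().

```java
import java.util.Map;

// Hypothetical helper: applies a default input format without overriding a caller's choice.
final class ConfigureSketch {
  // Assumed property name and default value, for illustration only.
  static final String INPUT_FORMAT = "mapred.input.format.class";
  static final String DEFAULT_INPUT_FORMAT = "org.apache.avro.mapred.AvroInputFormat";

  static void applyDefaults(Map<String, String> conf) {
    // putIfAbsent leaves an existing (non-null) value untouched.
    conf.putIfAbsent(INPUT_FORMAT, DEFAULT_INPUT_FORMAT);
  }
}
```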
> Generating a Pair schema could be made more elegant for users by exposing
> Pair#getPairSchema() as a static method.
Good idea. Done.
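The Pair schema template from the issue description can be instantiated by substituting the key and value schemas. The patch exposes Pair#getPairSchema() as a static method for this purpose; the hypothetical `PairSchemaSketch.pairSchema` below is not that method, only a stdlib-only illustration of the shape the template describes, assuming both arguments are already valid Avro schema JSON.

```java
// Hypothetical stand-in that fills in the Pair schema template by string substitution.
final class PairSchemaSketch {
  static String pairSchema(String keySchemaJson, String valueSchemaJson) {
    return "{\"type\": \"record\", \"name\": \"org.apache.avro.mapred.Pair\", \"fields\": ["
        // The key field uses the default ascending order, so it drives sorting.
        + "{\"name\": \"key\", \"type\": " + keySchemaJson + "}, "
        // "order": "ignore" excludes the value from comparisons.
        + "{\"name\": \"value\", \"order\": \"ignore\", \"type\": " + valueSchemaJson + "}"
        + "]}";
  }
}
```

For example, `pairSchema("\"string\"", "\"long\"")` yields a record whose `key` field has type `"string"` and whose `value` field has type `"long"` with `"order": "ignore"`.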
Note that, in addition to applying the patch, the two attached .avro binary
files must be placed in share/test/data.
> java: add reducer that separates keys and values when map output is pairs
> -------------------------------------------------------------------------
>
> Key: AVRO-581
> URL: https://issues.apache.org/jira/browse/AVRO-581
> Project: Avro
> Issue Type: New Feature
> Components: java
> Reporter: Doug Cutting
> Assignee: Doug Cutting
> Fix For: 1.4.0
>
> Attachments: AVRO-581.patch, AVRO-581.patch, AVRO-581.patch,
> AVRO-581.patch, weather-sorted.avro, weather.avro
>
>
> We should add a Pair<K,V> class, implementing SpecificRecord, that combines
> instances of two schemas (specific or generic). Pairs would be compared by
> key, ignoring value. The template for its schema would be:
> {code}
> {"type": "record", "name": "org.apache.avro.mapred.Pair", "fields":[
> {"name": "key", "type": <<insert key schema here>>},
> {"name": "value", "order": "ignore", "type": <<insert value schema>>}
> ]}
> {code}
> When map outputs are instances of this class, a reducer may be used whose
> reduce method is something like:
> public abstract void reduce(K key, Iterable<V> values);
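The proposed reduce signature implies that pairs are sorted by key alone and each distinct key is delivered once with all of its values. A minimal in-memory sketch of that grouping, assuming hypothetical classes `KeyedPair` and `PairReducerSketch` (not Avro's Pair or any reducer class in the patch) and using the description's two-argument signature, without the output collector a real reducer would need.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Pair<K,V>: ordering uses the key only, like "order": "ignore" on the value.
final class KeyedPair<K extends Comparable<K>, V> implements Comparable<KeyedPair<K, V>> {
  final K key;
  final V value;

  KeyedPair(K key, V value) { this.key = key; this.value = value; }

  @Override
  public int compareTo(KeyedPair<K, V> other) { return key.compareTo(other.key); }
}

// Hypothetical base class: subclasses see each distinct key once, with all of its values.
abstract class PairReducerSketch<K extends Comparable<K>, V> {
  public abstract void reduce(K key, Iterable<V> values);

  // Sorts pairs by key, then hands each run of equal keys to reduce().
  void run(List<KeyedPair<K, V>> pairs) {
    List<KeyedPair<K, V>> sorted = new ArrayList<>(pairs);
    Collections.sort(sorted); // stable, so values keep their input order within a key
    int i = 0;
    while (i < sorted.size()) {
      KeyedPair<K, V> first = sorted.get(i);
      List<V> values = new ArrayList<>();
      while (i < sorted.size() && sorted.get(i).compareTo(first) == 0) {
        values.add(sorted.get(i).value);
        i++;
      }
      reduce(first.key, values);
    }
  }
}
```

In an actual MapReduce job the framework's shuffle, not the reducer, performs this sorting and grouping; the sketch only shows what the reducer would observe.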