[
https://issues.apache.org/jira/browse/CRUNCH-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628972#comment-13628972
]
Gabriel Reid commented on CRUNCH-129:
-------------------------------------
[~joshwills] are both of these (i.e. the title and the description of the issue)
talking about the same thing? It seems like the ClassCastException in the
description is more of a planner (?) issue, whereas the caching of the
iterables for multiple children is more of an execution issue.
Or is the ClassCastException just covering up the real iterable issue that
would come up if the code could get to the point of actually using the iterable?
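To make the execution-side concern concrete, here is a minimal sketch of what
goes wrong when two consumers traverse the same grouped values. It is not
Crunch code: OneShotIterable, count, and cache are hypothetical names, and the
sketch assumes the grouped values behave like a single-pass Iterable (as
reducer-side value iterables in Hadoop typically do).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class OneShotIterableDemo {

    /** Hypothetical stand-in for grouped values that can only be traversed once. */
    static final class OneShotIterable<T> implements Iterable<T> {
        private final Iterator<T> source;

        OneShotIterable(Iterator<T> source) {
            this.source = source;
        }

        @Override
        public Iterator<T> iterator() {
            // Every call returns the same underlying iterator, so once it is
            // exhausted a second traversal sees no elements.
            return source;
        }
    }

    /** Counts the elements by traversing the Iterable once. */
    static <T> int count(Iterable<T> values) {
        int n = 0;
        for (T ignored : values) {
            n++;
        }
        return n;
    }

    /** Copies the values into a list so they can be traversed repeatedly. */
    static <T> List<T> cache(Iterable<T> values) {
        List<T> copy = new ArrayList<>();
        for (T value : values) {
            copy.add(value);
        }
        return copy;
    }

    public static void main(String[] args) {
        // Two "children" reading the same uncached values: the second gets nothing.
        Iterable<String> raw =
            new OneShotIterable<>(Arrays.asList("a", "b", "c").iterator());
        System.out.println("uncached: " + count(raw) + " then " + count(raw));

        // Caching first lets every child see all of the values.
        List<String> cached =
            cache(new OneShotIterable<>(Arrays.asList("a", "b", "c").iterator()));
        System.out.println("cached: " + count(cached) + " then " + count(cached));
    }
}
```

If the real issue is of this kind, it would only show up once the pipeline
gets past the ClassCastException and actually iterates over the values.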
> Cache the Iterable values for each key when a groupByKey op has multiple
> children
> ---------------------------------------------------------------------------------
>
> Key: CRUNCH-129
> URL: https://issues.apache.org/jira/browse/CRUNCH-129
> Project: Crunch
> Issue Type: Bug
> Reporter: Jonathan Natkins
>
> Given a simple Avro pipeline like this:
>   PGroupedTable<String, MyAvroObject> processedData = data.parallelDo(
>       new DoFn<String, Pair<String, MyAvroObject>>() {
>         public void process(String line,
>             Emitter<Pair<String, MyAvroObject>> emitter) {
>           String key = getKey(line);
>           MyAvroObject value = convertToAvroObject(line);
>           emitter.emit(Pair.of(key, value));
>         }
>       },
>       Avros.tableOf(Avros.strings(), Avros.specifics(MyAvroObject.class)))
>       .groupByKey(3);
>
>   PTable<MyAvroGroup, Pair<String, Iterable<MyAvroObject>>> groupedData =
>       processedData.by(
>           new MapFn<Pair<String, Iterable<MyAvroObject>>, MyAvroGroup>() {
>             @Override
>             public MyAvroGroup map(
>                 Pair<String, Iterable<MyAvroObject>> input) {
>               MyAvroGroup group = new MyAvroGroup();
>               group.objects = Lists.<MyAvroObject>newArrayList();
>
>               for (MyAvroObject obj : input.second()) {
>                 group.objects.add(obj);
>               }
>
>               return group;
>             }
>           },
>           Avros.specifics(MyAvroGroup.class));
>
> An exception is thrown when the by() code is run:
> 12/12/10 14:11:07 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> Exception in thread "main" java.lang.ClassCastException:
> org.apache.crunch.types.avro.AvroGroupedTableType cannot be cast to
> org.apache.crunch.types.avro.AvroType
> at org.apache.crunch.types.avro.Avros.tableOf(Avros.java:608)
> at org.apache.crunch.types.avro.AvroTypeFamily.tableOf(AvroTypeFamily.java:135)
> at org.apache.crunch.impl.mem.collect.MemCollection.by(MemCollection.java:222)