[
https://issues.apache.org/jira/browse/PIG-5108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rohini Palaniswamy reopened PIG-5108:
-------------------------------------
Reopening the issue because the fix only addresses this tuple type. I came
across another tuple class that extends DefaultTuple, and it failed with
"Unexpected datatype 110 while reading tuplefrom binary file".
> AvroStorage on Tez with exception on nested records
> ---------------------------------------------------
>
> Key: PIG-5108
> URL: https://issues.apache.org/jira/browse/PIG-5108
> Project: Pig
> Issue Type: Bug
> Components: tez
> Affects Versions: 0.16.0
> Environment: HadoopVersion: 2.6.0-cdh5.8.0
> PigVersion: 0.16.0
> TezVersion: 0.7.0
> Reporter: Sebastian Geller
> Assignee: Daniel Dai
> Fix For: 0.17.0, 0.16.1
>
> Attachments: person-prop.avro, PIG-5108-1.patch
>
>
> Hi,
> While migrating to the latest Pig version we have seen a general issue when
> using nested Avro records on Tez:
> {code}
> Caused by: java.io.IOException: class
> org.apache.pig.impl.util.avro.AvroTupleWrapper.write called, but not
> implemented yet
> at
> org.apache.pig.impl.util.avro.AvroTupleWrapper.write(AvroTupleWrapper.java:68)
> at
> org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
> ...
> {code}
> The setup is
> schema
> {code}
> {
>   "fields": [
>     {
>       "name": "id",
>       "type": "int"
>     },
>     {
>       "name": "property",
>       "type": {
>         "fields": [
>           {
>             "name": "id",
>             "type": "int"
>           }
>         ],
>         "name": "Property",
>         "type": "record"
>       }
>     }
>   ],
>   "name": "Person",
>   "namespace": "com.github.ouyi.avro",
>   "type": "record"
> }
> {code}
> Pig script group_person.pig
> {code}
> loaded_person =
> LOAD '$input'
> USING AvroStorage();
> grouped_records =
> GROUP
> loaded_person BY (property.id);
> STORE grouped_records
> INTO '$output'
> USING AvroStorage();
> {code}
> sample data
> {code}
> {"id":1,"property":{"id":1}}
> {code}
> Execution on Tez
> {code}
> pig -x tez_local -p input=file:///usr/lib/pig/pig-0.16.0/person-prop.avro -p
> output=file:///output group_person.pig
> ...
> Caused by: java.io.IOException: class
> org.apache.pig.impl.util.avro.AvroTupleWrapper.write called, but not
> implemented yet
> at
> org.apache.pig.impl.util.avro.AvroTupleWrapper.write(AvroTupleWrapper.java:68)
> at
> org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:139)
> ...
> {code}
> Execution on mapred
> {code}
> pig -x local -p input=file:///usr/lib/pig/pig-0.16.0/person-prop.avro -p
> output=file:///output7 group_person.pig
> ...
> Output(s):
> Successfully stored 1 records in: "file:///output7"
> ...
> {code}
> I am going to attach the complete log files of both runs.
> I assume that the Pig script should work regardless of whether it runs on
> Tez or MapReduce? Did anything change in the migration to Tez that makes
> the schema invalid?
> Thanks,
> Sebastian