[ https://issues.apache.org/jira/browse/PIG-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bill Graham updated PIG-2195:
-----------------------------

    Attachment: expected_testRecordSplitFromText2.avro
                expected_testRecordSplitFromText1.avro
                PIG-2195_1.patch

Attached is a first patch that has:
* More unit tests reading from text files.
* A fix to how unit tests are run as described above.
* Support for specifying a JSON {{schema_file}}.

Also included are 2 new expected test result files that are needed for one of the new tests. They should live here:
{{contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/storage/avro/avro_test_files}}

> AvroStorage fails to STORE when LOADing via PigStorage
> ------------------------------------------------------
>
>                 Key: PIG-2195
>                 URL: https://issues.apache.org/jira/browse/PIG-2195
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>         Attachments: PIG-2195_1.patch, expected_testRecordSplitFromText1.avro, expected_testRecordSplitFromText2.avro
>
>
> Reading data via {{PigStorage}} and writing it via {{AvroStorage}} fails with an exception like this:
> {{java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord}}
> The Pig script in this section of the documentation shows an example like this that fails:
> http://linkedin.jira.com/wiki/display/HTOOLS/AvroStorage+-+Pig+support+for+Avro+data#AvroStorage-PigsupportforAvrodata-A.Howtostoredataindifferentways.
> A workaround currently exists to produce Avro from TSVs like this:
> {noformat}
> avro = LOAD 'inputPath/' AS (foo);
> STORE avro INTO 'outputPath/' USING oap.piggybank.storage.avro.AvroStorage(
> '{"data":"data_file.avro", "same":"data_file.avro", "field0":"def:bar"}');
> {noformat}
> This is redundant, though, since {{data}} and {{same}} seem to indicate the same thing. This approach also requires an existing Avro data file. This patch will make the following alternate constructor syntaxes work as well.
> # Read schema from an existing data file:
> {noformat}
> '{"data":"data_file.avro", "field0":"def:bar"}');
> {noformat}
> # Read schema from an existing schema file:
> {noformat}
> '{"schema_file":"data_file.avsc", "field0":"def:bar"}');
> {noformat}
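
For illustration only, the {{schema_file}} form from the description might look like this when dropped into the workaround script. This is a sketch, not taken from the patch: it assumes the patch is applied, that the piggybank and Avro jars are already registered, and that {{data_file.avsc}} contains a schema named {{bar}}. The paths are the placeholders used in the description, and {{oap}} is written out as {{org.apache.pig}}.

```pig
-- Sketch only: assumes PIG-2195_1.patch is applied and the required jars are registered.
-- Load tab-separated text with the default PigStorage loader.
avro = LOAD 'inputPath/' AS (foo);

-- Store as Avro, reading the schema from a schema file rather than from an
-- existing Avro data file. data_file.avsc and the "bar" definition are placeholders.
STORE avro INTO 'outputPath/' USING org.apache.pig.piggybank.storage.avro.AvroStorage(
'{"schema_file":"data_file.avsc", "field0":"def:bar"}');
```

Compared with the workaround, this drops the redundant {{same}} entry and does not need an existing Avro data file.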