[ 
https://issues.apache.org/jira/browse/PIG-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Prim updated PIG-4326:
------------------------------
    Attachment: supportForMapsOfArraysOfRecords.patch

I incorporated your previous comment incorrectly, my mistake; your attached 
patch does work with the test. Still, I think we should not change the existing 
behavior for maps of records.

Previously, for maps of records, AvroStorageSchemaConversionUtilities created:
{code}
map[ MyRecord: (fielda: int, ...., fieldz: int) ]
{code}
which I think is what we want: the record should be a single tuple, and we want 
to preserve a possible alias. Your fix removes this tuple, so the schema looks 
like 
{code}
map[ fielda: int, ...., fieldz: int ]
{code}
So I uploaded a new patch proposal that keeps the original behavior for maps 
of records, while for maps of maps and maps of arrays it removes the 
additional nesting tuple, resulting in e.g.
{code}
map[ array: { MyRecord: (fielda: int, ...., fieldz: int) } ]
{code}
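
To illustrate the intended output shapes, here is a minimal, self-contained sketch. It is a toy model and not the code from the attached patch: MapSchemaSketch, Node, typeOf and aliased are hypothetical names that exist in neither Pig nor Avro, and the rendering of a map of maps is my assumption. It only shows the recursion into the map's value type while the record keeps its own tuple and alias.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names belong to the Avro or Pig APIs.
class MapSchemaSketch {

    enum Kind { INT, RECORD, ARRAY, MAP }

    /** Hypothetical stand-in for an Avro schema node. */
    static final class Node {
        final Kind kind;
        final String name;               // record name; null for other kinds
        final Map<String, Node> fields;  // record fields; null for other kinds
        final Node inner;                // array element or map value; else null

        Node(Kind kind, String name, Map<String, Node> fields, Node inner) {
            this.kind = kind;
            this.name = name;
            this.fields = fields;
            this.inner = inner;
        }
    }

    static Node intType() { return new Node(Kind.INT, null, null, null); }
    static Node record(String name, Map<String, Node> fields) {
        return new Node(Kind.RECORD, name, fields, null);
    }
    static Node array(Node element) { return new Node(Kind.ARRAY, null, null, element); }
    static Node map(Node value) { return new Node(Kind.MAP, null, null, value); }

    /** Builds MyRecord with two int fields, mirroring the example in this comment. */
    static Node sampleRecord() {
        Map<String, Node> fields = new LinkedHashMap<>();
        fields.put("fielda", intType());
        fields.put("fieldz", intType());
        return record("MyRecord", fields);
    }

    /** Renders the type of a node, recursing into array elements and map values. */
    static String typeOf(Node n) {
        switch (n.kind) {
            case INT:
                return "int";
            case RECORD: {
                List<String> parts = new ArrayList<>();
                for (Map.Entry<String, Node> f : n.fields.entrySet()) {
                    parts.add(f.getKey() + ": " + typeOf(f.getValue()));
                }
                // The record stays one tuple.
                return "(" + String.join(", ", parts) + ")";
            }
            case ARRAY:
                return "{ " + aliased(n.inner) + " }";
            case MAP:
                // No extra nesting tuple around the value type.
                return "map[ " + aliased(n.inner) + " ]";
            default:
                throw new IllegalArgumentException("unsupported kind: " + n.kind);
        }
    }

    /** Prefixes the alias: the record name for records, "array" for arrays. */
    static String aliased(Node n) {
        if (n.kind == Kind.RECORD) return n.name + ": " + typeOf(n);
        if (n.kind == Kind.ARRAY) return "array: " + typeOf(n);
        return typeOf(n);
    }

    public static void main(String[] args) {
        Node rec = sampleRecord();
        System.out.println(typeOf(map(rec)));         // map of records
        System.out.println(typeOf(map(array(rec))));  // map of arrays of records
        System.out.println(typeOf(map(map(rec))));    // map of maps of records
    }
}
```

In this sketch the map of records still carries the MyRecord tuple, and the map of arrays of records nests the same tuple inside the bag without an extra wrapping tuple.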

> AvroStorageSchemaConversionUtilities does not properly convert schema for 
> maps of arrays of records
> ---------------------------------------------------------------------------------------------------
>
>                 Key: PIG-4326
>                 URL: https://issues.apache.org/jira/browse/PIG-4326
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Michael Prim
>            Assignee: Michael Prim
>             Fix For: 0.15.0
>
>         Attachments: PIG-4326-0.patch, mapsOfArraysOfRecords.patch, 
> supportForMapsOfArraysOfRecords.patch
>
>
> I tried to convert the Avro schema of a map of arrays of records into the 
> proper Pig schema and always got empty map schemas in Pig.
> The reason is that AvroStorageSchemaConversionUtilities assumes only 
> records or primitive types as the content of a map. However, a map of arrays, 
> or a map of maps, can have a schema itself and requires a recursive call to 
> derive the full schema.
> I wrote a unit test for maps of arrays of records, which fails with 
> every Pig release since AvroStorage was rewritten (I think this was in 
> 0.12), and there have been no changes since then in trunk. 
> Further, the attached patch contains the (rather simple) fix that makes the 
> schema conversion utilities succeed.
> I would appreciate further comments and to hear whether this can be included 
> upstream.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
