[ https://issues.apache.org/jira/browse/GOBBLIN-571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708634#comment-16708634 ]
Tilak Patidar commented on GOBBLIN-571:
---------------------------------------

Any updates on merging this PR?

> JsonIntermediateToParquetGroupConverter generates wrong parquet schema for
> complex types such as enums, arrays and maps
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-571
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-571
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Tilak Patidar
>            Priority: Critical
>             Fix For: 0.14.0
>
>
> For complex types such as arrays, maps and enums,
> JsonIntermediateToParquetGroupConverter generates a wrong schema: the
> OPTIONAL and REQUIRED attributes of the SchemaField are assigned incorrectly.
>
> Because of this, Spark throws the following error when reading parquet files
> generated using JsonIntermediateToParquetGroupConverter:
> {code:java}
> Caused by: parquet.io.ParquetDecodingException: Can not read value at 0
> {code}
> An example of a wrong generated schema is below. Notice that the field
> payload.action is marked as required:
> {code:java}
> message EventData {
>   optional int64 id;
>   optional binary type (UTF8);
>   required group actor {
>     optional int64 id;
>     optional binary login (UTF8);
>     optional binary gravatar_id (UTF8);
>     optional binary url (UTF8);
>     optional binary avatar_url (UTF8);
>   }
>   required group repo {
>     optional int64 id;
>     optional binary name (UTF8);
>     optional binary url (UTF8);
>     optional binary urlid (UTF8);
>   }
>   required group payload {
>     optional int64 id;
>     optional binary ref (UTF8);
>     optional binary ref_type (UTF8);
>     optional binary master_branch (UTF8);
>     optional binary description (UTF8);
>     optional binary pusher_type (UTF8);
>     optional binary before (UTF8);
>     required binary action (UTF8);
>   }
>   optional boolean public;
>   optional binary created_at (UTF8);
>   optional binary created_at_id (UTF8);
> }
> {code}
> But the field payload.action, as defined in the source.schema property,
> has isNullable set to true:
> {code:java}
> [ ....
>   {
>     "columnName": "payload",
>     "dataType": {
>       "type": "record",
>       "name": "payloadDetails",
>       "values": [
>         ....
>         {
>           "columnName": "action",
>           "isNullable": true,
>           "dataType": {
>             "type": "enum",
>             "name": "actionType",
>             "symbols": [
>               "started",
>               "published",
>               "opened",
>               "closed",
>               "created",
>               "reopened",
>               "added"
>             ]
>           }
>         }
>       ]
>     }
>   }....
> ]
> {code}
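
The expected behaviour described in the report can be illustrated with a minimal sketch. It assumes that a column with isNullable set to true should map to OPTIONAL and any other column to REQUIRED, and that enums are written as UTF8 binary, as in the schema dump above. The names RepetitionSketch, repetitionFor and renderField are hypothetical, as are the "string" and "long" source type names; none of this is Gobblin's or Parquet's actual API.

```java
import java.util.Locale;

final class RepetitionSketch {
    // Hypothetical stand-in for Parquet's repetition levels; not the Parquet API.
    enum Repetition { REQUIRED, OPTIONAL }

    // Assumed rule: a nullable source column becomes OPTIONAL, otherwise REQUIRED.
    static Repetition repetitionFor(boolean isNullable) {
        return isNullable ? Repetition.OPTIONAL : Repetition.REQUIRED;
    }

    // Renders one primitive field line in the style of the schema dump in the report.
    // Only a few source types are handled; the type names are assumptions.
    static String renderField(String columnName, String dataType, boolean isNullable) {
        String parquetType;
        switch (dataType) {
            case "enum":   // the report shows enums stored as UTF8 binary
            case "string":
                parquetType = "binary " + columnName + " (UTF8)";
                break;
            case "long":
                parquetType = "int64 " + columnName;
                break;
            case "boolean":
                parquetType = "boolean " + columnName;
                break;
            default:
                throw new IllegalArgumentException("Unsupported type in this sketch: " + dataType);
        }
        String repetition = repetitionFor(isNullable).name().toLowerCase(Locale.ROOT);
        return repetition + " " + parquetType + ";";
    }

    public static void main(String[] args) {
        // payload.action from the report: an enum with isNullable = true
        System.out.println(renderField("action", "enum", true));
    }
}
```

Under these assumptions, payload.action renders as `optional binary action (UTF8);` rather than the `required` line shown in the generated schema.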