[ https://issues.apache.org/jira/browse/PIG-2537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214092#comment-13214092 ]
Daniel Dai commented on PIG-2537:
---------------------------------

When a schema is given, we should certainly read the data according to that schema. Here is what we should read: ((null, null, null), b_value, c_value).

> Output from flatten with a null tuple input generating data inconsistent with the schema
> ----------------------------------------------------------------------------------------
>
>                 Key: PIG-2537
>                 URL: https://issues.apache.org/jira/browse/PIG-2537
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.8.0, 0.9.0
>            Reporter: Xuefu Zhang
>            Assignee: Alan Gates
>
> For the following Pig script,
> grunt> A = load 'file' as ( a : tuple( x, y, z ), b, c );
> grunt> B = foreach A generate flatten( $0 ), b, c;
> grunt> describe B;
> B: {a::x: bytearray,a::y: bytearray,a::z: bytearray,b: bytearray,c: bytearray}
> alias B has a clear schema.
> However, on the backend, if $0 happens to be null for a row, the output tuple becomes something like
> (null, b_value, c_value), which is obviously inconsistent with the schema.
> The behaviour is confirmed by inspection of the Pig code.
> This inconsistency corrupts data because the column positions shift. The expected output row should be something like
> (null, null, null, b_value, c_value).
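To make the position shift concrete, here is a minimal Python sketch of the two behaviours discussed above. It is only an illustration, not Pig's implementation: the function names `read_with_schema` and `flatten_first` are hypothetical, rows are modelled as plain Python tuples, and `None` stands in for a Pig null.

```python
def read_with_schema(row, arity):
    """Hypothetical loader step: replace a null tuple in column 0
    with a tuple of `arity` nulls, as the declared schema implies."""
    head, rest = row[0], tuple(row[1:])
    if head is None:
        head = (None,) * arity
    return (tuple(head),) + rest


def flatten_first(row):
    """Flatten column 0 without consulting the schema.
    A null tuple contributes a single null column (the reported behaviour)."""
    head, rest = row[0], tuple(row[1:])
    if head is None:
        return (None,) + rest
    return tuple(head) + rest


raw = (None, "b_value", "c_value")

# Flattening the raw row yields 3 columns, so b and c shift left.
print(flatten_first(raw))

# Reading according to the schema first keeps all 5 columns in place.
print(flatten_first(read_with_schema(raw, 3)))
```

With the null tuple expanded at read time, flatten needs no special case, and b_value and c_value stay at the positions that `describe B` reports.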