rdblue commented on PR #4415:
URL: https://github.com/apache/iceberg/pull/4415#issuecomment-1173192373

   I just spent some time debugging this and I think the problem is with Flink, 
not with Iceberg.
   
   When I run the case that fails, the issue is that the upserted rows are out 
of order. In both the final insert data file and the records that I see in the 
debugger, the row with `bool=true` is passed in last. Because this is an 
upsert, the last row written for a key is the version that you end up with. I'm not sure why 
Flink is passing the rows in the wrong order, but it could be that the parser 
doesn't keep track of the order of rows in a `VALUES` clause.
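   
   To illustrate why arrival order matters here, below is a minimal sketch of 
last-write-wins upsert semantics. It is plain Python, not Iceberg or Flink 
code: the `apply_upserts` helper, the `id` key column, and the sample rows are 
all hypothetical, and which order was intended is assumed from the description 
above.

```python
def apply_upserts(rows, key):
    """Apply rows in arrival order; a later row replaces an earlier one with the same key."""
    table = {}
    for row in rows:
        # Last write wins: the most recently seen row for a key overwrites the previous one.
        table[row[key]] = row
    return list(table.values())


# Hypothetical rows sharing one key; only the `bool` value differs.
intended_order = [{"id": 1, "bool": True}, {"id": 1, "bool": False}]
# Same rows, but the `bool=true` row arrives last, as seen in the debugger.
observed_order = list(reversed(intended_order))

final_intended = apply_upserts(intended_order, "id")
final_observed = apply_upserts(observed_order, "id")
```

The input rows are identical in both runs; only their order changes which 
version survives.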
   
   I also did a little exploration into what is different with the "succeeds" 
case. The significant difference between the two in my testing was that the 
failing case uses `TO_DATE('2022-03-01')` where the succeeding case uses 
`DATE '2022-03-01'`. `TO_DATE` appears to trigger the rows getting out of order.
   
   I'm going to close this issue. Feel free to reopen if you think it is a 
problem with Iceberg somehow reordering the rows, but that seems unlikely given 
that these rows are coming from Flink and are identical when they come through 
the Iceberg API (the `TO_DATE` / `DATE` distinction is internal to Flink).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.