nsivabalan commented on issue #12298: URL: https://github.com/apache/hudi/issues/12298#issuecomment-2494488775
I am not sure I fully follow the use case. From what I can gauge, the situation is this: the table has multi-writer enabled, different writers may write to different partitions, and the partitioning column is derived dynamically from the input data. So writer1 tries to write to partitionX, and we do not have any completed commits in the active timeline containing data for the partition of interest. This is what you suggested.

But the stack trace shows something different. If the write had been routed to HoodieMergeHandle, it means we already had some data written to partitionX, so the file group was chosen as a small file and we eventually ended up with HoodieMergeHandle.

Let me poke around one theory, though. Let's say there was a concurrent writer: writer2 was writing to partition2, is inflight, and added the file group of interest. writer1, while going through the upsert partitioner, was able to get hold of the same file group that the other writer is writing to, and routes it to HoodieMergeHandle. In between, writer2 fails and rolls back its commit, which deletes the file group of interest. Hence, within HoodieMergeHandle, we run into the "file group not found" issue. But this is very unlikely to happen, since the commit from writer2 was never completed. I have to go through the code to see if we have any edge cases around this.
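
To make the ordering in the theory above concrete, here is a rough toy model of it in Python. It is not Hudi code: `ToyTable`, `pick_small_file`, `merge_into`, and the `include_inflight` switch are all hypothetical names invented for illustration, and the sketch assumes both writers end up touching the same partition. It only shows that the failure needs writer1 to see a file group that belongs to an uncompleted commit.

```python
# Toy in-memory model of the hypothesized race. NOT Hudi code:
# every class, method, and flag here is a hypothetical stand-in.


class FileGroupNotFound(Exception):
    """Raised when a merge targets a file group that no longer exists."""


class ToyTable:
    def __init__(self):
        # partition -> {file_group_id: "inflight" | "completed"}
        self.file_groups = {}

    def start_write(self, partition, file_group_id):
        # A writer adds a file group as part of a commit that is not yet completed.
        self.file_groups.setdefault(partition, {})[file_group_id] = "inflight"

    def rollback(self, partition, file_group_id):
        # Rolling back the failed commit deletes the file group it added.
        self.file_groups.get(partition, {}).pop(file_group_id, None)

    def pick_small_file(self, partition, include_inflight):
        # Stand-in for small-file selection during upsert planning.
        for file_group_id, state in self.file_groups.get(partition, {}).items():
            if state == "completed" or include_inflight:
                return file_group_id
        return None

    def merge_into(self, partition, file_group_id):
        # Stand-in for the merge step: the target file group must still exist.
        if file_group_id not in self.file_groups.get(partition, {}):
            raise FileGroupNotFound(f"{file_group_id} missing in {partition}")
        return "merged"


def run_scenario(include_inflight):
    table = ToyTable()
    table.start_write("partitionX", "fg-1")                         # writer2 is inflight
    target = table.pick_small_file("partitionX", include_inflight)  # writer1 plans its write
    table.rollback("partitionX", "fg-1")                            # writer2 fails, rolls back
    if target is None:
        return "insert: new file group"                             # writer1 never saw fg-1
    try:
        return table.merge_into("partitionX", target)
    except FileGroupNotFound as err:
        return f"error: {err}"


if __name__ == "__main__":
    print(run_scenario(include_inflight=True))
    print(run_scenario(include_inflight=False))
```

In this model the error only appears when planning is allowed to see inflight file groups. That matches the point above that the scenario should be unlikely as long as uncommitted data is hidden from other writers.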
