nsivabalan commented on issue #12298:
URL: https://github.com/apache/hudi/issues/12298#issuecomment-2494488775

   I am not sure I follow the use case fully. 
   
   From what I can gather, this is the situation:
   
   The table has multi-writer enabled. Different writers could write to
   different partitions. The partitioning column is dynamically derived from
   the input data.
   
   So, writer1 tries to write to partitionX. We do not have any completed
   commits in the active timeline containing any data for the partition of
   interest. This is what you suggested.
   
   But the stack trace shows something different.
   
   If we had routed it to a merge handle, it means that we already had some
   data written to partitionX, and hence the file group was chosen as a small
   file, and eventually we ended up with HoodieMergeHandle.
   
   But let me poke around one theory, though.
   Let's say there was a concurrent writer: writer2 was writing to partition2,
   is inflight, and has added the file group of interest.
   writer1, when going through the upsert partitioner, was able to get hold of
   the same file group that the other writer is writing to, and routes its
   records to HoodieMergeHandle.
   But in between, the other writer (writer2) fails and rolls back the commit,
   which deletes the file group of interest. Hence, within HoodieMergeHandle,
   we run into the file group not found issue.
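
   To make the interleaving concrete, here is a toy sketch of the theory. It
   is not Hudi code: `ToyTable`, `simulate`, and the `include_inflight` flag
   are hypothetical names, and it assumes both writers touch the same
   partition. It only models "plan against a file group, then the file group
   is rolled back before the plan executes".

```python
class ToyTable:
    """Toy model of a table's file groups. Hypothetical, not Hudi's API."""

    def __init__(self):
        # file group id -> {"partition": str, "instant": str, "committed": bool}
        self.file_groups = {}

    def add_file_group(self, fg_id, partition, instant, committed=False):
        self.file_groups[fg_id] = {
            "partition": partition,
            "instant": instant,
            "committed": committed,
        }

    def rollback(self, instant):
        # Rolling back an uncommitted instant deletes the file groups it created.
        self.file_groups = {
            fg_id: fg
            for fg_id, fg in self.file_groups.items()
            if fg["committed"] or fg["instant"] != instant
        }

    def candidates(self, partition, include_inflight):
        # File groups a planning writer can see in the given partition.
        return [
            fg_id
            for fg_id, fg in self.file_groups.items()
            if fg["partition"] == partition
            and (fg["committed"] or include_inflight)
        ]


def simulate(include_inflight):
    table = ToyTable()
    # writer2 is inflight and has added a file group to the partition.
    table.add_file_group("fg-1", "partitionX", instant="t2")
    # writer1 plans its write: reuse an existing file group if one is visible.
    picked = table.candidates("partitionX", include_inflight)
    target = picked[0] if picked else None
    # writer2 fails and rolls back before writer1 executes its plan.
    table.rollback("t2")
    if target is None:
        return "create new file group"
    if target not in table.file_groups:
        return "file group not found"
    return "merge into " + target
```

   In this toy model, the failure only shows up when the planning step can
   see file groups from an uncommitted instant (`simulate(True)`); when only
   committed file groups are visible (`simulate(False)`), writer1 simply
   creates a new file group.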
   
   But the chance of this happening is very low, since writer2's commit was
   never completed, so its file group should not have been visible to
   writer1. I have to go through the code to see if we have any edge cases
   around this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

