nsivabalan commented on issue #3394: URL: https://github.com/apache/hudi/issues/3394#issuecomment-894482014
1. Sorry, it looks like we missed updating our config page. "hoodie.simple.index.update.partition.path" is the corresponding config for the simple index (it applies when the index type is GLOBAL_SIMPLE).

2. Let me illustrate with a simple example.

Format: record key, partition path, col1, preCombine

Insert:
rec1, pp1, v1, pc1
rec2, pp2, v1, pc1

Both records will be inserted into the Hudi table. Data in the Hudi table:
rec1, pp1, v1, pc1
rec2, pp2, v1, pc1

Now, let's see what happens if some overlapping records are ingested with hoodie.simple.index.update.partition.path = false. A record whose key is already found in the Hudi table is always routed to its old partition.

New writes:
rec1, pp2, v2, pc2
rec3, pp2, v2, pc2

Once committed, the data in the Hudi table looks like this:
rec1, pp1, v2, pc2 // new partition path ignored; rec1 is updated in place in pp1.
rec2, pp2, v1, pc1
rec3, pp2, v2, pc2

Now, let's see what happens if the same overlapping records are ingested with hoodie.simple.index.update.partition.path = true. A record whose key is already found in the Hudi table is deleted from its old partition and inserted into the new partition path.

Data in the Hudi table:
rec1, pp1, v1, pc1
rec2, pp2, v1, pc1

New writes:
rec1, pp2, v2, pc2
rec3, pp2, v2, pc2

Once committed, the data in the Hudi table looks like this:
rec1, pp2, v2, pc2 // new partition path honored.
rec1, pp1, v1, pc1 : deleted.
rec2, pp2, v1, pc1
rec3, pp2, v2, pc2

Bottom line: with a global index type, record keys are unique across the entire dataset, irrespective of partition path. The flag only decides which partition an existing key ends up in after an update.

Let me know if this is clear.
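To make the routing rule concrete, here is a minimal Python sketch that simulates both settings of the flag on the example rows above. It is not Hudi code: `Record` and `upsert_global_index` are hypothetical names, the table is modeled as an in-memory list, and the sketch assumes the incoming record always wins the merge (as pc2 does over pc1 in the example), so preCombine ordering is not modeled.

```python
from typing import Dict, List, NamedTuple


class Record(NamedTuple):
    # Hypothetical row shape: record key, partition path, col1, preCombine.
    key: str
    partition: str
    col1: str
    precombine: str


def upsert_global_index(
    table: List[Record], incoming: List[Record], update_partition_path: bool
) -> List[Record]:
    """Hypothetical simulation of upserts routed through a global index.

    Keys are unique across the whole table, regardless of partition path.
    Assumption: the incoming record always replaces the stored one.
    """
    # Global lookup: record key -> the single stored copy, across all partitions.
    by_key: Dict[str, Record] = {r.key: r for r in table}
    for rec in incoming:
        existing = by_key.get(rec.key)
        if existing is None:
            # Unknown key: plain insert into the incoming partition path.
            by_key[rec.key] = rec
        elif update_partition_path:
            # Flag true: the old copy is dropped (deleted from its old
            # partition) and the record lands in the new partition path.
            by_key[rec.key] = rec
        else:
            # Flag false: update in place, keeping the old partition path.
            by_key[rec.key] = rec._replace(partition=existing.partition)
    return sorted(by_key.values())


if __name__ == "__main__":
    table = [Record("rec1", "pp1", "v1", "pc1"), Record("rec2", "pp2", "v1", "pc1")]
    writes = [Record("rec1", "pp2", "v2", "pc2"), Record("rec3", "pp2", "v2", "pc2")]
    for flag in (False, True):
        print(f"update.partition.path = {flag}")
        for row in upsert_global_index(table, writes, flag):
            print("  ", ", ".join(row))
```

Running it prints the two post-commit tables from the walkthrough. Modeling the table as a dict keyed only by record key is what captures the "keys are unique across the entire dataset" property: there is never room for two copies of rec1, whichever partition it ends up in.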
