openinx commented on pull request #2010:
URL: https://github.com/apache/iceberg/pull/2010#issuecomment-800769586


   Okay, after reconsidering the primary key uniqueness issue, I think it's hard 
to guarantee uniqueness in an embedded table format library. If both a Spark 
job and a Flink streaming job are writing to the same Iceberg table, I can't 
think of a good, efficient way to guarantee the uniqueness of the primary key. 
If we had an online server in front of those data files, it would be easy to 
guarantee uniqueness, because all write requests would be sent to the same 
server and the server could decide how to reject duplicate write requests. 
For an Iceberg table format, though, it's hard to synchronize between 
different compute jobs. 
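
   To illustrate the difference, here is a toy sketch in plain Python (not 
Iceberg code). `CentralServer`, `SharedTable`, and `duplicate_keys` are 
hypothetical names made up for this example. It assumes each job commits its 
own data files without looking at what the other job wrote.

```python
class CentralServer:
    """Hypothetical online server: every write goes through one place."""

    def __init__(self):
        self.keys = set()
        self.rows = []

    def write(self, key, value):
        # The server sees every request, so it can reject a duplicate key.
        if key in self.keys:
            return False
        self.keys.add(key)
        self.rows.append((key, value))
        return True


class SharedTable:
    """Hypothetical stand-in for a table format: jobs commit files independently."""

    def __init__(self):
        self.files = []

    def commit(self, data_file):
        # Nothing here compares the new file's keys with existing files.
        self.files.append(list(data_file))


def duplicate_keys(table):
    """Return the keys that appear more than once across all committed files."""
    seen, dups = set(), set()
    for data_file in table.files:
        for key, _ in data_file:
            if key in seen:
                dups.add(key)
            seen.add(key)
    return dups


# Two independent jobs write overlapping keys to the same table.
table = SharedTable()
table.commit([(1, "a"), (2, "b")])  # e.g. the Spark job
table.commit([(2, "c"), (3, "d")])  # e.g. the Flink streaming job
table_dups = duplicate_keys(table)

# The same writes sent through a single server.
server = CentralServer()
results = [server.write(k, v) for k, v in [(1, "a"), (2, "b"), (2, "c"), (3, "d")]]
```

   In the table case both commits succeed and key 2 ends up in two files. The 
server rejects the second write for key 2 because it is the single point that 
sees every request.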
   
   So I'm fine with introducing the primary key without enforced uniqueness. 
@jackye1995 Did you start this work in your repo? Should we update this PR 
based on the above discussion? (Sorry about the delay.)


