Github user gyfora commented on the pull request:

    https://github.com/apache/flink/pull/1305#issuecomment-153470424
  
    After some initial discussion we @StephanEwen we came to the following 
conclusions:
    
    The current timestamp based approach has some limitations in terms of the 
assumptions it makes about the clocks on the checkpoint coordinator which can 
cause some issues in case of master failure. It is currently assumed that the 
recovery timestamp assigned by the checkpoint coordinator is larger than the 
last checkpoint timestamp assigned before failure. If the clock difference is 
large enough this might be a problem, although fairly unlikely.
    
    The alternative approach would be to store the id, key, value triplets in 
the database (drop the timestamps). Snapshots written between two snapshots 
would be written with an incremented checkpoint id (and multiple spills to the 
same key would update the value not insert a new one every time). Cleanup could 
then be performed by getting the next checkpoint id from the coordinator (which 
would get it from zookeper). The drawback in this case would be an increased 
complexity for the batch inserts as we need to be able to handle primary key 
conflicts efficiently. While this is easy on some databases  (in MySql) in 
other cases (Derby) can be problematic.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to