galenwarren commented on pull request #15599:
URL: https://github.com/apache/flink/pull/15599#issuecomment-818921856


   Sorry for the long delay. A couple of additional questions came up that I 
wanted to ask:
   
   - Does the license NOTICE file get generated automatically during 
build/deploy, or is that something I need to generate? I saw a script called 
```collect_license_files.sh``` in the project, but I wasn't sure how to use it. 
Right now, there is no NOTICE.
   - [RecoverableFsDataOutputStream.Committer](https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/core/fs/RecoverableFsDataOutputStream.Committer.html)
 contains both ```commit``` and ```commitAfterRecovery```. The descriptions say 
that the latter should tolerate situations where, say, the file has already 
been committed, which suggests that the former should not. I've implemented it 
that way. In thinking through some possible scenarios, though, it seems 
possible for a file to be written and committed (which deletes the temp files), 
and then for processing to be restarted from an earlier checkpoint or savepoint 
at which the recoverable write was still in progress. In that case, temporary 
files would continue to be written from that point on, but the commit would 
then fail because some of the temporary files would already have been deleted. 
So I wasn't sure whether it would make more sense to *always* look for the 
final file when committing with either method, not just with 
```commitAfterRecovery```, so that the commit would not fail in that case. The 
cost would be an extra file read on every commit, to see whether the commit had 
already completed.
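
   To make the second question concrete, here is a minimal sketch of the 
"always check for the final file" behavior, written against the local file 
system with plain `java.nio` rather than Flink's interfaces. The class name 
`IdempotentCommitSketch`, the `commit` signature, and the boolean return value 
are all hypothetical; this is not the code in the PR. It also assumes that an 
existing final file can be treated as an equivalent, already completed commit.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical sketch only; not Flink's Committer API and not the PR's implementation. */
final class IdempotentCommitSketch {

    private IdempotentCommitSketch() {}

    /**
     * Concatenates tempParts into finalFile and deletes the parts, unless
     * finalFile already exists.
     *
     * @return true if this call performed the commit, false if the final file
     *     was already present and the call was a no-op
     */
    static boolean commit(List<Path> tempParts, Path finalFile) throws IOException {
        // The extra check paid on every commit: an existing final file is
        // assumed to mean that an earlier attempt already completed.
        if (Files.exists(finalFile)) {
            return false;
        }

        // Without a final file, every temporary part must still be there.
        for (Path part : tempParts) {
            if (!Files.exists(part)) {
                throw new NoSuchFileException(part.toString());
            }
        }

        // CREATE_NEW fails if another writer created the final file in between.
        try (OutputStream out = Files.newOutputStream(finalFile, StandardOpenOption.CREATE_NEW)) {
            for (Path part : tempParts) {
                Files.copy(part, out);
            }
        }

        // Clean up the temporary parts only after the final file is written.
        for (Path part : tempParts) {
            Files.delete(part);
        }
        return true;
    }
}
```

   With this shape, a second `commit` for the same final file returns without 
touching the (already deleted) temporary files, while a commit whose parts are 
missing and whose final file does not exist still fails. Whether skipping the 
commit is actually safe after a restart from an earlier checkpoint is the open 
question above; the sketch only shows where the extra check would sit.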
   
   @xintongsong , thanks for your help so far and looking forward to your 
feedback.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

