Hi Addison,

Unfortunately, there is a long-standing problem that user functions cannot differentiate between successful and erroneous shutdown [1]. I had this high on my private list of things that I finally want to see fixed in Flink 1.8, and your message further confirms this.
Best,
Aljoscha

[1] https://issues.apache.org/jira/browse/FLINK-2646

> On 8. Nov 2018, at 13:39, Till Rohrmann <trohrm...@apache.org> wrote:
>
> Hi Addison,
>
> thanks for reporting this issue. I've pulled in Kostas, who worked on the
> StreamingFileSink and knows the current behaviour as well as its
> limitations best.
>
> Cheers,
> Till
>
> On Wed, Nov 7, 2018 at 11:49 PM Addison Higham <addis...@gmail.com> wrote:
>
>> Hi all,
>>
>> I just ran into a bit of a problem, and I am not sure what the behavior
>> should be and whether this should be considered a bug, or if there is
>> some other way this should be handled.
>>
>> I have a streaming job with a stream that eventually closes; this job
>> sinks to a StreamingFileSink. The problem I am experiencing is that any
>> data written to the sink between the last checkpoint and the close of
>> the stream is lost.
>>
>> This happens (AFAICT) because the StreamingFileSink relies on
>> checkpoints to commit files, and closing the stream currently does not
>> try to commit anything.
>>
>> It seems like just making close call
>> `buckets.commitUpToCheckpoint(Long.MAX_VALUE)` would work pretty well,
>> assuming it is an actual stream close, but it could be problematic in
>> the event of a savepoint/cancel and a later resume (it may mean some
>> files would be prematurely committed). Ideally, we would be able to
>> differentiate between the two different types of close (an actual
>> stream finishing vs. a cancel), but at the moment that doesn't seem
>> supported.
>>
>> If this is considered a bug, please let me know and I will file a Jira.
>> If not, what is the "correct" way to handle getting all the data out
>> with any sink that relies on a checkpoint to commit data?
>>
>> Thanks
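The failure mode described in the thread can be illustrated with a minimal sketch. The class below is a hypothetical toy model, not Flink's actual `StreamingFileSink`: records go into a pending buffer, a completed checkpoint moves pending records to committed state (analogous to `commitUpToCheckpoint`), and `close()` either discards or (with the proposed fix) commits what is still pending. All names here are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a sink that only commits pending records when a checkpoint
// completes. This mimics the behaviour discussed in the thread; it is NOT
// Flink's real StreamingFileSink.
class CheckpointCommittingSink {
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();
    private final boolean commitOnClose;

    CheckpointCommittingSink(boolean commitOnClose) {
        this.commitOnClose = commitOnClose;
    }

    // Write a record; it stays "in-progress" until a checkpoint completes.
    void invoke(String record) {
        pending.add(record);
    }

    // Analogous to buckets.commitUpToCheckpoint(...): everything written
    // before the checkpoint becomes visible/committed.
    void notifyCheckpointComplete() {
        committed.addAll(pending);
        pending.clear();
    }

    // Current behaviour: close() does not commit, so records written after
    // the last checkpoint are dropped. With commitOnClose (the proposed
    // commitUpToCheckpoint(Long.MAX_VALUE) on close), they are committed.
    void close() {
        if (commitOnClose) {
            notifyCheckpointComplete();
        }
        pending.clear();
    }

    List<String> committedRecords() {
        return committed;
    }
}

public class Main {
    public static void main(String[] args) {
        CheckpointCommittingSink sink = new CheckpointCommittingSink(false);
        sink.invoke("a");
        sink.notifyCheckpointComplete(); // last successful checkpoint
        sink.invoke("b");                // written after the checkpoint
        sink.close();                    // finite stream ends: no commit
        System.out.println(sink.committedRecords()); // "b" was lost
    }
}
```

Running this prints only the record committed by the checkpoint; the record written between the last checkpoint and `close()` never becomes visible, which is exactly the data loss Addison observed. Passing `true` for `commitOnClose` models the proposed fix, and also shows why distinguishing a real stream finish from a cancel matters: on a cancel-with-savepoint, committing on close would prematurely publish files.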