[
https://issues.apache.org/jira/browse/FLINK-9861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566710#comment-16566710
]
ASF GitHub Bot commented on FLINK-9861:
---------------------------------------
zentol opened a new pull request #6478: [FLINK-9861][tests] Add
StreamingFileSink E2E test
URL: https://github.com/apache/flink/pull/6478
## What is the purpose of the change
This PR adds an end-to-end test for the streaming file sink.
The source of the test program emits 60k elements over 60 seconds, after
which it idles until the job is canceled. Records are `Tuple2<Integer,
Integer>`, where `f0` is a key between 0 and 9 and `f1` is a sequence counter from
0 to 59999. The current sequence count is checkpointed.
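For illustration, such a source could look roughly like the sketch below (this is not the PR's actual code; the class name, the 1 ms pacing, and the state name are assumptions). The sequence counter lives in operator state so that a restart resumes emission instead of starting over from 0:
```java
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

/** Illustrative source: 60k (key, counter) tuples over roughly 60 s, then idle. */
public class SequenceSource
        implements SourceFunction<Tuple2<Integer, Integer>>, CheckpointedFunction {

    private volatile boolean running = true;
    private int counter; // restored from state on recovery
    private transient ListState<Integer> counterState;

    @Override
    public void run(SourceContext<Tuple2<Integer, Integer>> ctx) throws Exception {
        while (running && counter < 60_000) {
            // emit and advance the counter under the checkpoint lock so the
            // checkpointed count matches what was actually emitted
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(Tuple2.of(counter % 10, counter));
                counter++;
            }
            Thread.sleep(1); // at most ~1000 records/second -> ~60 seconds total
        }
        while (running) {
            Thread.sleep(100); // idle until the job is canceled
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        counterState.clear();
        counterState.add(counter);
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        counterState = context.getOperatorStateStore()
                .getListState(new ListStateDescriptor<>("counter", Integer.class));
        for (Integer restored : counterState.get()) {
            counter = restored;
        }
    }
}
```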
The sink of the test program uses a custom encoder and bucketer, and rolls on
each checkpoint.
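A minimal sketch of a row-format `StreamingFileSink` with a custom `Encoder` is shown below. The output path is hypothetical, and the PR's custom bucketer and roll-on-checkpoint policy are omitted, since the corresponding builder hooks (bucketer/bucket-assigner and rolling-policy setters) vary between Flink releases:
```java
import org.apache.flink.api.common.serialization.Encoder;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

import java.nio.charset.StandardCharsets;

// Writes each record as a "key,counter" line into the bucket's part files.
StreamingFileSink<Tuple2<Integer, Integer>> sink = StreamingFileSink
        .forRowFormat(
                new Path("/tmp/test-output"), // hypothetical output path
                (Encoder<Tuple2<Integer, Integer>>) (element, stream) ->
                        stream.write((element.f0 + "," + element.f1 + "\n")
                                .getBytes(StandardCharsets.UTF_8)))
        .build();
```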
Test behavior:
1. submit job
2. wait for some checkpoints
3. kill 1 TM
4. wait for restart
5. kill 2 TMs
6. wait for restart
7. wait until the sink no longer creates new files
8. cancel job
9. verify result
Step 2 ensures that some state is actually checkpointed, as otherwise the
source could emit the entire output again. The current checkpoint count is
retrieved through the REST API.
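As a hedged sketch, the completed-checkpoint count can be polled from the `/jobs/<jobID>/checkpoints` REST endpoint; the helper name, default endpoint, and regex-based JSON handling below are illustrative, not the PR's actual script:
```java
import java.io.IOException;
import java.net.URL;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Returns the number of completed checkpoints reported by the REST API. */
static int completedCheckpoints(String restEndpoint, String jobId) throws IOException {
    URL url = new URL(restEndpoint + "/jobs/" + jobId + "/checkpoints");
    try (Scanner scanner = new Scanner(url.openStream(), "UTF-8").useDelimiter("\\A")) {
        String json = scanner.hasNext() ? scanner.next() : "";
        // naive extraction of counts.completed; a real test would use a JSON parser
        Matcher m = Pattern.compile("\"completed\"\\s*:\\s*(\\d+)").matcher(json);
        return m.find() ? Integer.parseInt(m.group(1)) : 0;
    }
}
```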
Steps 3-6 induce failures. We explicitly wait for the restarts to ensure that
the job in fact restarts twice. The current number of restarts is retrieved
through the metrics REST API.
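Analogously, the restart count is exposed as the `fullRestarts` job metric in this era of Flink; an illustrative query (again, the helper is hypothetical) could look like:
```java
import java.io.IOException;
import java.net.URL;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Reads the job's fullRestarts metric through the metrics REST API. */
static int fullRestarts(String restEndpoint, String jobId) throws IOException {
    URL url = new URL(restEndpoint + "/jobs/" + jobId + "/metrics?get=fullRestarts");
    try (Scanner scanner = new Scanner(url.openStream(), "UTF-8").useDelimiter("\\A")) {
        String json = scanner.hasNext() ? scanner.next() : "";
        // the endpoint answers with [{"id":"fullRestarts","value":"<n>"}]
        Matcher m = Pattern.compile("\"value\"\\s*:\\s*\"(\\d+)\"").matcher(json);
        return m.find() ? Integer.parseInt(m.group(1)) : 0;
    }
}
```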
Steps 7-8 are a work-around for the behavior of checkpointing sinks on finite
streams: any data received between the last checkpoint and job termination is
usually not committed. To avoid having to deal with this, we let the job run
until the sinks are no longer creating new files for checkpoints. Files are
only created when records are received, so if no sink creates a file we can
infer that none of them received a record, which should mean there is in fact
no more data to process. Strictly speaking this is just an assumption, but
given the checkpoint interval of 5 seconds it is highly unlikely that no record
makes it to a sink in time.
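One illustrative way to implement the "no new files" check (the helper name, local-filesystem assumption, and 10 s polling interval are assumptions, not the PR's script):
```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Blocks until no new part file has appeared for one polling interval. */
static void waitForInactiveSink(Path outputDir) throws Exception {
    long previousCount = -1;
    while (true) {
        long currentCount;
        try (Stream<Path> files = Files.walk(outputDir)) {
            currentCount = files.filter(Files::isRegularFile).count();
        }
        if (currentCount == previousCount) {
            return; // no file was created during the last interval
        }
        previousCount = currentCount;
        Thread.sleep(10_000); // comfortably longer than the 5 s checkpoint interval
    }
}
```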
Step 9 verifies the result by concatenating and sorting all committed files,
which should yield the sequence [0, 60000).
If any data was lost by the sink, the sequence will be interrupted; if any
data was committed twice, the sequence will contain a duplicate entry.
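A sketch of that verification, assuming the "key,counter" line format from the encoder example above (the helper name is hypothetical):
```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Collects all counters from the committed part files and checks [0, 60000). */
static void verifyResult(Path outputDir) throws IOException {
    List<Path> partFiles;
    try (Stream<Path> files = Files.walk(outputDir)) {
        partFiles = files.filter(Files::isRegularFile).collect(Collectors.toList());
    }
    List<Integer> counters = new ArrayList<>();
    for (Path file : partFiles) {
        for (String line : Files.readAllLines(file)) {
            counters.add(Integer.parseInt(line.split(",")[1]));
        }
    }
    Collections.sort(counters);
    if (counters.size() != 60_000) {
        throw new AssertionError("Expected 60000 records, found " + counters.size());
    }
    for (int i = 0; i < counters.size(); i++) {
        if (counters.get(i) != i) { // gap => data loss, repeat => double commit
            throw new AssertionError("Sequence broken at position " + i);
        }
    }
}
```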
## Verifying this change
https://travis-ci.org/zentol/flink-ci/builds/411211048
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
> Add end-to-end test for reworked BucketingSink
> ----------------------------------------------
>
> Key: FLINK-9861
> URL: https://issues.apache.org/jira/browse/FLINK-9861
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming Connectors
> Affects Versions: 1.6.0
> Reporter: Till Rohrmann
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.6.0
>
>
> We should add an end-to-end test for the reworked BucketingSink to verify that
> the sink works with different {{FileSystems}}.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)