[ 
https://issues.apache.org/jira/browse/FLINK-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458462#comment-16458462
 ] 

ASF GitHub Bot commented on FLINK-8971:
---------------------------------------

GitHub user tzulitai opened a pull request:

    https://github.com/apache/flink/pull/5941

    [FLINK-8971] [e2e] Include broadcast / union state in general purpose 
DataStream job

    ## What is the purpose of the change
    
    This PR extends the general purpose DataStream job to contain an operator 
that uses broadcast and union state.
    
    The operator is self-verifiable such that on restore of the job, if 
restored broadcast state / union state is incorrect, an exception will be 
thrown to fail the job.
    
    The fact that the `test_resume_savepoint` job uses the general purpose 
DataStream job ensures that we have an e2e test that covers broadcast / union 
state savepointing and resuming.
    
    ## Brief change log
    
    - 8315969 a preliminary fix that allows the sequence generator to have only 
one key
    - eb8df23 Introduces an operator that uses broadcast and union state
    
    ## Verifying this change
    
    This is an extension to existing tests.
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (yes / **no**)
      - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
      - The serializers: (yes / **no** / don't know)
      - The runtime per-record code paths (performance sensitive): (yes / 
**no** / don't know)
      - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know)
      - The S3 file system connector: (yes / **no** / don't know)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (yes / **no**)
      - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tzulitai/flink FLINK-8971

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5941.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5941
    
----
commit 83159697b4d856b109863baee9d880bd2f1b4c86
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date:   2018-04-30T10:04:43Z

    [hotfix] [e2e-tests] Make SequenceGeneratorSource usable for 0-size key 
ranges

commit eb8df230d1963ac7ca0e819aa5eb203d8a8682bf
Author: Tzu-Li (Gordon) Tai <tzulitai@...>
Date:   2018-04-30T10:05:46Z

    [FLINK-8971] [e2e-tests] Include broadcast / union state in general purpose 
DataStream job

----


> Create general purpose testing job
> ----------------------------------
>
>                 Key: FLINK-8971
>                 URL: https://issues.apache.org/jira/browse/FLINK-8971
>             Project: Flink
>          Issue Type: Task
>          Components: Tests
>    Affects Versions: 1.5.0
>            Reporter: Till Rohrmann
>            Priority: Critical
>             Fix For: 1.5.0
>
>
> In order to write better end-to-end tests we need a general purpose testing 
> job which comprises as many Flink aspects as possible. These include 
> different types for records and state, user defined components, state types 
> and operators.
> The job should allow to activate a certain misbehavior, such as slowing 
> certain paths down or throwing exceptions to simulate failures.
> The job should come with a data generator which generates input data such 
> that the job can verify it's own behavior. This includes the state as well as 
> the input/output records.
> We already have the [heavily misbehaved 
> job|https://github.com/rmetzger/flink-testing-jobs/blob/flink-1.4/basic-jobs/src/main/java/com/dataartisans/heavymisbehaved/HeavyMisbehavedJob.java]
>  which simulates some misbehavior. There is also the [state machine 
> job|https://github.com/dataArtisans/flink-testing-jobs/tree/master/streaming-state-machine]
>  which can verify itself for invalid state changes which indicate data loss. 
> We should incorporate their characteristics into the new general purpose job.
> Additionally, the general purpose job should contain the following aspects:
> * Job containing a sliding window aggregation
> * At least one generic Kryo type
> * At least one generic Avro type
> * At least one Avro specific record type
> * At least one input type for which we register a Kryo serializer
> * At least one input type for which we provide a user defined serializer
> * At least one state type for which we provide a user defined serializer
> * At least one state type which uses the AvroSerializer
> * Include an operator with ValueState
> * Value state changes should be verified (e.g. predictable series of values)
> * Include an operator with operator state
> * Include an operator with broadcast state
> * Broadcast state changes should be verified (e.g. predictable series of 
> values)
> * Include union state
> * User defined watermark assigner
> The job should be made available in the {{flink-end-to-end-tests}} module.
> This issue is intended to serve as an umbrella issue for developing and 
> extending this job.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to