[ 
https://issues.apache.org/jira/browse/FLINK-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415283#comment-16415283
 ] 

ASF GitHub Bot commented on FLINK-8973:
---------------------------------------

Github user twalthr commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5750#discussion_r177338397
  
    --- Diff: flink-end-to-end-tests/test-scripts/common.sh ---
    @@ -39,6 +39,109 @@ cd $TEST_ROOT
     export TEST_DATA_DIR=$TEST_INFRA_DIR/temp-test-directory-$(date +%S%N)
     echo "TEST_DATA_DIR: $TEST_DATA_DIR"
     
    +function revert_default_config() {
    +
    +    # first revert the conf/masters file
    +    echo "localhost:8081" > ${FLINK_DIR}/conf/masters
    +
    +    # and then the conf/flink-conf.yaml
    +    sed 's/^    //g' > ${FLINK_DIR}/conf/flink-conf.yaml << EOL
    +    #==============================================================================
    +    # Common
    +    #==============================================================================
    +
    +    jobmanager.rpc.address: localhost
    +    jobmanager.rpc.port: 6123
    +    jobmanager.heap.mb: 1024
    +    taskmanager.heap.mb: 1024
    +    taskmanager.numberOfTaskSlots: 1
    +    parallelism.default: 1
    +
    +    #==============================================================================
    +    # Web Frontend
    +    #==============================================================================
    +
    +    web.port: 8081
    +EOL
    +}
    +
    +function create_ha_config() {
    +
    +    # create the masters file (only one currently).
    +    # This must have all the masters to be used in HA.
    +    echo "localhost:8081" > ${FLINK_DIR}/conf/masters
    +
    +    # clean up the dir that will be used for zookeeper storage
    +    # (see high-availability.zookeeper.storageDir below)
    +    if [ -e $TEST_DATA_DIR/recovery ]; then
    +       echo "File ${TEST_DATA_DIR}/recovery exists. Deleting it..."
    +       rm -rf $TEST_DATA_DIR/recovery
    +    fi
    +
    +    # then move on to create the flink-conf.yaml
    +    sed 's/^    //g' > ${FLINK_DIR}/conf/flink-conf.yaml << EOL
    +    #==============================================================================
    +    # Common
    +    #==============================================================================
    +
    +    jobmanager.rpc.address: localhost
    +    jobmanager.rpc.port: 6123
    +    jobmanager.heap.mb: 1024
    +    taskmanager.heap.mb: 1024
    +    taskmanager.numberOfTaskSlots: 4
    +    parallelism.default: 1
    +
    +    #==============================================================================
    +    # High Availability
    +    #==============================================================================
    +
    +    high-availability: zookeeper
    +    high-availability.zookeeper.storageDir: file://${TEST_DATA_DIR}/recovery/
    +    high-availability.zookeeper.quorum: localhost:2181
    +    high-availability.zookeeper.path.root: /flink
    +    high-availability.cluster-id: /test_cluster_one
    +
    +    #==============================================================================
    +    # Web Frontend
    +    #==============================================================================
    +
    +    web.port: 8081
    +EOL
    +}
    +
    +function start_ha_cluster {
    +    echo "Setting up HA Cluster..."
    +    create_ha_config
    +    start_local_zk
    +    start_cluster
    +}
    +
    +function start_local_zk {
    +    # Parses the zoo.cfg and starts locally zk.
    +
    +    # This is almost the same code as the
    +    # /bin/start-zookeeper-quorum.sh without the SSH part and only running for localhost.
    --- End diff --
    
    Shouldn't the end-to-end tests also exercise our own scripts end to end?
    By re-implementing the script logic here, we do not test what a user would actually run.
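
    One possible shape for this, as a rough sketch only: delegate to the
    script shipped in the distribution instead of copying its logic. The
    function name start_local_zk_via_shipped_script is hypothetical. The
    sketch assumes FLINK_DIR points at an unpacked distribution, and that
    localhost accepts non-interactive SSH, since the shipped script keeps
    the SSH part that the copy in the diff leaves out.

```shell
#!/usr/bin/env bash

# Sketch: start ZooKeeper through the shipped quorum script rather than a
# re-implementation inside the test harness.
# Assumptions (not taken from the PR): the function name is hypothetical,
# and FLINK_DIR is set to the root of an unpacked Flink distribution.
function start_local_zk_via_shipped_script {
    local script="${FLINK_DIR}/bin/start-zookeeper-quorum.sh"

    # Fail early with a clear message if the distribution layout is not
    # what we expect, instead of silently skipping the ZooKeeper start.
    if [ ! -x "${script}" ]; then
        echo "Cannot find executable ${script}" >&2
        return 1
    fi

    # Run exactly what a user would run; propagate its exit status.
    "${script}"
}
```

    start_ha_cluster could then call this function in place of
    start_local_zk. The trade-off is that the test environment needs a
    working SSH setup for localhost, which the local copy avoids.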


> End-to-end test: Run general purpose job with failures in standalone mode
> -------------------------------------------------------------------------
>
>                 Key: FLINK-8973
>                 URL: https://issues.apache.org/jira/browse/FLINK-8973
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Tests
>    Affects Versions: 1.5.0
>            Reporter: Till Rohrmann
>            Assignee: Kostas Kloudas
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> We should set up an end-to-end test which runs the general purpose job 
> (FLINK-8971) in a standalone setting with HA enabled (ZooKeeper). When 
> running the job, the job failures should be activated. 
> Additionally, we should randomly kill Flink processes (cluster entrypoint and 
> TaskExecutors). When killing them, we should also spawn new processes to make 
> up for the loss.
> This end-to-end test case should run with all different state backend 
> settings: {{RocksDB}} (full/incremental, async/sync), {{FsStateBackend}} 
> (sync/async)
> We should then verify that the general purpose job is successfully recovered 
> without data loss or other failures.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
