[jira] [Commented] (FLINK-8973) End-to-end test: Run general purpose job with failures in standalone mode

ASF GitHub Bot (JIRA) Fri, 23 Mar 2018 06:36:56 -0700

    [ https://issues.apache.org/jira/browse/FLINK-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411383#comment-16411383 ]


ASF GitHub Bot commented on FLINK-8973:
---------------------------------------

GitHub user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5750#discussion_r176726476
  
    --- Diff: flink-end-to-end-tests/test-scripts/common.sh ---
    @@ -39,6 +39,93 @@ cd $TEST_ROOT
     export TEST_DATA_DIR=$TEST_INFRA_DIR/temp-test-directory-$(date +%S%N)
     echo "TEST_DATA_DIR: $TEST_DATA_DIR"
     
    +function revert_default_config() {
    +    sed 's/^    //g' > ${FLINK_DIR}/conf/flink-conf.yaml << EOL
    +    #==============================================================================
    +    # Common
    +    #==============================================================================
    +
    +    jobmanager.rpc.address: localhost
    +    jobmanager.rpc.port: 6123
    +    jobmanager.heap.mb: 1024
    +    taskmanager.heap.mb: 1024
    +    taskmanager.numberOfTaskSlots: 1
    +    parallelism.default: 1
    +
    +    #==============================================================================
    +    # Web Frontend
    +    #==============================================================================
    +
    +    web.port: 8081
    +EOL
    +}
    +
    +function create_ha_conf() {
    +
    +    # create the masters file (only one currently).
    +    # This must have all the masters to be used in HA.
    +    echo "localhost:8081" > ${FLINK_DIR}/conf/masters
    +
    +    # then move on to create the flink-conf.yaml
    +
    +    if [ -e $TEST_DATA_DIR/recovery ]; then
    +       echo "File ${TEST_DATA_DIR}/recovery exists. Deleting it..."
    +       rm -rf $TEST_DATA_DIR/recovery
    +    fi
    +
    +    sed 's/^    //g' > ${FLINK_DIR}/conf/flink-conf.yaml << EOL
    +    #==============================================================================
    +    # Common
    +    #==============================================================================
    +
    +    jobmanager.rpc.address: localhost
    +    jobmanager.rpc.port: 6123
    +    jobmanager.heap.mb: 1024
    +    taskmanager.heap.mb: 1024
    +    taskmanager.numberOfTaskSlots: 4
    +    parallelism.default: 1
    +
    +    #==============================================================================
    +    # High Availability
    +    #==============================================================================
    +
    +    high-availability: zookeeper
    +    high-availability.zookeeper.storageDir: file://${TEST_DATA_DIR}/recovery/
    +    high-availability.zookeeper.quorum: localhost:2181
    +    high-availability.zookeeper.path.root: /flink
    +    high-availability.cluster-id: /test_cluster_one
    +
    +    #==============================================================================
    +    # Web Frontend
    +    #==============================================================================
    +
    +    web.port: 8081
    +EOL
    +}
    +
    +function start_ha_cluster {
    +    echo "Setting up HA Cluster..."
    +    create_ha_conf
    +    start_local_zk
    +    start_cluster
    +}
    +
    +function start_local_zk {
    +    while read server ; do
    +        server=$(echo -e "${server}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//') # trim
    +
    +        # match server.id=address[:port[:port]]
    +        if [[ $server =~ ^server\.([0-9]+)[[:space:]]*\=[[:space:]]*([^: \#]+) ]]; then
    +            id=${BASH_REMATCH[1]}
    +            address=${BASH_REMATCH[2]}
    --- End diff --
    
    `address` seems to be unused
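
    To illustrate the point, here is a minimal sketch of how the match in the diff fills `BASH_REMATCH`. It assumes a `zoo.cfg`-style line such as `server.1=localhost:2888:3888`. The function name `parse_zk_server_id` is hypothetical and not part of the PR, and the pattern is stored in a variable only to keep the quoting simple. The diff is cut off right after the assignment, so whether `address` is read further down cannot be confirmed from here.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the PR): print only the server id from a
# zoo.cfg-style line, e.g. "server.1=localhost:2888:3888".
function parse_zk_server_id {
    local line="$1"
    # Equivalent to the pattern in the diff: group 1 is the id, group 2 the address.
    local re='^server\.([0-9]+)[[:space:]]*=[[:space:]]*([^: #]+)'

    # Trim leading and trailing whitespace, as the diff does.
    line=$(printf '%s\n' "${line}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')

    if [[ $line =~ $re ]]; then
        # BASH_REMATCH[2] would hold the address; it is deliberately not used here.
        echo "${BASH_REMATCH[1]}"
        return 0
    fi
    return 1
}
```

    If the address really is not needed, the assignment can be dropped and the second capture group left in place purely for matching.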


> End-to-end test: Run general purpose job with failures in standalone mode
> -------------------------------------------------------------------------
>
>                 Key: FLINK-8973
>                 URL: https://issues.apache.org/jira/browse/FLINK-8973
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Tests
>    Affects Versions: 1.5.0
>            Reporter: Till Rohrmann
>            Assignee: Kostas Kloudas
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> We should set up an end-to-end test which runs the general purpose job 
> (FLINK-8971) in a standalone setting with HA enabled (ZooKeeper). When 
> running the job, the job failures should be activated. 
> Additionally, we should randomly kill Flink processes (cluster entrypoint and 
> TaskExecutors). When killing them, we should also spawn new processes to make 
> up for the loss.
> This end-to-end test case should run with all the different state backend 
> settings: {{RocksDB}} (full/incremental, async/sync) and {{FsStateBackend}} 
> (sync/async).
> We should then verify that the general purpose job is successfully recovered 
> without data loss or other failures.
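
The state backend settings listed in the description amount to six combinations. A minimal sketch of enumerating them in a test script could look like the following. The function name `list_state_backend_settings` and the labels it prints are illustrative assumptions, not Flink configuration keys or code from the PR.

```shell
#!/usr/bin/env bash

# Hypothetical sketch: enumerate the state backend settings named in the issue.
# The printed labels are illustrative strings, not Flink configuration keys.
function list_state_backend_settings {
    local checkpoint_type snapshot_mode

    # RocksDB: full or incremental checkpoints, each async or sync.
    for checkpoint_type in full incremental; do
        for snapshot_mode in async sync; do
            echo "RocksDB ${checkpoint_type} ${snapshot_mode}"
        done
    done

    # FsStateBackend: sync or async only.
    for snapshot_mode in sync async; do
        echo "FsStateBackend ${snapshot_mode}"
    done
}
```

A test driver could read this list line by line and start one HA run per setting. How each label maps to actual configuration is left open here.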


