[ https://issues.apache.org/jira/browse/AIRFLOW-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118000#comment-16118000 ]

Xianping lin edited comment on AIRFLOW-1494 at 8/8/17 7:34 AM:
---------------------------------------------------------------

*Job start, first 'Loading pickle':*
[2017-08-07 20:55:11,854] {cli.py:391} INFO - Loading pickle id 148
==
...the job then runs for a long time...
==
[2017-08-07 23:24:38,906] {base_task_runner.py:95} INFO - Subtask: [2017-08-07 23:24:38,906] {bash_operator.py:94} INFO - 17/08/07 23:24:38 INFO yarn.Client: Application report for application_1498090868254_911911 (state: RUNNING)
*Then a second 'Loading pickle':*
[2017-08-07 23:24:38,947] {cli.py:391} INFO - Loading pickle id 148
*At this point the log still shows the application as RUNNING.* What happened here?
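
Based only on the warnings visible in the log excerpt in the issue description below (jobs.py:2148 and jobs.py:2155), the sequence appears to be: a second "airflow run ... --raw" attempt was launched for the same task instance while the first was still running, its dependency checks were rejected ("Task Instance Not Already Running" FAILED), the task instance's recorded PID and state then no longer matched the first attempt, and the first attempt took the "poison pill" and killed its child spark-submit process. The sketch below is a simplified, illustrative rendering of that heartbeat check; it is not Airflow's actual implementation, and every name in it is hypothetical.

{code:python}
# Illustrative only -- NOT Airflow's actual code. A simplified rendering of the
# heartbeat guard that the jobs.py:2148 / jobs.py:2155 warnings in the log
# excerpt appear to come from. All names here are hypothetical.
import logging
import os


def heartbeat_guard(ti_state, recorded_pid, current_pid,
                    pid_is_descendant, kill_descendants):
    """Decide whether the locally running task attempt should kill itself."""
    # Corresponds to the jobs.py:2148 warning: the PID recorded for the task
    # instance no longer belongs to this attempt (a newer attempt re-registered it).
    if recorded_pid is not None and not pid_is_descendant(recorded_pid, current_pid):
        logging.warning("Recorded pid %s is not a descendant of the current pid %s",
                        recorded_pid, current_pid)

    # Corresponds to the jobs.py:2155 warning: another process changed the task
    # instance state (here, to 'up_for_retry'), so this attempt terminates itself
    # and its child processes -- which is what kills the running spark-submit
    # with return code -15 (SIGTERM).
    if ti_state != 'running':
        logging.warning("State of this instance has been externally set to %s. "
                        "Taking the poison pill. So long.", ti_state)
        kill_descendants(os.getpid())
{code}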



> backfill job failed because new retry comes out when a valid job is running
> ---------------------------------------------------------------------------
>
>                 Key: AIRFLOW-1494
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1494
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: backfill
>    Affects Versions: Airflow 1.8
>         Environment: ubuntu
>            Reporter: Xianping lin
>            Priority: Blocker
>
> I have a Spark job that is wrapped in a bash command (a BashOperator).
> From the failure log, I found that Airflow tried to rerun the job while the job was still running.
> Then a series of strange things happened, and my job finally failed. Logs below (a minimal sketch of the assumed DAG setup follows the log excerpt):
> [2017-08-07 23:24:35,903] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:35,903] {bash_operator.py:94} INFO - 17/08/07 23:24:35 INFO 
> yarn.Client: Application report for application_1498090868254_911911 (state: 
> RUNNING)
> [2017-08-07 23:24:36,904] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:36,904] {bash_operator.py:94} INFO - 17/08/07 23:24:36 INFO 
> yarn.Client: Application report for application_1498090868254_911911 (state: 
> RUNNING)
> [2017-08-07 23:24:37,905] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:37,904] {bash_operator.py:94} INFO - 17/08/07 23:24:37 INFO 
> yarn.Client: Application report for application_1498090868254_911911 (state: 
> RUNNING)
> [2017-08-07 23:24:38,906] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:38,906] {bash_operator.py:94} INFO - 17/08/07 23:24:38 INFO 
> yarn.Client: Application report for application_1498090868254_911911 (state: 
> RUNNING)
> [2017-08-07 23:24:38,947] {cli.py:391} INFO - Loading pickle id 148
> [2017-08-07 23:24:39,020] {base_task_runner.py:112} INFO - Running: ['bash', 
> '-c', u'airflow run generated_daily submit_operator 2017-08-05T00:00:00 
> --pickle 148 --job_id 15372 --raw']
> [2017-08-07 23:24:39,426] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:39,426] {__init__.py:57} INFO - Using executor 
> CeleryExecutor
> [2017-08-07 23:24:39,495] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:39,494] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python2.7/lib2to3/Grammar.txt
> [2017-08-07 23:24:39,519] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:39,519] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
> [2017-08-07 23:24:39,710] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:39,710] {cli.py:391} INFO - Loading pickle id 148
> [2017-08-07 23:24:39,772] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:39,772] {models.py:1120} INFO - Dependencies not met for 
> <TaskInstance: generated_daily.submit_operator 2017-08-05 00:00:00 
> [running]>, dependency 'Task Instance Not Already Running' FAILED: Task is 
> already running, it started on 2017-08-07 20:55:12.727910.
> [2017-08-07 23:24:39,777] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:39,777] {models.py:1120} INFO - Dependencies not met for 
> <TaskInstance: generated_daily.submit_operator 2017-08-05 00:00:00 
> [running]>, dependency 'Task Instance State' FAILED: Task is in the 'running' 
> state which is not a valid state for execution. The task must be cleared in 
> order to be run.
> [2017-08-07 23:24:39,907] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:39,907] {bash_operator.py:94} INFO - 17/08/07 23:24:39 INFO 
> yarn.Client: Application report for application_1498090868254_911911 (state: 
> RUNNING)
> [2017-08-07 23:24:40,908] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:40,908] {bash_operator.py:94} INFO - 17/08/07 23:24:40 INFO 
> yarn.Client: Application report for application_1498090868254_911911 (state: 
> RUNNING)
> [2017-08-07 23:24:41,909] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:41,909] {bash_operator.py:94} INFO - 17/08/07 23:24:41 INFO 
> yarn.Client: Application report for application_1498090868254_911911 (state: 
> RUNNING)
> [2017-08-07 23:24:42,911] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:42,910] {bash_operator.py:94} INFO - 17/08/07 23:24:42 INFO 
> yarn.Client: Application report for application_1498090868254_911911 (state: 
> RUNNING)
> [2017-08-07 23:24:43,912] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:43,912] {bash_operator.py:94} INFO - 17/08/07 23:24:43 INFO 
> yarn.Client: Application report for application_1498090868254_911911 (state: 
> RUNNING)
> [2017-08-07 23:24:44,120] {jobs.py:2148} WARNING - Recorded pid 116446 is not 
> a descendant of the current pid 33416
> [2017-08-07 23:24:44,914] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:44,913] {bash_operator.py:94} INFO - 17/08/07 23:24:44 INFO 
> yarn.Client: Application report for application_1498090868254_911911 (state: 
> RUNNING)
> [2017-08-07 23:24:45,914] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:45,914] {bash_operator.py:94} INFO - 17/08/07 23:24:45 INFO 
> yarn.Client: Application report for application_1498090868254_911911 (state: 
> RUNNING)
> [2017-08-07 23:24:46,032] {jobs.py:2155} WARNING - State of this instance has 
> been externally set to up_for_retry. Taking the poison pill. So long.
> [2017-08-07 23:24:46,081] {helpers.py:220} WARNING - Terminating descendant 
> processes of ['/usr/bin/python /usr/local/bin/airflow run generated_daily 
> submit_operator 2017-08-05T00:00:00 --pickle 148 --job_id 15015 --raw'] PID: 
> 116446
> [2017-08-07 23:24:46,081] {helpers.py:224} WARNING - Terminating descendant 
> process ['bash', '/tmp/airflowtmpCFeIBn/submit_operatorGbO01R'] PID: 116496
> [2017-08-07 23:24:46,087] {helpers.py:224} WARNING - Terminating descendant 
> process ['/bin/sh', '/mnt/data/firework-0.2.0/bin/spark-submit', 
> '--spark-version', '1.6.2', '--master', 'yarn-cluster', '--num-executors', 
> '128', '--class', 'com.hulu.reco.metrics.middlelayer.BatchApplication', 
> '--conf', 'spark.date=20170805', '--conf', 'spark.yarn.maxAppAttempts=1', 
> '--conf', 'spark.driver.memory=8g', '--conf', 'spark.executor.memory=36g', 
> '--conf', 'spark.executor.cores=8', '--conf', 
> 'spark.user.javahome.enabled=true', '--conf', 
> 'spark.user.javahome.path=/usr/lib/jvm/hulu-oracle-jdk8', '--conf', 
> 'spark.output=hdfs://warehousestore/user/pcdm/generated_prod/20170805', 
> '--queue', 'spark', 
> '/home/deploy/middlelayer-1.0-SNAPSHOT-jar-with-dependencies.jar'] PID: 116498
> [2017-08-07 23:24:46,091] {helpers.py:224} WARNING - Terminating descendant 
> process ['/usr/java/jdk8/jdk1.8.0_92-1//bin/java', '-cp', 
> '/mnt/data/firework-0.2.0/.firework_cache/spark/els/1.6.2/b72e04b6120cd0a431aae3195db97706/conf/:/mnt/data/firework-0.2.0/.firework_cache/spark/els/1.6.2/b72e04b6120cd0a431aae3195db97706/assembly/target/scala-2.10/spark-assembly-1.6.2-hadoop2.6.0-cdh5.7.3-201612201803.jar:/mnt/data/firework-0.2.0/.firework_cache/spark/els/1.6.2/b72e04b6120cd0a431aae3195db97706/lib_managed/jars/datanucleus-rdbms-3.2.9.jar:/mnt/data/firework-0.2.0/.firework_cache/spark/els/1.6.2/b72e04b6120cd0a431aae3195db97706/lib_managed/jars/datanucleus-core-3.2.10.jar:/mnt/data/firework-0.2.0/.firework_cache/spark/els/1.6.2/b72e04b6120cd0a431aae3195db97706/lib_managed/jars/datanucleus-api-jdo-3.2.6.jar:/mnt/data/firework-0.2.0/.firework_cache/spark/els/1.6.2/b72e04b6120cd0a431aae3195db97706/hadoop-conf/',
>  'org.apache.spark.deploy.SparkSubmit', '--master', 'yarn-cluster', '--conf', 
> 'spark.executor.memory=36g', '--conf', 'spark.driver.memory=8g', '--conf', 
> 'spark.user.javahome.path=/usr/lib/jvm/hulu-oracle-jdk8', '--conf', 
> 'spark.output=hdfs://warehousestore/user/pcdm/generated_prod/20170805', 
> '--conf', 'spark.yarn.maxAppAttempts=1', '--conf', 'spark.executor.cores=8', 
> '--conf', 'spark.date=20170805', '--conf', 
> 'spark.user.javahome.enabled=true', '--class', 
> 'com.hulu.reco.metrics.middlelayer.BatchApplication', '--num-executors', 
> '128', '--queue', 'spark', 
> '/home/deploy/middlelayer-1.0-SNAPSHOT-jar-with-dependencies.jar'] PID: 116761
> [2017-08-07 23:24:46,096] {helpers.py:231} WARNING - Waiting up to 5s for 
> processes to exit...
> [2017-08-07 23:24:46,099] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:46,099] {bash_operator.py:94} INFO - 17/08/07 23:24:46 INFO 
> util.ShutdownHookManager: Shutdown hook called
> [2017-08-07 23:24:46,100] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:46,100] {bash_operator.py:94} INFO - 17/08/07 23:24:46 INFO 
> util.ShutdownHookManager: Deleting directory 
> /mnt/data/tmp/spark-b5f3c795-7ea2-4ef1-a47c-2507b0747580
> [2017-08-07 23:24:46,121] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:46,118] {bash_operator.py:97} INFO - Command exited with 
> return code -15
> [2017-08-07 23:24:46,121] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:46,118] {models.py:1417} ERROR - Bash command failed
> [2017-08-07 23:24:46,121] {base_task_runner.py:95} INFO - Subtask: Traceback 
> (most recent call last):
> [2017-08-07 23:24:46,121] {base_task_runner.py:95} INFO - Subtask:   File 
> "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 1374, in run
> [2017-08-07 23:24:46,121] {base_task_runner.py:95} INFO - Subtask:     result 
> = task_copy.execute(context=context)
> [2017-08-07 23:24:46,121] {base_task_runner.py:95} INFO - Subtask:   File 
> "/usr/local/lib/python2.7/dist-packages/airflow/operators/bash_operator.py", 
> line 100, in execute
> [2017-08-07 23:24:46,121] {base_task_runner.py:95} INFO - Subtask:     raise 
> AirflowException("Bash command failed")
> [2017-08-07 23:24:46,121] {base_task_runner.py:95} INFO - Subtask: 
> AirflowException: Bash command failed
> [2017-08-07 23:24:46,121] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:46,120] {models.py:1433} INFO - Marking task as UP_FOR_RETRY
> [2017-08-07 23:24:46,123] {helpers.py:234} WARNING - Done waiting
> [2017-08-07 23:24:46,131] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:24:46,130] {models.py:1462} ERROR - Bash command failed
> [2017-08-07 23:24:46,131] {base_task_runner.py:95} INFO - Subtask: Traceback 
> (most recent call last):
> [2017-08-07 23:24:46,131] {base_task_runner.py:95} INFO - Subtask:   File 
> "/usr/local/bin/airflow", line 28, in <module>
> [2017-08-07 23:24:46,131] {base_task_runner.py:95} INFO - Subtask:     
> args.func(args)
> [2017-08-07 23:24:46,131] {base_task_runner.py:95} INFO - Subtask:   File 
> "/usr/local/lib/python2.7/dist-packages/airflow/bin/cli.py", line 422, in run
> [2017-08-07 23:24:46,132] {base_task_runner.py:95} INFO - Subtask:     
> pool=args.pool,
> [2017-08-07 23:24:46,132] {base_task_runner.py:95} INFO - Subtask:   File 
> "/usr/local/lib/python2.7/dist-packages/airflow/utils/db.py", line 53, in 
> wrapper
> [2017-08-07 23:24:46,132] {base_task_runner.py:95} INFO - Subtask:     result 
> = func(*args, **kwargs)
> [2017-08-07 23:24:46,132] {base_task_runner.py:95} INFO - Subtask:   File 
> "/usr/local/lib/python2.7/dist-packages/airflow/models.py", line 1374, in run
> [2017-08-07 23:24:46,132] {base_task_runner.py:95} INFO - Subtask:     result 
> = task_copy.execute(context=context)
> [2017-08-07 23:24:46,133] {base_task_runner.py:95} INFO - Subtask:   File 
> "/usr/local/lib/python2.7/dist-packages/airflow/operators/bash_operator.py", 
> line 100, in execute
> [2017-08-07 23:24:46,133] {base_task_runner.py:95} INFO - Subtask:     raise 
> AirflowException("Bash command failed")
> [2017-08-07 23:24:46,133] {base_task_runner.py:95} INFO - Subtask: 
> airflow.exceptions.AirflowException: Bash command failed
> [2017-08-07 23:24:51,034] {jobs.py:2083} INFO - Task exited with return code 1
> [2017-08-07 23:33:27,872] {cli.py:391} INFO - Loading pickle id 148
> [2017-08-07 23:33:27,940] {base_task_runner.py:112} INFO - Running: ['bash', 
> '-c', u'airflow run generated_daily submit_operator 2017-08-05T00:00:00 
> --pickle 148 --job_id 15376 --raw']
> [2017-08-07 23:33:28,302] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:33:28,301] {__init__.py:57} INFO - Using executor 
> CeleryExecutor
> [2017-08-07 23:33:28,359] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:33:28,358] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python2.7/lib2to3/Grammar.txt
> [2017-08-07 23:33:28,378] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:33:28,378] {driver.py:120} INFO - Generating grammar tables 
> from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
> [2017-08-07 23:33:28,543] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:33:28,543] {cli.py:391} INFO - Loading pickle id 148
> [2017-08-07 23:33:28,601] {base_task_runner.py:95} INFO - Subtask: 
> [2017-08-07 23:33:28,601] {models.py:1120} INFO - Dependencies not met for 
> <TaskInstance: generated_daily.submit_operator 2017-08-05 00:00:00 
> [up_for_retry]>, dependency 'Not In Retry Period' FAILED: Task is not ready 
> for retry yet but will be retried automatically. Current date is 
> 2017-08-07T23:33:28.601339 and task will be retried at 
> 2017-08-08T00:24:46.119758.
> [2017-08-07 23:33:32,954] {jobs.py:2083} INFO - Task exited with return code 0
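
For anyone trying to reproduce this, here is a minimal sketch of the setup the logs imply. The dag_id (generated_daily), the task_id (submit_operator), the BashOperator wrapping spark-submit, and the one-hour retry window are taken from the log excerpt above; the schedule, start date, and retry count are assumptions.

{code:python}
# Minimal reproduction sketch, assuming Airflow 1.8 with the CeleryExecutor.
# dag_id, task_id and the BashOperator/spark-submit wrapper come from the logs;
# schedule_interval, start_date and the retry count are assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'start_date': datetime(2017, 8, 1),    # assumed
    'retries': 1,                          # assumed; the task does end up UP_FOR_RETRY
    'retry_delay': timedelta(hours=1),     # matches the retry window in the last log lines
}

dag = DAG(
    dag_id='generated_daily',              # from the logs
    default_args=default_args,
    schedule_interval='@daily',            # assumed
)

submit_operator = BashOperator(
    task_id='submit_operator',             # from the logs
    # Long-running YARN job; the full spark-submit arguments are in the log excerpt above.
    bash_command='spark-submit --master yarn-cluster ... middlelayer-1.0-SNAPSHOT-jar-with-dependencies.jar',
    dag=dag,
)
{code}

The failing run would then be triggered with a backfill over the affected date, e.g. airflow backfill generated_daily -s 2017-08-05 -e 2017-08-05 (assumed; the logs only show the per-attempt "airflow run ... --pickle 148 --raw" commands).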



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
