BasPH edited a comment on issue #5159: [AIRFLOW-XXX] Attempt to remove 
flakeyness of LocalExecutor
URL: https://github.com/apache/airflow/pull/5159#issuecomment-485977458
 
 
   I have very limited time this week but tried digging into it. At least I 
managed to reproduce locally:
   
   I set an optional breakpoint before `self.assertEqual(len(executor.running), 
0)`:
   ```python
   if len(executor.running) != 0:
       import ipdb
       ipdb.set_trace()
   self.assertEqual(len(executor.running), 0)
   ```
   
   Let this run a few times and after just a few attempts I hit the breakpoint:
   ```bash
   for n in {1..100}; do nosetests 
tests/executors/test_local_executor.py:LocalExecutorTest.test_execution_limited_parallelism
 -s; done
   ```
   
   For the record, at this point:
   
   - `len(executor.running)` is 1 at this point in time and `executor.running` 
contains `{'fail': True}`
   - `executor.queue.empty()` returns `True`
   - `executor.result_queue.get()` returns `('fail', 'failed')` (I would expect 
this thing to be empty because `sync()` is called which loops over the 
result_queue, clearing out all items...)
   
   Thinking out loud: my understanding is there's a race condition where the 
asserts are already being called before `executor.end()` finishes. I imagine 
there's something iffy about multiple workers and passing around the 
JoinableQueue to all workers, but unsure exactly what's causing the issue. 
Maybe `change_state()` might not finish fast enough, and the result_queue is 
already empty by then, and as a result the `end()` function is finished?
   
   Hope this helps so far :-)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to