Python version is 2.7.6 On Oct 4, 2017 9:52 AM, "Driesprong, Fokko" <[email protected]> wrote:
> Hi David, > > Thank you for the question. The problem for the Spark-sql hook seems > related > <https://issues.apache.org/jira/browse/AIRFLOW-1647>, but the issue is > different. At the spark-sql hook, the problem was that there where two > iterators, one for the stdout, and one for the stderr. First the one of the > stdout was being read, and after hitting an eof, the stderr iterator would > be emptied. This caused the stderr to grow quickly (since the stdout was > being read first). This was fixed by redirecting the stderr to stdout and > use a single iterator. This is already the case in the BashOperator. > > The iterator should be ready by Airflow, so the stdout buffer should be > read. What version of Python are you using? > > Cheers, Fokko > > > > > > 2017-10-03 7:55 GMT+02:00 Bolke de Bruin <[email protected]>: > > > Probably a buffer is full or not emptied in time (as you mentioned). Ie. > > If we’re reading from stderr but the stdout is full it gets stuck. This > was > > fixed for the SparkOperators but we might need to do the same here. > > > > Bolke > > > > Verstuurd vanaf mijn iPad > > > > > Op 3 okt. 2017 om 03:02 heeft David Capwell <[email protected]> het > > volgende geschreven: > > > > > > We use the bash operator to call a Java command line. We notice that > some > > > times the task stays running a long time (never stops) and that the > logs > > in > > > airflow stop getting updated for the task. After debugging a bit it > turns > > > out that the jvm is blocked on the stdout FD since the buffer is full. > I > > > manually cleaned the buffer (just called cat to dump the buffer) and > see > > > the jvm halts cleanly but the task stays stuck in airflow; airflow run > is > > > still running but the forked process is not > > > > > > Walking the code in bash_operator I see that airflow creates a shell > > script > > > than has bash run it. I see in the logs the location of the script but > I > > > don't see it on the file system. I didn't check when the process was > hung > > > so dont know if bash was running or not. > > > > > > We have seen this a few times. Any idea what's going on? New to > debugging > > > Python and ptrace is disabled in our env so can't find a way to get the > > > state of the airflow run command. > > > > > > Thanks for any help! > > > > > > Airflow version: 1.8.0 and 1.8.2 (above was on 1.8.2 but we see this on > > > 1.8.0 cluster as well) > > >
