Github user RalphSu commented on the issue:
https://github.com/apache/flink/pull/6103
@tillrohrmann I already did that; it seems to alleviate the problem, though
it does not fix it. I'm upgrading from 1.2.0 to 1.4.2. The major change I can
see is that the TM now connects to HDFS instead of only talking to the
JobManager. Could this increase the likelihood of this issue?
---
