This highlights the race condition caused by using a single docker registry on a machine. If two tests create "jenkins-docker-apache.bintray.io/beam/python" one after another, the second one replaces the first and causes flakiness.
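As one possible mitigation, here is a minimal sketch of how a per-build image tag could avoid the clobbering. This assumes the test scripts can be parameterized; BUILD_TAG is the standard Jenkins-provided variable, and the image name is taken from the listing quoted below:

    # Tag the image per Jenkins build instead of the shared "latest" tag,
    # so concurrent jobs on the same worker cannot overwrite each other's image.
    IMAGE=jenkins-docker-apache.bintray.io/beam/python:${BUILD_TAG}
    docker build -t "${IMAGE}" .
    # ... run the test against "${IMAGE}" ...
    # Remove the per-build image once the job finishes, pass or fail.
    docker rmi -f "${IMAGE}" || true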
Is there a way to dynamically create and destroy a docker repository on a machine and clean up all the relevant data? (A rough cleanup sketch follows the quoted thread below.)

On Thu, Jun 27, 2019 at 3:15 PM Yifan Zou <yifan...@google.com> wrote:

> The problem was caused by the large quantity of stale docker images
> generated by the Python portable tests and the HDFS IT.
>
> Dumping the docker disk usage gives me:
>
> TYPE            TOTAL   ACTIVE   SIZE      RECLAIMABLE
> *Images         1039    356      424GB     384.2GB (90%)*
> Containers      987     2        2.042GB   2.041GB (99%)
> Local Volumes   126     0        392.8MB   392.8MB (100%)
>
> REPOSITORY                                                   TAG      IMAGE ID       CREATED        SIZE      SHARED SIZE   UNIQUE SIZE   CONTAINERS
> jenkins-docker-apache.bintray.io/beam/python3                latest   ff1b949f4442   22 hours ago   1.639GB   922.3MB       716.9MB       0
> jenkins-docker-apache.bintray.io/beam/python                 latest   1dda7b9d9748   22 hours ago   1.624GB   913.7MB       710.3MB       0
> <none>                                                       <none>   05458187a0e3   22 hours ago   732.9MB   625.1MB       107.8MB       4
> <none>                                                       <none>   896f35dd685f   23 hours ago   1.639GB   922.3MB       716.9MB       0
> <none>                                                       <none>   db4d24ca9f2b   23 hours ago   1.624GB   913.7MB       710.3MB       0
> <none>                                                       <none>   547df4d71c31   23 hours ago   732.9MB   625.1MB       107.8MB       4
> <none>                                                       <none>   dd7d9582c3e0   23 hours ago   1.639GB   922.3MB       716.9MB       0
> <none>                                                       <none>   664aae255239   23 hours ago   1.624GB   913.7MB       710.3MB       0
> <none>                                                       <none>   b528fedf9228   23 hours ago   732.9MB   625.1MB       107.8MB       4
> <none>                                                       <none>   8e996f22435e   25 hours ago   1.624GB   913.7MB       710.3MB       0
> hdfs_it-jenkins-beam_postcommit_python_verify_pr-818_test    latest   24b73b3fec06   25 hours ago   1.305GB   965.7MB       339.5MB       0
> <none>                                                       <none>   096325fb48de   25 hours ago   732.9MB   625.1MB       107.8MB       2
> jenkins-docker-apache.bintray.io/beam/java                   latest   c36d8ff2945d   25 hours ago   685.6MB   625.1MB       60.52MB       0
> <none>                                                       <none>   11c86ebe025f   26 hours ago   1.639GB   922.3MB       716.9MB       0
> <none>                                                       <none>   2ecd69c89ec1   26 hours ago   1.624GB   913.7MB       710.3MB       0
> hdfs_it-jenkins-beam_postcommit_python_verify-8590_test      latest   3d1d589d44fe   2 days ago     1.305GB   965.7MB       339.5MB       0
> hdfs_it-jenkins-beam_postcommit_python_verify_pr-801_test    latest   d1cc503ebe8e   2 days ago     1.305GB   965.7MB       339.2MB       0
> hdfs_it-jenkins-beam_postcommit_python_verify-8577_test      latest   8582c6ca6e15   3 days ago     1.305GB   965.7MB       339.2MB       0
> hdfs_it-jenkins-beam_postcommit_python_verify-8576_test      latest   4591e0948170   3 days ago     1.305GB   965.7MB       339.2MB       0
> hdfs_it-jenkins-beam_postcommit_python_verify-8575_test      latest   ab181c49d56e   4 days ago     1.305GB   965.7MB       339.2MB       0
> hdfs_it-jenkins-beam_postcommit_python_verify-8573_test      latest   2104ba0a6db7   4 days ago     1.305GB   965.7MB       339.2MB       0
> ...
> <1000+ images>
>
> I removed the unused images and beam15 is back now.
>
> Opened https://issues.apache.org/jira/browse/BEAM-7650.
> Ankur, I assigned the issue to you. Feel free to reassign it if needed.
>
> Thank you.
> Yifan
>
> On Thu, Jun 27, 2019 at 11:29 AM Yifan Zou <yifan...@google.com> wrote:
>
>> Something was eating the disk. Disconnected the worker so jobs could be
>> allocated to other nodes. Will look deeper.
>> Filesystem      Size   Used   Avail   Use%   Mounted on
>> /dev/sda1       485G   485G   96K     100%   /
>>
>> On Thu, Jun 27, 2019 at 10:54 AM Yifan Zou <yifan...@google.com> wrote:
>>
>>> I'm on it.
>>>
>>> On Thu, Jun 27, 2019 at 10:17 AM Udi Meiri <eh...@google.com> wrote:
>>>
>>>> Opened a bug here: https://issues.apache.org/jira/browse/BEAM-7648
>>>>
>>>> Can someone investigate what's going on?
>>>>
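Until per-build tags (or per-job registries) exist, a periodic cleanup on each worker would at least bound the disk usage. A rough sketch, assuming a cron entry or a Jenkins maintenance job runs it on every worker; the 48h window is an assumption, not something agreed on in this thread:

    # Remove stopped containers older than 48 hours.
    docker container prune -f --filter "until=48h"
    # Remove images not used by any container and older than 48 hours.
    docker image prune -a -f --filter "until=48h"
    # Remove dangling volumes left behind by the HDFS integration tests.
    docker volume prune -f

This does not fix the race condition above; it only keeps the stale-image growth from filling the disk again.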