Agree that we need to keep HDFS data transient across integration test runs. I have removed the volumes in the compose file and updated the PR: https://github.com/apache/incubator-hudi/pull/989. Hopefully this fixes the flakiness.

Balaji.V
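In the meantime, if anyone wants to reproduce from a truly clean slate before the rebuilt images are published, clearing the leftover Docker state by hand should do the same thing. A rough sketch (the HDFS volume name below is a placeholder, not the real one; check what docker volume ls actually reports on your machine):

    # rough cleanup sketch -- the volume name is a placeholder
    cd docker && ./stop_demo.sh && cd ..   # tear down the demo containers
    docker ps -a                           # confirm nothing is still lingering
    docker volume ls                       # list volumes left over from earlier runs
    docker volume rm <stale-hdfs-volume>   # remove the stale HDFS volume(s) by name
    mvn clean verify -DskipUTs=true -B     # re-run the integration tests

That is essentially the cleanup the compose change automates: without named volumes, HDFS data lives only inside the containers and disappears when they are removed.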
On Friday, November 1, 2019, 08:26:38 AM PDT, Vinoth Chandar <[email protected]> wrote:

Update on this thread.. There has been progress and we have a few fixes being tested:
https://github.com/vinothchandar/incubator-hudi/tree/hudi-312-flaky-tests
https://github.com/apache/incubator-hudi/pull/989

It boiled down to remnants from the previous run hanging around and causing invalid states. We also had a thread pool that wasn't closed upon such an unexpected error, causing the JVM to hang around.

@Balaji Varadarajan I think it's best to rebuild and publish new images which use local storage for HDFS. wdyt?

Also filed a few follow-ups: HUDI-322, HUDI-323

On Sat, Oct 26, 2019 at 9:36 AM Vinoth Chandar <[email protected]> wrote:

Disabling the UI is not doing the trick. I think it gets stuck while starting up (and not while exiting, as I incorrectly assumed before).

On Fri, Oct 25, 2019 at 9:00 AM Vinoth Chandar <[email protected]> wrote:

Could we disable the UI and try again? It's either the jetty threads or the two HDFS threads that are hanging on. Cannot understand why the JVM wouldn't exit otherwise.

On Fri, Oct 25, 2019 at 5:27 AM Bhavani Sudha <[email protected]> wrote:

https://gist.github.com/bhasudha/5aac43d93a942f68bcab413a26229292

Took a thread dump. Seems like jetty threads are not shutting down? Don't see any pending hudi/spark-related activity. The only threads in RUNNABLE state are the jetty ones.

On Fri, Oct 25, 2019 at 1:54 AM Pratyaksh Sharma <[email protected]> wrote:

> Hi Vinoth,
>
> can you try
> - Do: docker ps -a and make sure there are no lingering containers.
> - if so, run: cd docker; ./stop_demo.sh
> - cd ..
> - mvn clean verify -DskipUTs=true -B
>
> I ran the above 3 times. Twice it was successful, but once it incurred
> the same errors I listed in the previous mail.
>
> On Fri, Oct 25, 2019 at 8:26 AM Vinoth Chandar <[email protected]> wrote:
>
> > Got the integ test to hang once, at the same spot as Pratyaksh
> > mentioned.. So it would be a good candidate to drill into.
> >
> > @nishith in this state, the containers are all open. So you could just
> > hop in and stack trace to see what's going on.
> >
> > On Thu, Oct 24, 2019 at 9:14 AM Nishith <[email protected]> wrote:
> >
> > > I'm going to look into the flaky tests on Travis sometime today.
> > >
> > > -Nishith
> > >
> > > Sent from my iPhone
> > >
> > > > On Oct 23, 2019, at 10:23 PM, Vinoth Chandar <[email protected]> wrote:
> > > >
> > > > Just to make sure we are on the same page,
> > > >
> > > > can you try
> > > > - Do: docker ps -a and make sure there are no lingering containers.
> > > > - if so, run: cd docker; ./stop_demo.sh
> > > > - cd ..
> > > > - mvn clean verify -DskipUTs=true -B
> > > >
> > > > and this always gets stuck? The failures on CI seem to be random
> > > > timeouts. Not very related to this.
> > > >
> > > > FWIW I ran the above 3 times, without glitches so far.. So if you
> > > > can confirm then it'll help
> > > >
> > > > > On Wed, Oct 23, 2019 at 7:04 AM Vinoth Chandar <[email protected]> wrote:
> > > > >
> > > > > I saw someone else share the same experience. Can't think of
> > > > > anything that could have caused this to become flaky recently.
> > > > > I already created https://issues.apache.org/jira/browse/HUDI-312
> > > > > <https://issues.apache.org/jira/browse/HUDI-312?filter=12347468&jql=project%20%3D%20HUDI%20AND%20fixVersion%20%3D%200.5.1%20AND%20(status%20%3D%20Open%20OR%20status%20%3D%20%22In%20Progress%22)%20ORDER%20BY%20assignee%20ASC>
> > > > > to look into some flakiness on travis.
> > > > >
> > > > > any volunteers to drive this? (I am in the middle of fleshing out
> > > > > an RFC)
> > > > >
> > > > > On Wed, Oct 23, 2019 at 6:43 AM Pratyaksh Sharma <[email protected]> wrote:
> > > > >
> > > > > > It gets stuck forever while running the following -
> > > > > >
> > > > > > Container: /adhoc-1, Running command: spark-submit --class
> > > > > > org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
> > > > > > /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar
> > > > > > --storage-type MERGE_ON_READ
> > > > > > --source-class org.apache.hudi.utilities.sources.JsonDFSSource
> > > > > > --source-ordering-field ts
> > > > > > --target-base-path /user/hive/warehouse/stock_ticks_mor
> > > > > > --target-table stock_ticks_mor
> > > > > > --props /var/demo/config/dfs-source.properties
> > > > > > --schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider
> > > > > > --disable-compaction --enable-hive-sync
> > > > > > --hoodie-conf hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://hiveserver:10000
> > > > > > --hoodie-conf hoodie.datasource.hive_sync.username=hive
> > > > > > --hoodie-conf hoodie.datasource.hive_sync.password=hive
> > > > > > --hoodie-conf hoodie.datasource.hive_sync.partition_fields=dt
> > > > > > --hoodie-conf hoodie.datasource.hive_sync.database=default
> > > > > > --hoodie-conf hoodie.datasource.hive_sync.table=stock_ticks_mor
> > > > > >
> > > > > > On Wed, Oct 23, 2019 at 7:02 PM Pratyaksh Sharma <[email protected]> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I am facing errors when trying to run the integration tests using
> > > > > > > the script travis_run_tests.sh, and it also takes a lot of time or
> > > > > > > rather gets stuck. If I run them like normal junit tests, they
> > > > > > > work fine.
> > > > > > >
> > > > > > > Sometimes random transient errors also come up, but these are the
> > > > > > > most frequent ones -
> > > > > > >
> > > > > > > [ERROR] Tests run: 3, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 345.207 s <<< FAILURE! - in org.apache.hudi.integ.ITTestHoodieSanity
> > > > > > > [ERROR] testRunHoodieJavaAppOnSinglePartitionKeyCOWTable(org.apache.hudi.integ.ITTestHoodieSanity)  Time elapsed: 129.227 s  <<< FAILURE!
> > > > > > > java.lang.AssertionError: Expecting 100 rows to be present in the new table expected:<100> but was:<200>
> > > > > > >   at org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnCOWTable(ITTestHoodieSanity.java:115)
> > > > > > >   at org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnSinglePartitionKeyCOWTable(ITTestHoodieSanity.java:42)
> > > > > > >
> > > > > > > [ERROR] testRunHoodieJavaAppOnMultiPartitionKeysCOWTable(org.apache.hudi.integ.ITTestHoodieSanity)  Time elapsed: 108.146 s  <<< FAILURE!
> > > > > > > java.lang.AssertionError: Expecting 100 rows to be present in the new table expected:<100> but was:<200>
> > > > > > >   at org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnCOWTable(ITTestHoodieSanity.java:115)
> > > > > > >   at org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnMultiPartitionKeysCOWTable(ITTestHoodieSanity.java:54)
> > > > > > >
> > > > > > > [ERROR] testRunHoodieJavaAppOnNonPartitionedCOWTable(org.apache.hudi.integ.ITTestHoodieSanity)  Time elapsed: 107.63 s  <<< FAILURE!
> > > > > > > java.lang.AssertionError: Expecting 100 rows to be present in the new table expected:<100> but was:<200>
> > > > > > >   at org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnCOWTable(ITTestHoodieSanity.java:115)
> > > > > > >   at org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnNonPartitionedCOWTable(ITTestHoodieSanity.java:66)
> > > > > > >
> > > > > > > Has anybody else faced similar issues?
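One more note on the thread-pool point in Vinoth's update at the top of the thread (a pool that was not closed after an unexpected error kept the JVM from exiting): the actual fix lives in the linked branch/PR, but a general pattern for avoiding that class of hang is easy to sketch. The class below is illustrative only, with made-up names, and is not Hudi code: the pool is shut down in a finally block and uses daemon threads, so an error in the submitted work can no longer keep the JVM alive.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // Illustrative sketch only -- not the actual Hudi fix.
    public class PoolShutdownSketch {
      public static void main(String[] args) throws InterruptedException {
        // Daemon threads are a safety net: even if shutdown is missed on some
        // code path, they cannot keep the JVM alive on their own.
        ExecutorService pool = Executors.newFixedThreadPool(4, runnable -> {
          Thread t = new Thread(runnable);
          t.setDaemon(true);
          return t;
        });
        try {
          pool.submit(() -> System.out.println("doing work"));
          // ... submit the real work here; it may throw ...
        } finally {
          // Reached even when the work above throws: request shutdown, wait
          // briefly, then force-terminate so the process can actually exit.
          pool.shutdown();
          if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            pool.shutdownNow();
          }
        }
      }
    }

Either measure on its own (daemon threads or the finally-block shutdown) is enough to let the JVM exit; doing both just keeps a single missed code path from wedging an entire integration test run.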
