Able to run the tests successfully now. Thank you for digging into the issue.
On Fri, Nov 1, 2019 at 11:04 PM [email protected] <[email protected]> wrote:

> Agree that we need to keep hdfs data transient across integration test
> runs. I have removed the volumes in the compose file and updated the PR
> https://github.com/apache/incubator-hudi/pull/989
> Hopefully, this should fix the flakiness.
> Balaji.V
>
> On Friday, November 1, 2019, 08:26:38 AM PDT, Vinoth Chandar <[email protected]> wrote:
>
> Update on this thread.. There has been progress and we have a few fixes
> being tested
> https://github.com/vinothchandar/incubator-hudi/tree/hudi-312-flaky-tests
> https://github.com/apache/incubator-hudi/pull/989
>
> It boiled down to the remnants from the previous run hanging around and
> causing invalid states. We also had some threadpool that wasn't closed
> upon such an unexpected error, causing the jvm to hang around.
> @Balaji Varadarajan I think it's best to rebuild and publish new images
> which use local storage for hdfs. wdyt?
>
> Also filed a few follow ups : HUDI-322, HUDI-323
>
> On Sat, Oct 26, 2019 at 9:36 AM Vinoth Chandar <[email protected]> wrote:
>
> Disabling UI is not doing the trick. I think it gets stuck while starting
> up (and not while exiting like I assumed incorrectly before).
>
> On Fri, Oct 25, 2019 at 9:00 AM Vinoth Chandar <[email protected]> wrote:
>
> Could we disable the UI and try again? It's either the jetty threads or the
> two HDFS threads that are hanging on. Cannot understand why the JVM wouldn't
> exit otherwise.
>
> On Fri, Oct 25, 2019 at 5:27 AM Bhavani Sudha <[email protected]> wrote:
>
> https://gist.github.com/bhasudha/5aac43d93a942f68bcab413a26229292
> Took a thread dump. Seems like jetty threads are not shutting down? Don't
> see any hudi/spark related activity that is pending.
> Only threads in RUNNABLE state are jetty ones
>
> On Fri, Oct 25, 2019 at 1:54 AM Pratyaksh Sharma <[email protected]> wrote:
>
> > Hi Vinoth,
> >
> > > can you try
> > > - Do : docker ps -a and make sure there are no lingering containers.
> > > - if so, run : cd docker; ./stop_demo.sh
> > > - cd ..
> > > - mvn clean verify -DskipUTs=true -B
> >
> > I ran the above 3 times. Twice it was successful but once it incurred the
> > same errors I listed in previous mail.
> >
> > On Fri, Oct 25, 2019 at 8:26 AM Vinoth Chandar <[email protected]> wrote:
> >
> > > Got the integ test to hang once, at the same spot as Pratyaksh mentioned..
> > > So it would be a good candidate to drill into.
> > >
> > > @nishith in this state, the containers are all open. So you could just hop
> > > in and stack trace to see what's going on.
> > >
> > > On Thu, Oct 24, 2019 at 9:14 AM Nishith <[email protected]> wrote:
> > >
> > > > I’m going to look into the flaky tests on Travis sometime today.
> > > >
> > > > -Nishith
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On Oct 23, 2019, at 10:23 PM, Vinoth Chandar <[email protected]> wrote:
> > > > >
> > > > > Just to make sure we are on the same page,
> > > > >
> > > > > can you try
> > > > > - Do : docker ps -a and make sure there are no lingering containers.
> > > > > - if so, run : cd docker; ./stop_demo.sh
> > > > > - cd ..
> > > > > - mvn clean verify -DskipUTs=true -B
> > > > >
> > > > > and this always gets stuck? The failures on CI seem to be random timeouts.
> > > > > Not very related to this.
> > > > >
> > > > > FWIW I ran the above 3 times, without glitches so far.. So if you can
> > > > > confirm then it'll help
> > > > >
> > > > >> On Wed, Oct 23, 2019 at 7:04 AM Vinoth Chandar <[email protected]> wrote:
> > > > >>
> > > > >> I saw someone else share the same experience. Can't think of anything that
> > > > >> could have caused this to become flaky recently.
> > > > >> I already created https://issues.apache.org/jira/browse/HUDI-312 to
> > > > >> look into some flakiness on travis.
> > > > >>
> > > > >> any volunteers to drive this? (I am in the middle of fleshing out an RFC)
> > > > >>
> > > > >> On Wed, Oct 23, 2019 at 6:43 AM Pratyaksh Sharma <[email protected]> wrote:
> > > > >>
> > > > >>> It gets stuck forever while running the following -
> > > > >>>
> > > > >>> Container : /adhoc-1, Running command :spark-submit --class
> > > > >>> org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer
> > > > >>> /var/hoodie/ws/docker/hoodie/hadoop/hive_base/target/hoodie-utilities.jar
> > > > >>> --storage-type MERGE_ON_READ --source-class
> > > > >>> org.apache.hudi.utilities.sources.JsonDFSSource --source-ordering-field ts
> > > > >>> --target-base-path /user/hive/warehouse/stock_ticks_mor --target-table
> > > > >>> stock_ticks_mor --props /var/demo/config/dfs-source.properties
> > > > >>> --schemaprovider-class
> > > > >>> org.apache.hudi.utilities.schema.FilebasedSchemaProvider
> > > > >>> --disable-compaction --enable-hive-sync --hoodie-conf
> > > > >>> hoodie.datasource.hive_sync.jdbcurl=jdbc:hive2://hiveserver:10000
> > > > >>> --hoodie-conf hoodie.datasource.hive_sync.username=hive --hoodie-conf
> > > > >>> hoodie.datasource.hive_sync.password=hive --hoodie-conf
> > > > >>> hoodie.datasource.hive_sync.partition_fields=dt --hoodie-conf
> > > > >>> hoodie.datasource.hive_sync.database=default --hoodie-conf
> > > > >>> hoodie.datasource.hive_sync.table=stock_ticks_mor
> > > > >>>
> > > > >>> On Wed, Oct 23, 2019 at 7:02 PM Pratyaksh Sharma <[email protected]> wrote:
> > > > >>>
> > > > >>>> Hi,
> > > > >>>>
> > > > >>>> I am facing errors when trying to run integration tests using the script
> > > > >>>> travis_run_tests.sh and also it takes a lot of time or rather gets stuck.
> > > > >>>> If I run them like normal junit tests, they work fine.
> > > > >>>>
> > > > >>>> Sometimes random transient errors also come, but these are the most
> > > > >>>> frequent ones -
> > > > >>>>
> > > > >>>> [ERROR] Tests run: 3, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 345.207 s <<< FAILURE! - in org.apache.hudi.integ.ITTestHoodieSanity
> > > > >>>> [ERROR] testRunHoodieJavaAppOnSinglePartitionKeyCOWTable(org.apache.hudi.integ.ITTestHoodieSanity) Time elapsed: 129.227 s <<< FAILURE!
> > > > >>>> java.lang.AssertionError: Expecting 100 rows to be present in the new table expected:<100> but was:<200>
> > > > >>>> at org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnCOWTable(ITTestHoodieSanity.java:115)
> > > > >>>> at org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnSinglePartitionKeyCOWTable(ITTestHoodieSanity.java:42)
> > > > >>>>
> > > > >>>> [ERROR] testRunHoodieJavaAppOnMultiPartitionKeysCOWTable(org.apache.hudi.integ.ITTestHoodieSanity) Time elapsed: 108.146 s <<< FAILURE!
> > > > >>>> java.lang.AssertionError: Expecting 100 rows to be present in the new table expected:<100> but was:<200>
> > > > >>>> at org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnCOWTable(ITTestHoodieSanity.java:115)
> > > > >>>> at org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnMultiPartitionKeysCOWTable(ITTestHoodieSanity.java:54)
> > > > >>>>
> > > > >>>> [ERROR] testRunHoodieJavaAppOnNonPartitionedCOWTable(org.apache.hudi.integ.ITTestHoodieSanity) Time elapsed: 107.63 s <<< FAILURE!
> > > > >>>> java.lang.AssertionError: Expecting 100 rows to be present in the new table expected:<100> but was:<200>
> > > > >>>> at org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnCOWTable(ITTestHoodieSanity.java:115)
> > > > >>>> at org.apache.hudi.integ.ITTestHoodieSanity.testRunHoodieJavaAppOnNonPartitionedCOWTable(ITTestHoodieSanity.java:66)
> > > > >>>>
> > > > >>>> Has anybody else faced similar issues?
