One option would be to configure the Hadoop/HBase ports <http://trafodion.apache.org/port-assignment.html> to use the non-ephemeral range; another would be to change the ephemeral range <http://unix.stackexchange.com/questions/249275/bind-failure-address-in-use-unable-to-use-a-tcp-port-for-both-source-and-desti> so that it doesn't conflict with the Hadoop ports. Is it worth the trouble, or do you just want to recognize the conflict quickly and take the problematic node out of the pool?
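For the detection side, here is a minimal sketch (Python, Linux-only; a hypothetical helper, not installer logic, and the port list is just an illustrative subset of the assignment table) that flags service ports sitting inside the kernel's ephemeral range from net.ipv4.ip_local_port_range. Note that 50030, the MRv1 JobTracker web UI port, falls inside the common default range of 32768-61000, which is exactly the conflict seen below.

    # Hypothetical checker, not part of the Trafodion installer: flag any
    # service port that falls inside the Linux ephemeral port range.
    def ephemeral_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
        """Return (low, high) of the kernel's ephemeral port range."""
        with open(path) as f:
            low, high = f.read().split()
        return int(low), int(high)

    # Illustrative subset of the Hadoop/HBase port assignments; see the
    # port-assignment page above for the full table.
    SERVICE_PORTS = {
        "MRv1 JobTracker web UI": 50030,
        "HDFS DataNode": 50010,
        "HBase Master": 60000,
    }

    if __name__ == "__main__":
        low, high = ephemeral_range()
        for name, port in sorted(SERVICE_PORTS.items()):
            if low <= port <= high:
                print("at risk: %s port %d is inside ephemeral range %d-%d"
                      % (name, port, low, high))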
Hans

On Tue, May 17, 2016 at 11:55 AM, Steve Varnau <[email protected]> wrote:
> Arvind and I are picking through the logs. It looks like this particular VM
> started up in such a way that one of the map-reduce services had a port
> conflict, and hence Cloudera Manager reported a failure every time the
> installer tried to re-start the cluster.
>
>     java.net.BindException: Port in use: 0.0.0.0:50030
>
> So it is a test environment problem -- the cluster already had an issue
> before the Trafodion installer ran.
>
> Not quite sure of a good way to get an automated fix for the environment.
> Maybe I could code a better health check and take the node offline before
> it affects multiple test jobs. It is not frequent, but when it occurs,
> several jobs can be impacted.
>
> --Steve
>
> > -----Original Message-----
> > From: Steve Varnau [mailto:[email protected]]
> > Sent: Tuesday, May 17, 2016 10:25 AM
> > To: '[email protected]' <[email protected]>
> > Subject: RE: Trafodion release2.0 Daily Test Result - 14 - Still Failing
> >
> > Yes, it is interesting that there was one bad node that always reported
> > failure on re-start.
> > HBase looked good to me, so it might be a different service CMgr is
> > complaining about.
> > I'll spin up that VM so we can examine the logs that were not archived.
> >
> > --Steve
> >
> > > -----Original Message-----
> > > From: Narain Arvind [mailto:[email protected]]
> > > Sent: Tuesday, May 17, 2016 10:22 AM
> > > To: [email protected]
> > > Subject: RE: Trafodion release2.0 Daily Test Result - 14 - Still Failing
> > >
> > > Hi Steve,
> > >
> > > All the non-udr failures seem to be related to the restart of the HBase
> > > environment on i-0c5597d1. Is it possible to access this system and
> > > look at the logs?
> > >
> > >     "resultMessage" : "Command 'Start' failed for cluster 'trafcluster'",
> > >     "children" : {
> > >       "items" : [ {
> > >         "id" : 151,
> > >         "name" : "Start",
> > >         "startTime" : "2016-05-17T06:27:19.295Z",
> > >         "endTime" : "2016-05-17T06:28:05.105Z",
> > >         "active" : false,
> > >         "success" : false,
> > >         "resultMessage" : "At least one service failed to start."
> > >
> > > Thanks
> > > Arvind
> > >
> > > -----Original Message-----
> > > From: [email protected] [mailto:[email protected]]
> > > Sent: Tuesday, May 17, 2016 1:28 AM
> > > To: [email protected]
> > > Subject: Trafodion release2.0 Daily Test Result - 14 - Still Failing
> > >
> > > Daily Automated Testing release2.0
> > >
> > > Jenkins Job: https://jenkins.esgyn.com/job/Check-Daily-release2.0/14/
> > > Archived Logs: http://traf-testlogs.esgyn.com/Daily-release2.0/14
> > > Bld Downloads: http://traf-builds.esgyn.com
> > >
> > > Changes since previous daily build:
> > > No changes
> > >
> > > Test Job Results:
> > >
> > > FAILURE core-regress-charsets-cdh (4 min 27 sec)
> > > FAILURE core-regress-compGeneral-cdh (9 min 44 sec)
> > > FAILURE core-regress-seabase-cdh (4 min 44 sec)
> > > FAILURE core-regress-udr-cdh (29 min)
> > > FAILURE core-regress-udr-hdp (41 min)
> > > FAILURE phoenix_part1_T4-cdh (5 min 48 sec)
> > > FAILURE phoenix_part2_T2-cdh (4 min 39 sec)
> > > SUCCESS build-release2.0-debug (25 min)
> > > SUCCESS build-release2.0-release (29 min)
> > > SUCCESS core-regress-charsets-hdp (48 min)
> > > SUCCESS core-regress-compGeneral-hdp (46 min)
> > > SUCCESS core-regress-core-cdh (49 min)
> > > SUCCESS core-regress-core-hdp (59 min)
> > > SUCCESS core-regress-executor-cdh (58 min)
> > > SUCCESS core-regress-executor-hdp (1 hr 14 min)
> > > SUCCESS core-regress-fullstack2-cdh (13 min)
> > > SUCCESS core-regress-fullstack2-hdp (22 min)
> > > SUCCESS core-regress-hive-cdh (34 min)
> > > SUCCESS core-regress-hive-hdp (43 min)
> > > SUCCESS core-regress-privs1-cdh (37 min)
> > > SUCCESS core-regress-privs1-hdp (56 min)
> > > SUCCESS core-regress-privs2-cdh (42 min)
> > > SUCCESS core-regress-privs2-hdp (44 min)
> > > SUCCESS core-regress-qat-cdh (21 min)
> > > SUCCESS core-regress-qat-hdp (21 min)
> > > SUCCESS core-regress-seabase-hdp (1 hr 20 min)
> > > SUCCESS jdbc_test-cdh (24 min)
> > > SUCCESS jdbc_test-hdp (41 min)
> > > SUCCESS phoenix_part1_T2-cdh (1 hr 0 min)
> > > SUCCESS phoenix_part1_T2-hdp (1 hr 30 min)
> > > SUCCESS phoenix_part1_T4-hdp (1 hr 6 min)
> > > SUCCESS phoenix_part2_T2-hdp (1 hr 17 min)
> > > SUCCESS phoenix_part2_T4-cdh (44 min)
> > > SUCCESS phoenix_part2_T4-hdp (1 hr 0 min)
> > > SUCCESS pyodbc_test-cdh (16 min)
> > > SUCCESS pyodbc_test-hdp (15 min)
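On the health-check idea Steve mentions above, a minimal sketch (Python; the port list is an assumed subset, and the bind-probe is only meaningful if it runs before the cluster services are started, since afterwards these ports are legitimately in use):

    import socket

    # Assumed subset of service ports; extend to whatever the node hosts.
    PORTS_TO_CHECK = [50030, 50070, 60010]

    def port_is_free(port, host="0.0.0.0"):
        """Try to bind the port; failure means some process already owns it."""
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind((host, port))
            return True
        except socket.error:
            return False
        finally:
            s.close()

    if __name__ == "__main__":
        in_use = [p for p in PORTS_TO_CHECK if not port_is_free(p)]
        if in_use:
            raise SystemExit("health check failed, ports already in use: %s"
                             % in_use)
        print("ports clear")

A node failing this probe could then be pulled from the pool before any test job lands on it.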
