Hans,

Thanks. This does look like the right answer. We thought it might be a
previous map-reduce process interfering with itself, but Arvind carefully
checked the logs and the failure happened on the initial start-up. So
something else grabbing an ephemeral port is the likely culprit. Not much
else is running on these VMs, but enough to cause conflicts, I guess.
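
A quick way to confirm which process grabbed the port on a live node, as a
rough sketch (assumes the psutil package is available on the VM; may need
root to see other users' PIDs; the who_owns name is just for illustration):

import psutil

def who_owns(port):
    """Report which process, if any, currently holds the given TCP port."""
    for conn in psutil.net_connections(kind='tcp'):
        if conn.laddr and conn.laddr[1] == port and conn.pid:
            name = psutil.Process(conn.pid).name()
            print("port %d held by pid %d (%s)" % (port, conn.pid, name))
            return
    print("port %d looks free" % port)

who_owns(50030)   # the port from the BindException in the thread below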

--Steve


> -----Original Message-----
> From: Hans Zeller [mailto:[email protected]]
> Sent: Tuesday, May 17, 2016 12:21 PM
> To: dev <[email protected]>
> Subject: Re: Trafodion release2.0 Daily Test Result - 14 - Still Failing
>
> One option would be to configure the Hadoop/HBase ports
> <http://trafodion.apache.org/port-assignment.html> to use the non-ephemeral
> range, another one to change the ephemeral range
> <http://unix.stackexchange.com/questions/249275/bind-failure-address-in-use-unable-to-use-a-tcp-port-for-both-source-and-desti>
> so that it doesn't conflict with the Hadoop ports. Is it worth the trouble,
> or do you just want to recognize the conflict quickly and take the
> problematic node out of the pool?
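>
> As a rough sketch of the "recognize it quickly" side, a script can compare
> the configured service ports against the kernel's ephemeral range, which is
> what the net.ipv4.ip_local_port_range sysctl controls. The port list below
> is illustrative, not the full assignment table from the page above:
>
> SERVICE_PORTS = [50030, 50070, 60010, 60030]   # illustrative sample only
>
> # /proc/sys/net/ipv4/ip_local_port_range holds the low and high ends of the
> # ephemeral range; any service port inside it can be grabbed by an outgoing
> # connection before the service manages to bind it.
> with open('/proc/sys/net/ipv4/ip_local_port_range') as f:
>     low, high = (int(x) for x in f.read().split())
>
> for port in SERVICE_PORTS:
>     if low <= port <= high:
>         print('%d is inside the ephemeral range %d-%d' % (port, low, high))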
>
> Hans
>
> On Tue, May 17, 2016 at 11:55 AM, Steve Varnau <[email protected]>
> wrote:
>
> > Arvind and I are picking through the logs.  It looks like this particular
> > VM started up in such a way that one of the map-reduce services had a port
> > conflict, and hence Cloudera Manager reported failure every time the
> > installer tried to re-start the cluster.
> >
> > java.net.BindException: Port in use: 0.0.0.0:50030
> >
> > So it is a test environment problem -- the cluster already had an issue
> > before the Trafodion installer ran.
> >
> > Not quite sure of a good way to get an automated fix for the environment.
> > Maybe I could code a better health check and take the node offline before
> > it affects multiple test jobs.  It is not frequent, but when it does occur,
> > several jobs can be impacted.
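> >
> > A health check along those lines could stay fairly small: before the
> > installer runs, try to bind the ports the services will need and pull the
> > node if any bind fails.  A rough sketch (the port list is illustrative,
> > and node_is_healthy is just a placeholder name):
> >
> > import socket
> >
> > REQUIRED_PORTS = [50030, 50070, 60010]   # illustrative sample only
> >
> > def node_is_healthy():
> >     """Return False if any required port is already taken on this node."""
> >     for port in REQUIRED_PORTS:
> >         s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> >         s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
> >         try:
> >             s.bind(('0.0.0.0', port))
> >         except socket.error:
> >             print('port %d already in use -- take this node offline' % port)
> >             return False
> >         finally:
> >             s.close()
> >     return True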
> >
> > --Steve
> >
> >
> > > -----Original Message-----
> > > From: Steve Varnau [mailto:[email protected]]
> > > Sent: Tuesday, May 17, 2016 10:25 AM
> > > To: '[email protected]'
> > > <[email protected]>
> > > Subject: RE: Trafodion release2.0 Daily Test Result - 14 - Still
> > > Failing
> > >
> > > Yes, it is interesting that there was one bad node that always
> > > reported failure on re-start.
> > > HBase looked good to me, so it might be a different service that
> > > CMgr is complaining about.
> > > I'll spin up that VM so we can examine the logs that were not
> > > archived.
> > >
> > > --Steve
> > >
> > >
> > > > -----Original Message-----
> > > > From: Narain Arvind [mailto:[email protected]]
> > > > Sent: Tuesday, May 17, 2016 10:22 AM
> > > > To: [email protected]
> > > > Subject: RE: Trafodion release2.0 Daily Test Result - 14 - Still Failing
> > > >
> > > > Hi Steve,
> > > >
> > > > All the non-udr failures seem to be related to a restart of the hbase
> > > > environment on i-0c5597d1. Is it possible to access this system and
> > > > look at the logs?
> > > >
> > > >   "resultMessage" : "Command 'Start' failed for cluster 'trafcluster'",
> > > >   "children" : {
> > > >     "items" : [ {
> > > >       "id" : 151,
> > > >       "name" : "Start",
> > > >       "startTime" : "2016-05-17T06:27:19.295Z",
> > > >       "endTime" : "2016-05-17T06:28:05.105Z",
> > > >       "active" : false,
> > > >       "success" : false,
> > > >       "resultMessage" : "At least one service failed to start."
> > > >
> > > >
> > > > Thanks
> > > > Arvind
> > > >
> > > > -----Original Message-----
> > > > From: [email protected] [mailto:[email protected]]
> > > > Sent: Tuesday, May 17, 2016 1:28 AM
> > > > To: [email protected]
> > > > Subject: Trafodion release2.0 Daily Test Result - 14 - Still Failing
> > > >
> > > > Daily Automated Testing release2.0
> > > >
> > > > Jenkins Job: https://jenkins.esgyn.com/job/Check-Daily-release2.0/14/
> > > > Archived Logs: http://traf-testlogs.esgyn.com/Daily-release2.0/14
> > > > Bld Downloads: http://traf-builds.esgyn.com
> > > >
> > > > Changes since previous daily build:
> > > > No changes
> > > >
> > > >
> > > > Test Job Results:
> > > >
> > > > FAILURE core-regress-charsets-cdh (4 min 27 sec)
> > > > FAILURE core-regress-compGeneral-cdh (9 min 44 sec)
> > > > FAILURE core-regress-seabase-cdh (4 min 44 sec)
> > > > FAILURE core-regress-udr-cdh (29 min)
> > > > FAILURE core-regress-udr-hdp (41 min)
> > > > FAILURE phoenix_part1_T4-cdh (5 min 48 sec)
> > > > FAILURE phoenix_part2_T2-cdh (4 min 39 sec)
> > > > SUCCESS build-release2.0-debug (25 min)
> > > > SUCCESS build-release2.0-release (29 min)
> > > > SUCCESS core-regress-charsets-hdp (48 min)
> > > > SUCCESS core-regress-compGeneral-hdp (46 min)
> > > > SUCCESS core-regress-core-cdh (49 min)
> > > > SUCCESS core-regress-core-hdp (59 min)
> > > > SUCCESS core-regress-executor-cdh (58 min)
> > > > SUCCESS core-regress-executor-hdp (1 hr 14 min)
> > > > SUCCESS core-regress-fullstack2-cdh (13 min)
> > > > SUCCESS core-regress-fullstack2-hdp (22 min)
> > > > SUCCESS core-regress-hive-cdh (34 min)
> > > > SUCCESS core-regress-hive-hdp (43 min)
> > > > SUCCESS core-regress-privs1-cdh (37 min)
> > > > SUCCESS core-regress-privs1-hdp (56 min)
> > > > SUCCESS core-regress-privs2-cdh (42 min)
> > > > SUCCESS core-regress-privs2-hdp (44 min)
> > > > SUCCESS core-regress-qat-cdh (21 min)
> > > > SUCCESS core-regress-qat-hdp (21 min)
> > > > SUCCESS core-regress-seabase-hdp (1 hr 20 min)
> > > > SUCCESS jdbc_test-cdh (24 min)
> > > > SUCCESS jdbc_test-hdp (41 min)
> > > > SUCCESS phoenix_part1_T2-cdh (1 hr 0 min)
> > > > SUCCESS phoenix_part1_T2-hdp (1 hr 30 min)
> > > > SUCCESS phoenix_part1_T4-hdp (1 hr 6 min)
> > > > SUCCESS phoenix_part2_T2-hdp (1 hr 17 min)
> > > > SUCCESS phoenix_part2_T4-cdh (44 min)
> > > > SUCCESS phoenix_part2_T4-hdp (1 hr 0 min)
> > > > SUCCESS pyodbc_test-cdh (16 min)
> > > > SUCCESS pyodbc_test-hdp (15 min)
> > > >
> >
