Hi Srimanth,
Thanks for the response; my replies are below. I am using HDP-1.3.0.0.
a) It was an HTTPS call made from the agent to the Ambari server:
INFO 2013-07-04 09:38:00,378 security.py:49 - SSL Connect being called..
connecting to the server
INFO 2013-07-04 09:38:00,563 Controller.py:99 - Unable to connect to:
https://xxxxx:8441/agent/v1/register/xxxxxx
If I change the JDK to 1.6, it starts working.
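One way to narrow this down independently of the agent is to attempt a bare TLS handshake against the registration port. The following is a minimal sketch, assuming Python 3 is available on the agent host; `probe_tls` is a hypothetical helper name, 8441 is the port from the log above, and certificate verification is switched off because this is a connectivity probe only. A recent Python may itself refuse older protocol versions, so a failure here is a hint rather than a diagnosis.

```python
import socket
import ssl


def probe_tls(host, port=8441, timeout=5.0):
    """Attempt a TLS handshake and report what was negotiated.

    Verification is disabled on purpose: this is a connectivity
    probe, not a secure client.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return {
                    "ok": True,
                    "protocol": tls.version(),
                    "cipher": tls.cipher()[0],
                }
    except OSError as exc:
        # Covers refused connections, timeouts and handshake failures.
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
```

Running it once with the server on JDK 1.6 and once on JDK 1.7 should show whether the handshake itself behaves differently between the two.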
b) When I manually access the URL, I can see the status and Ganglia
pages, and can do start/stop, config changes, etc.
It doesn't jump back to the installer.
c)
"Was the ambari-server started on localhost initially perhaps?"
That could be. But after we corrected the other machines, we ran
ambari-server reset.
The next attempt failed with the same localhost error, even though the
conf was correct.
So we removed the rpm, which still did not help, and finally deleted
/etc/ambari-agent and /usr/lib/ambari*, which helped.
(Each time, we retried the installation for that machine alone.)
d) Sure, I will have a look at the agent logs next time.
Thanks
Vivek
On Thursday 04 July 2013 07:10 PM, Srimanth Gunturi wrote:
Hi Vivek,
Wanted to find out the version of Ambari you are using.
a) What sort of communication failures were you seeing? Is there
anything specific in the logs that you can share?
b) UI jumping to installer after login means that the server says
installation is not complete. Did you notice any errors during
install? Also when it does go back to installer, which page of
installer does it end on, and are any previous values populated?
When you do manually go to http://xxx:5858/#/main/dashboard - does it
stay there, or jump back to installer after a few clicks?
c) Ambari server should be set up on a hostname (hostname -f) that
the agent nodes can reach.
Was the ambari-server started on localhost initially perhaps?
When some agent hosts had server as localhost - did you install agent
manually?
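The hostname check in (c) can be sketched as follows. This is a minimal sketch, assuming Python 3 on the server host; `is_loopback_name` and `check_server_hostname` are hypothetical helper names, and `socket.getfqdn()` is used as a rough stand-in for `hostname -f` (the two can differ on some systems).

```python
import socket

# Names that agents on other machines cannot use to reach this host.
LOOPBACK_NAMES = {"localhost", "localhost.localdomain"}


def is_loopback_name(name):
    """True if the name is a loopback alias or a 127.x.x.x / ::1 address."""
    n = name.strip().lower().rstrip(".")
    return n in LOOPBACK_NAMES or n.startswith("127.") or n == "::1"


def check_server_hostname():
    """Return (fqdn, usable); usable is False if the FQDN is loopback-only."""
    fqdn = socket.getfqdn()
    return fqdn, not is_loopback_name(fqdn)
```

If `check_server_hostname()` reports the name as not usable, agents on other hosts that are handed this name would end up talking to themselves.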
d) Ganglia server component failed to install for some reason. The
agent logs on that node should contain exceptions showing why it failed.
Fixing that issue should help.
Regards,
Srimanth
On Wed, Jul 3, 2013 at 10:10 PM, Vivek Padmanabhan wrote:
Hi,
I was trying out Ambari to set up a cluster and we ran into the
issues below. It would be great if someone could shed some light
on these:
a) Is it possible to run Ambari with JDK 1.7? We are seeing some
communication failures while using 1.7 for Ambari.
Prior to Ambari we tested our Hadoop programs with 1.7
and everything went well, and all of
our code base is on 1.7. (We have no native apps.)
b) After a cluster setup finished successfully, we are able to see
the dashboard etc. But after a few clicks, or if I access
it from a different machine, it redirects me to the
installation page again.
The only thing that helps is manually entering the URL below
(our port is 5858, and the browser cache was cleared):
http://xxx:5858/#/main/dashboard
c) During our Hadoop deployment and installation, some
servers failed (ssh access) and some passed.
So we had to reset and start from the beginning. But this time
the ones that passed earlier are failing,
because they think that the Ambari server is 'localhost'.
The server IP property in the /etc/...ini file was correct. So
we tried the following on the failed machines:
* Remove the rpm, reset ambari - this did not work on retry
* Remove the rpm, delete /etc/ambari-agent, delete /usr/lib/ambari*,
retry - this worked
Does this mean that rpm -e did not remove all the files? Is
there anything extra we need to take care of in such scenarios?
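To see what survives a package removal, one option is to list any remaining paths after `rpm -e`. This is a minimal sketch, assuming Python 3; `leftover_paths` is a hypothetical helper, and the two patterns in the usage note are just the locations mentioned above, not a complete list of what the Ambari packages install.

```python
import glob


def leftover_paths(patterns):
    """Return the existing paths that match any of the glob patterns, sorted."""
    found = set()
    for pattern in patterns:
        # glob.glob returns only paths that actually exist on disk.
        found.update(glob.glob(pattern))
    return sorted(found)
```

Calling `leftover_paths(["/etc/ambari-agent", "/usr/lib/ambari*"])` on a failed machine returns an empty list if nothing is left behind.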
d) Hadoop installation and deployment succeeds only on random
retries. When it fails, the only message we saw was:
ERROR ServiceComponentHostImpl:721 - Can't handle
ServiceComponentHostEvent event at current state,
serviceComponentName=GANGLIA_SERVER, hostName=server233.xxxxxx,
currentState=INSTALL_FAILED, eventType=HOST_SVCCOMP_OP_SUCCEEDED,
event=EventType: HOST_SVCCOMP_OP_SUCCEEDED
15:17:12,934 WARN HeartBeatHandler:233 - State machine exception
org.apache.ambari.server.state.fsm.InvalidStateTransitionException:
Invalid event: HOST_SVCCOMP_OP_SUCCEEDED at INSTALL_FAILED
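Read literally, the message says a success event arrived for a component the server had already marked INSTALL_FAILED. To group such failures by host and component across retries, the fields can be pulled out of the ERROR line. This is a minimal sketch, assuming Python 3 and assuming the message has first been rejoined onto a single line; `parse_state_error` is a hypothetical helper, and the field names are only the ones visible in the message above.

```python
import re

# key=value fields as they appear in the ERROR line above.
FIELD_RE = re.compile(
    r"(serviceComponentName|hostName|currentState|eventType)=([^,\s]+)"
)


def parse_state_error(line):
    """Return a dict of the recognised key=value fields in one log line."""
    return dict(FIELD_RE.findall(line))
```

A line with none of these fields yields an empty dict, so the helper can be run over a whole log file and the empty results discarded.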
Thanks
Vivek