+1 for the fix!

On Jul 30, 2015 9:55 PM, "Navina Ramesh" <nram...@linkedin.com> wrote:
> Yes, Yan. But that communication is initiated by the AM. Whether an
> application's AM does it or not, the NM always heartbeats the status of
> its containers to the RM.
>
> On Jul 30, 2015 6:40 PM, "Yan Fang" <yanfang...@gmail.com> wrote:
>
>> Just one point to add:
>>
>> {quote}
>> AM gets notified of container status from the RM.
>> {quote}
>>
>> I think this is not 100% correct. AM can communicate with NM through
>> NMClientAsync
>> <https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/yarn/client/api/async/NMClientAsync.html>
>> to get container status, though Samza does not implement the CallbackHandler.
>>
>> Thanks,
>>
>> Fang, Yan
>> yanfang...@gmail.com
>>
>> On Thu, Jul 30, 2015 at 6:06 PM, Navina Ramesh
>> <nram...@linkedin.com.invalid> wrote:
>>
>> > The NM (and hence, by extension the container) heartbeats to the RM, not
>> > the AM. AM gets notified of container status from the RM.
>> > The AM starts / stops /releases a container process by communicating to
>> > the NM.
>> >
>> > Navina
>> >
>> > On Thu, Jul 30, 2015 at 5:55 PM, Thomas Becker <tobec...@tivo.com> wrote:
>> >
>> > > Ok, I thought there was some communication from the container to the AM,
>> > > it sounds like you're saying it's in the other direction only? Don't
>> > > containers heartbeat to the AM? Regardless, even if we can't get a better
>> > > address for the AM from YARN, we could at least filter the addresses we get
>> > > back from the JVM to exclude loopbacks.
>> > >
>> > > -Tommy
>> > > ________________________________________
>> > > From: Navina Ramesh [nram...@linkedin.com.INVALID]
>> > > Sent: Thursday, July 30, 2015 8:40 PM
>> > > To: dev@samza.apache.org
>> > > Subject: Re: Coordinator URL always 127.0.0.1
>> > >
>> > > Hi Tommy,
>> > > Yi is right. Container start is coordinated by the AppMaster using an
>> > > NMClient. Container host name and port is provided by the RM during
>> > > allocation.
>> > > In Yarn (at least, afaik), when the node joins a cluster, the NM registers
>> > > itself with the RM. So, the NM might still be using
>> > > getLocalhost.getAddress().
>> > >
>> > > I don't know of any other way to programmatically fetch the machine's
>> > > hostname (apart from some hacky shell commands).
>> > >
>> > > Cheers,
>> > > Navina
>> > >
>> > > On Thu, Jul 30, 2015 at 5:23 PM, Yi Pan <nickpa...@gmail.com> wrote:
>> > >
>> > > > Hi, Tommy,
>> > > >
>> > > > Yeah, I agree that the current implementation is not bullet-proof to any
>> > > > different networking configuration on the host. As for the AM <-> container
>> > > > communication, if I am not mistaken, it is through the NMClient and the
>> > > > node HTTP address is wrapped within the Container object returned from RM.
>> > > > I am not very familiar with that part of source code. Navina may be able to
>> > > > help more here.
>> > > >
>> > > > -Yi
>> > > >
>> > > > On Thu, Jul 30, 2015 at 4:27 PM, Thomas Becker <tobec...@tivo.com> wrote:
>> > > >
>> > > > > Hi Yi,
>> > > > > Thanks a lot for your reply. I don't doubt we can get it to work by
>> > > > > mucking with the networking configuration, but to me this feels like a
>> > > > > workaround, not a solution. InetAddress.getLocalHost().getHostAddress() is
>> > > > > not a reliable way of obtaining an IP that other machines can connect to.
>> > > > > Just today I tested on several Linux distros and it did not work on any of
>> > > > > them. Can we do something more robust here? How does the container
>> > > > > communicate status to the AM?
>> > > > >
>> > > > > -Tommy
>> > > > >
>> > > > > ________________________________________
>> > > > > From: Yi Pan [nickpa...@gmail.com]
>> > > > > Sent: Thursday, July 30, 2015 6:48 PM
>> > > > > To: dev@samza.apache.org
>> > > > > Subject: Re: Coordinator URL always 127.0.0.1
>> > > > >
>> > > > > Hi, Tommy,
>> > > > >
>> > > > > I think that it might be a commonly asked question regarding to multiple
>> > > > > IPs on a single host. A common trick w/o changing code is (copied from SO:
>> > > > > http://stackoverflow.com/questions/2381316/java-inetaddress-getlocalhost-returns-127-0-0-1-how-to-get-real-ip
>> > > > > )
>> > > > >
>> > > > > {code}
>> > > > > 1. Find your host name. Type: hostname. For example, you find your hostname
>> > > > >    is mycomputer.xzy.com
>> > > > > 2. Put your host name in your hosts file. /etc/hosts . Such as
>> > > > >
>> > > > >    10.50.16.136 mycomputer.xzy.com
>> > > > > {code}
>> > > > >
>> > > > > -Yi
>> > > > >
>> > > > > On Thu, Jul 30, 2015 at 11:35 AM, Tommy Becker <tobec...@tivo.com> wrote:
>> > > > >
>> > > > > > We are testing some jobs on a YARN grid and noticed they are often not
>> > > > > > starting up properly due to being unable to connect to the job coordinator.
>> > > > > > After some investigation it seems as if the jobs are always getting a
>> > > > > > coordinator URL of http://127.0.0.1:<port> But my understanding is that
>> > > > > > the coordinator runs only in the AM, so I'd expect these URLs to more often
>> > > > > > than not be to some other machine.
>> > > > > > Looking at the code however, I'm not
>> > > > > > sure how that would ever happen since the URL for the coordinator always
>> > > > > > comes from InetAddress.getLocalHost().getHostAddress() in
>> > > > > > org.apache.samza.coordinator.server.HttpServer#getUrl
>> > > > > >
>> > > > > > Am I off base here? Because I don't see how this is ever going to work in
>> > > > > > scenarios where the AM is on a different node than the containers.
>> > > > > >
>> > > > > > --
>> > > > > > Tommy Becker
>> > > > > > Senior Software Engineer
>> > > > > >
>> > > > > > Digitalsmiths
>> > > > > > A TiVo Company
>> > > > > >
>> > > > > > www.digitalsmiths.com<http://www.digitalsmiths.com>
>> > > > > > tobec...@tivo.com<mailto:tobec...@tivo.com>
>> > > > > >
>> > > > > > ________________________________
>> > > > > >
>> > > > > > This email and any attachments may contain confidential and privileged
>> > > > > > material for the sole use of the intended recipient. Any review, copying,
>> > > > > > or distribution of this email (or any attachments) by others is prohibited.
>> > > > > > If you are not the intended recipient, please contact the sender
>> > > > > > immediately and permanently delete this email and any attachments. No
>> > > > > > employee or agent of TiVo Inc. is authorized to conclude any binding
>> > > > > > agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo
>> > > > > > Inc. may only be made by a signed written agreement.
>> > >
>> > > --
>> > > Navina R.
>> >
>> > --
>> > Navina R.
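
For reference, here is a minimal sketch of the loopback filtering suggested earlier in the thread. It assumes we enumerate interfaces with java.net.NetworkInterface instead of relying on InetAddress.getLocalHost(). The class and method names (RoutableAddressFinder, isUsable, findNonLoopbackAddress) are hypothetical and not part of Samza, and this is not what HttpServer#getUrl does today. On a multi-homed host the first match may still not be the address other nodes should connect to, so treat it as a starting point only.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of Samza.
final class RoutableAddressFinder {

    // An address is a candidate only if other hosts could plausibly reach it:
    // reject loopback (127.0.0.0/8, ::1), link-local, and wildcard addresses.
    static boolean isUsable(InetAddress addr) {
        return !addr.isLoopbackAddress()
            && !addr.isLinkLocalAddress()
            && !addr.isAnyLocalAddress();
    }

    // Walks every interface that is up and not a loopback device, and
    // returns the first usable address, or empty if none is found.
    static Optional<InetAddress> findNonLoopbackAddress() throws SocketException {
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return Optional.empty();
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            if (!nic.isUp() || nic.isLoopback()) {
                continue;
            }
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                if (isUsable(addr)) {
                    return Optional.of(addr);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        // Fall back to the current behaviour if nothing better is found.
        Optional<InetAddress> found = findNonLoopbackAddress();
        InetAddress chosen = found.isPresent() ? found.get() : InetAddress.getLocalHost();
        System.out.println(chosen.getHostAddress());
    }
}
```

Falling back to getLocalHost() keeps the existing behaviour on hosts where no other address is found, so a change along these lines would only affect machines that currently resolve to a loopback address.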