I agree with Jon's assessment.

On Jul 12, 2014, at 4:28 AM, Jon Maron <[email protected]> wrote:
> That modification may make sense, but it still seems that the calculation of
> the existence of a live server is incorrect - if there is a live region
> server and a dead region server, the net result isn't "no region server".
>
> -- Jon
>
> On Jul 12, 2014, at 6:42 AM, Ted Yu <[email protected]> wrote:
>
>> Alternatively the agent detects whether previous port is available, reuses
>> the same port if it is. Otherwise fallback to current behavior.
>>
>> This would work in single tenant case.
>>
>> Cheers
>>
>> On Jul 12, 2014, at 3:11 AM, Steve Loughran <[email protected]> wrote:
>>
>>> ...maybe the agent could be set up to perform a sleep for a while if a port
>>> is in use, in the hope it will be cleaned up.
>>>
>>>
>>> On 12 July 2014 06:34, Sumit Mohanty <[email protected]> wrote:
>>>
>>>> It cannot be guaranteed that the previous port will still be available when
>>>> the application is thawed - in reality the application can be thawed in few
>>>> minutes or even few days. So I think reusing the old port might be a risk.
>>>>
>>>> Isn't this a case for the QE script to change? Inherently, for yarn apps,
>>>> they need to handle dynamic host/port.
>>>>
>>>>
>>>> On Fri, Jul 11, 2014 at 9:16 PM, Ted Yu <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>> When I was debugging existing QE hbase test script, I found that in the
>>>>> following situation (create - freeze - thaw) it was determined that there
>>>>> was no live server:
>>>>> region server, hor10n03.gq1.ygridcore.net,42175,1405108984098, was
>>>>> considered dead by the new master due to 'freeze' action
>>>>> the new region server, hor10n03.gq1.ygridcore.net,46329,1405120269524
>>>>> <http://hor10n03.gq1.ygridcore.net:60941/>, was live however master
>>>>> didn't remove the first one from the dead servers list due to port not
>>>>> matching.
>>>>> QE script drew the conclusion because 1(live)-1(dead) = 0
>>>>>
>>>>> You can observe this scenario here:
>>>>> http://hor10n01.gq1.ygridcore.net:50938/master-status
>>>>>
>>>>> Since region server was brought up on the same node and the previous port
>>>>> was still free:
>>>>>
>>>>> [hortonzy@hor10n03 ~]$ sudo netstat -tulpn | grep 42175
>>>>> [hortonzy@hor10n03 ~]$
>>>>>
>>>>> I think proper action should be to reuse the previous port when thawing.
>>>>>
>>>>> Please comment.
>>>>
>>>> --
>>>> CONFIDENTIALITY NOTICE
>>>> NOTICE: This message is intended for the use of the individual or entity to
>>>> which it is addressed and may contain information that is confidential,
>>>> privileged and exempt from disclosure under applicable law. If the reader
>>>> of this message is not the intended recipient, you are hereby notified that
>>>> any printing, copying, dissemination, distribution, disclosure or
>>>> forwarding of this communication is strictly prohibited. If you have
>>>> received this communication in error, please contact the sender immediately
>>>> and delete it from your system. Thank You.
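
To make the live-server point in this thread concrete, here is a minimal sketch of the check as a set difference instead of a subtraction of counts. It is not the actual QE script: the helper name `live_region_servers` is hypothetical, and it assumes the script holds the live and dead lists as `host,port,startcode` strings like the ones on the master status page. The two server names are taken from Ted's report.

```python
def live_region_servers(live, dead):
    """Return servers reported live that are not also listed as dead.

    Hypothetical helper: assumes each entry is a 'host,port,startcode' string.
    """
    return set(live) - set(dead)


live = ["hor10n03.gq1.ygridcore.net,46329,1405120269524"]
dead = ["hor10n03.gq1.ygridcore.net,42175,1405108984098"]

remaining = live_region_servers(live, dead)

# The stale dead entry has a different port and start code, so it does not
# cancel out the live server.
print(len(remaining))         # 1
# Subtracting the counts, as the QE script appears to do, reports none.
print(len(live) - len(dead))  # 0
```

With this form, a dead entry left over from before the freeze only removes a server from the result if the very same `host,port,startcode` is also in the live list.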

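
Ted's alternative (reuse the previous port if it is available, otherwise fall back to the current behavior) could look roughly like the sketch below. This is an assumption-laden illustration, not agent code: `pick_port` is a hypothetical name, and "current behavior" is modeled here simply as asking the OS for any free port.

```python
import socket


def pick_port(previous_port, host=""):
    """Return previous_port if it can be bound right now, else an OS-chosen port.

    Hypothetical helper for illustration only.
    """
    for candidate in (previous_port, 0):  # 0 asks the OS for any free port
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, candidate))
            # Read the bound port before the socket is closed in `finally`.
            return sock.getsockname()[1]
        except OSError:
            # Port is in use (or otherwise unavailable); try the fallback.
            continue
        finally:
            sock.close()
    raise RuntimeError("no port available")
```

The probe only shows the port was free at the moment of the check. Another process can take it before the region server binds, which is the risk Sumit raises and why this would mainly suit the single tenant case.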