Well, I found it's already reverted. But I think we don't have to.
After discussed with Jason, I worked out a new fix. It kept previous 5591's intention of exception handling and improved on assigning the port. The port is now checked if available, so it will also resolve 2 minutes timeout issue for the retry. (Or at least will not make things worse). On Wed, Sep 5, 2018 at 10:14 AM, Ryan McMahon <rmcma...@pivotal.io> wrote: > +1 for reverting in both places. > > I see that there is already an isGatewayReceiver flag in the AcceptorImpl > constructor. It's not ideal, but could we use this flag to prevent the 2 > minute retry logic for happening if this flag is true? > > Ryan > > On Wed, Sep 5, 2018 at 10:01 AM, Lynn Hughes-Godfrey < > lhughesgodf...@pivotal.io> wrote: > > > +1 for reverting in both places. > > > > On Wed, Sep 5, 2018 at 9:50 AM, Dan Smith <dsm...@pivotal.io> wrote: > > > > > +1 for reverting in both places. The current fix is not better, that's > > why > > > we are reverting it on the release branch! > > > > > > -Dan > > > > > > On Wed, Sep 5, 2018 at 9:47 AM, Jacob Barrett <jbarr...@pivotal.io> > > wrote: > > > > > > > I’m not ok with reverting in develop. Revert in 1.7 and modify in > > > develop. > > > > We shouldn’t go backwards in develop. The current fix is better than > > the > > > > bug it fixes. > > > > > > > > > On Sep 5, 2018, at 9:40 AM, Nabarun Nag <n...@apache.org> wrote: > > > > > > > > > > If everyone is okay with it, I will revert that change in develop > and > > > > then > > > > > cherry pick it to release/1.7.0 branch. > > > > > Please do comment. > > > > > > > > > > Regards > > > > > Nabarun Nag > > > > > > > > > > > > > > >> On Wed, Sep 5, 2018 at 9:30 AM Dan Smith <dsm...@pivotal.io> > wrote: > > > > >> > > > > >> +1 to yank it and rework the fix. > > > > >> > > > > >> Gester's change helps, but it just means that you will sometimes > > > > randomly > > > > >> have a 2 minute delay starting up a gateway receiver. I don't > think > > > > that is > > > > >> a great user experience either. > > > > >> > > > > >> -Dan > > > > >> > > > > >> On Wed, Sep 5, 2018 at 8:20 AM, Bruce Schuchardt < > > > > bschucha...@pivotal.io> > > > > >> wrote: > > > > >> > > > > >>> Let's yank it > > > > >>> > > > > >>> > > > > >>> > > > > >>>> On 9/4/18 5:04 PM, Sean Goller wrote: > > > > >>>> > > > > >>>> If it's to get the release out, I'm fine with reverting. I don't > > > like > > > > >> it, > > > > >>>> but I'm not willing to die on that hill. :) > > > > >>>> > > > > >>>> -S. > > > > >>>> > > > > >>>> On Tue, Sep 4, 2018 at 4:38 PM Dan Smith <dsm...@pivotal.io> > > wrote: > > > > >>>> > > > > >>>> Spitting this into a separate thread. > > > > >>>>> > > > > >>>>> I see the issue. The two minute timeout is the constructor for > > > > >>>>> AcceptorImpl, where it retries to bind for 2 minutes. > > > > >>>>> > > > > >>>>> That behavior makes sense for CacheServer.start. > > > > >>>>> > > > > >>>>> But it doesn't make sense for the new logic in > > > > GatewayReceiver.start() > > > > >>>>> from > > > > >>>>> GEODE-5591. That code is trying to use CacheServer.start to > scan > > > for > > > > an > > > > >>>>> available port, trying each port in a range. That free port > > finding > > > > >> logic > > > > >>>>> really doesn't want to have two minutes of retries for each > port. > > > It > > > > >>>>> seems > > > > >>>>> like we need to rework the fix for GEODE-5591. > > > > >>>>> > > > > >>>>> Does it make sense to hold up the release to rework this fix, > or > > > > should > > > > >>>>> we > > > > >>>>> just revert it? Have we switched concourse over to using alpine > > > > linux, > > > > >>>>> which I think was the original motivation for this fix? > > > > >>>>> > > > > >>>>> -Dan > > > > >>>>> > > > > >>>>> On Tue, Sep 4, 2018 at 4:25 PM, Dan Smith <dsm...@pivotal.io> > > > wrote: > > > > >>>>> > > > > >>>>> Why is it waiting at all in this case? Where is this 2 minute > > > timeout > > > > >>>>>> coming from? > > > > >>>>>> > > > > >>>>>> -Dan > > > > >>>>>> > > > > >>>>>> On Tue, Sep 4, 2018 at 4:12 PM, Sai Boorlagadda < > > > > >>>>>> > > > > >>>>> sai.boorlaga...@gmail.com > > > > >>>>> > > > > >>>>>> wrote: > > > > >>>>>>> So the issue is that it takes longer to start than previous > > > > releases? > > > > >>>>>>> Also, is this wait time only when using Gfsh to create > > > > >>>>>>> gateway-receiver? > > > > >>>>>>> > > > > >>>>>>> On Tue, Sep 4, 2018 at 4:03 PM Nabarun Nag <n...@apache.org> > > > > wrote: > > > > >>>>>>> > > > > >>>>>>> Currently we have a minor issue in the release branch as > > pointed > > > > out > > > > >>>>>>>> > > > > >>>>>>> by > > > > >>>>> > > > > >>>>>> Barry O. > > > > >>>>>>>> We will wait till a resolution is figured out for this > issue. > > > > >>>>>>>> > > > > >>>>>>>> Steps: > > > > >>>>>>>> 1. create locator > > > > >>>>>>>> 2. start server --name=server1 --server-port=40404 > > > > >>>>>>>> 3. start server --name=server2 --server-port=40405 > > > > >>>>>>>> 4. create gateway-receiver --member=server1 > > > > >>>>>>>> 5. create gateway-receiver --member=server2 `This gets stuck > > > for 2 > > > > >>>>>>>> > > > > >>>>>>> minutes` > > > > >>>>>>> > > > > >>>>>>>> Is the 2 minute wait time acceptable? Should we document it? > > > When > > > > we > > > > >>>>>>>> > > > > >>>>>>> revert > > > > >>>>>>> > > > > >>>>>>>> GEODE-5591, this issue does not happen. > > > > >>>>>>>> > > > > >>>>>>>> Regards > > > > >>>>>>>> Nabarun Nag > > > > >>>>>>>> > > > > >>>>>>>> > > > > >>> > > > > >> > > > > > > > > > >
diff --git a/geode-wan/src/main/java/org/apache/geode/internal/cache/wan/GatewayReceiverImpl.java b/geode-wan/src/main/java/org/apache/geode/internal/cache/wan/GatewayReceiverImpl.java index cd2702991..95c363c63 100644 --- a/geode-wan/src/main/java/org/apache/geode/internal/cache/wan/GatewayReceiverImpl.java +++ b/geode-wan/src/main/java/org/apache/geode/internal/cache/wan/GatewayReceiverImpl.java @@ -26,6 +26,7 @@ import org.apache.geode.cache.wan.GatewayReceiver; import org.apache.geode.cache.wan.GatewayTransportFilter; import org.apache.geode.distributed.internal.InternalDistributedSystem; import org.apache.geode.distributed.internal.ResourceEvent; +import org.apache.geode.internal.AvailablePort; import org.apache.geode.internal.cache.CacheServerImpl; import org.apache.geode.internal.cache.InternalCache; import org.apache.geode.internal.i18n.LocalizedStrings; @@ -144,7 +145,12 @@ public class GatewayReceiverImpl implements GatewayReceiver { return; } - for (int port = this.startPort; port <= this.endPort; port++) { + int loopStartPort = getPortToStart(); + for (int port = loopStartPort; port <= endPort; port++) { + if (!AvailablePort.isPortAvailable(port, AvailablePort.SOCKET, + AvailablePort.getAddress(AvailablePort.SOCKET))) { + continue; + } receiver.setPort(port); receiver.setSocketBufferSize(socketBufferSize); receiver.setMaximumTimeBetweenPings(timeBetPings); @@ -159,15 +165,20 @@ public class GatewayReceiverImpl implements GatewayReceiver { this.port = port; break; } catch (IOException e) { - if (port == this.endPort) { - throw new GatewayReceiverException("No available free port found in the given range (" + + if (port == this.endPort && startPort != endPort) { + port = this.startPort; + logger.info("loopback to " + this.startPort); + } else if (port == loopStartPort - 1 || startPort == endPort) { + logger.warn("No available free port found in the given range (" + this.startPort + "-" + this.endPort + ")", e); + throw new GatewayReceiverException( + LocalizedStrings.SocketCreator_FAILED_TO_CREATE_SERVER_SOCKET_ON_0_1 + .toLocalizedString(new Object[] {bindAdd, this.port})); } } - } logger .info(LocalizedMessage.create(LocalizedStrings.GatewayReceiver_STARTED_ON_PORT, this.port)); @@ -177,6 +188,18 @@ public class GatewayReceiverImpl implements GatewayReceiver { } + private int getPortToStart() { + // choose a random port from the given port range + int rPort; + if (this.startPort == this.endPort) { + rPort = this.startPort; + } else { + rPort = AvailablePort.getRandomAvailablePortInRange(this.startPort, this.endPort, + AvailablePort.SOCKET); + } + return rPort; + } + public void stop() { if (!isRunning()) { throw new GatewayReceiverException(