Perhaps an aside, but: exactly what is kept in Zookeeper, and what code is
responsible for keeping it up-to-date?

Ceej

On Mon, Aug 24, 2015 at 5:28 PM, Raman Grover <[email protected]>
wrote:

> Well, the state of an instance (and metadata including configuration) is
> kept in a Zookeeper instance that is accessible to Managix and CC. CC should
> be able to set the state of the cluster in Zookeeper under the right znode,
> which can be viewed by Managix.
>
> There exists a communication channel for CC and Managix to share
> information on state etc. I am not sure if we need another channel such as
> RMI between Managix and CC.
>
> Regards,
> Raman
>
>
>
> On Mon, Aug 24, 2015 at 12:58 PM, abdullah alamoudi <[email protected]>
> wrote:
>
> > Well, it depends on your definition of the boundaries of managix. What I
> > did is that I added an RMI object in the InstallerDriver which basically
> > listens for state changes from the cluster controller. This means some
> > additional logic in the CCApplicationEntryPoint where, after the CC is
> > ready, it contacts the InstallerDriver using RMI, and only at that point
> > can the InstallerDriver return to managix and tell it that the startup is
> > complete.
> >
> > Not sure if this is the right way to do it but it definitely is better than
> > what we currently have.
> > Abdullah.
> >
> > On Mon, Aug 24, 2015 at 10:00 PM, Chris Hillery <[email protected]>
> > wrote:
> >
> > > Hopefully the solution won't involve additional important logic inside
> > > Managix itself?
> > >
> > > Ceej
> > > aka Chris Hillery
> > >
> > > On Mon, Aug 24, 2015 at 7:26 AM, abdullah alamoudi <[email protected]>
> > > wrote:
> > >
> > > > That works but it doesn't feel right doing it this way. I am going to
> > > > fix this one for good.
> > > >
> > > > Cheers,
> > > > Abdullah.
> > > >
> > > > On Mon, Aug 24, 2015 at 5:11 PM, Ian Maxon <[email protected]> wrote:
> > > >
> > > > > The way I assured liveness for the YARN installer was to try running
> > > > > "for $x in dataset Metadata.Dataset return $x" via the API. I just
> > > > > polled for a reasonable amount of time (though honestly, thinking about
> > > > > it now, the correct parameter to use for the polling interval is the
> > > > > startup wait time in the parameters file :) ). It's not perfect, but it
> > > > > gives fewer false positives than just checking ps for processes that
> > > > > look like CCs/NCs.
> > > > >
> > > > > - Ian.
> > > > >
> > > > > On Mon, Aug 24, 2015 at 5:03 AM, abdullah alamoudi <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Now that I think about it, maybe we should provide multiple ways to
> > > > > > do this: a polling mechanism to be used at arbitrary times and a
> > > > > > pushing mechanism on startup.
> > > > > > I am going to start implementation of this and will probably use RMI
> > > > > > for this task both ways (CC to InstallerDriver and InstallerDriver to
> > > > > > CC).
> > > > > >
> > > > > > Cheers,
> > > > > > Abdullah.
> > > > > >
> > > > > > On Mon, Aug 24, 2015 at 2:19 PM, abdullah alamoudi <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > So after further investigation, it turned out that our startup
> > > > > > > process just starts the CC and NC processes and then makes sure the
> > > > > > > processes are running. If the processes are found to be running, it
> > > > > > > returns the state of the cluster as active, and the subsequent test
> > > > > > > commands can start immediately.
> > > > > > >
> > > > > > > This means that the CC could've started but is not yet ready when
> > > > > > > we try to process the next command. To address this, we need a
> > > > > > > better way to tell when the startup procedure has completed. We can
> > > > > > > do this by pushing (the CC informs the installer driver when the
> > > > > > > startup is complete) or polling (the installer driver actually
> > > > > > > queries the CC for the state of the cluster).
> > > > > > >
> > > > > > > I can do either way so let's vote. My vote goes to the pushing
> > > > > > > mechanism.
> > > > > > > Thoughts?
> > > > > > >
> > > > > > > On Mon, Aug 24, 2015 at 10:15 AM, abdullah alamoudi <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > >> This solution turned out to be incorrect. Actually, the test cases
> > > > > > > >> never fail when I build after using the join method, but running an
> > > > > > > >> actual asterix instance never succeeds, which is quite confusing.
> > > > > > > >>
> > > > > > > >> I also think that the startup script has a major bug where it might
> > > > > > > >> return before the startup is complete. More on this later......
> > > > > > >>
> > > > > > > >> On Mon, Aug 24, 2015 at 7:48 AM, abdullah alamoudi <[email protected]>
> > > > > > > >> wrote:
> > > > > > >>
> > > > > > >>> It is highly unlikely that it is related.
> > > > > > >>>
> > > > > > >>> Cheers,
> > > > > > >>> Abdullah.
> > > > > > >>>
> > > > > > > >>> On Mon, Aug 24, 2015 at 5:45 AM, Chen Li <[email protected]> wrote:
> > > > > > >>>
> > > > > > > >>>> @Abdullah: Is this issue related to
> > > > > > > >>>> https://issues.apache.org/jira/browse/ASTERIXDB-1074? Ian and I plan
> > > > > > > >>>> to look into the details on Monday.
> > > > > > >>>>
> > > > > > > >>>> On Sun, Aug 23, 2015 at 10:08 AM, abdullah alamoudi <[email protected]>
> > > > > > > >>>> wrote:
> > > > > > >>>>
> > > > > > > >>>> > About 3-4 days ago, I was working on the addition of the filesystem
> > > > > > > >>>> > based feed adapter and it didn't take any time to complete. However,
> > > > > > > >>>> > when I wanted to build and make sure all tests pass, I kept getting
> > > > > > > >>>> > ConnectionRefused errors which caused the installer tests to fail
> > > > > > > >>>> > every now and then.
> > > > > > >>>> >
> > > > > > > >>>> > I knew the new change had nothing to do with this failure, yet I
> > > > > > > >>>> > couldn't direct my attention away from this bug (it just bothered me
> > > > > > > >>>> > so much and I knew it needed to be resolved ASAP). After wasting
> > > > > > > >>>> > countless hours, I was finally able to figure out what was
> > > > > > > >>>> > happening :-)
> > > > > > >>>> >
> > > > > > > >>>> > In the startup routine, we start three Jetty web servers (Web
> > > > > > > >>>> > interface server, JSON API server, and Feed server). Some time ago,
> > > > > > > >>>> > we used to end the startup call before making sure the
> > > > > > > >>>> > server.isStarted() method returns true on all servers. At that time,
> > > > > > > >>>> > I introduced the waitUntilServerStarts method to make sure we don't
> > > > > > > >>>> > return before the servers are ready. It turned out that this was an
> > > > > > > >>>> > incorrect way to handle this (we can blame stackoverflow for this
> > > > > > > >>>> > one!) and it is not enough that the server's isStarted() returns
> > > > > > > >>>> > true. The correct way to do this is to call the server.join() method
> > > > > > > >>>> > after the server.start().
> > > > > > >>>> >
> > > > > > >>>> > See:
> > > > > > >>>> >
> > > > > > >>>>
> > > > > >
> > > > >
> > > >
> > >
> >
> http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
> > > > > > >>>> >
> > > > > > > >>>> > This was as satisfying as it was frustrating, and you are welcome
> > > > > > > >>>> > for the future time I saved each of you :)
> > > > > > >>>> > --
> > > > > > >>>> > Amoudi, Abdullah.
> > > > > > >>>> >
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> --
> > > > > > >>> Amoudi, Abdullah.
> > > > > > >>>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> Amoudi, Abdullah.
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Amoudi, Abdullah.
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Amoudi, Abdullah.
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Amoudi, Abdullah.
> > > >
> > >
> >
> >
> >
> > --
> > Amoudi, Abdullah.
> >
>
>
>
> --
> Raman
>
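
For reference, the polling option discussed in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: ReadinessPoller, waitUntilReady, and the probe argument are hypothetical names, not existing Managix or AsterixDB code. The probe is assumed to wrap whatever readiness check is chosen (for example, the Metadata.Dataset query mentioned above), and the timeout would presumably come from the startup wait time in the parameters file.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not part of Managix or AsterixDB.
final class ReadinessPoller {
    private ReadinessPoller() {}

    /**
     * Calls the probe repeatedly until it returns true or the timeout elapses.
     * Returns true if the probe succeeded, false on timeout or interruption.
     */
    static boolean waitUntilReady(BooleanSupplier probe, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            boolean ready;
            try {
                ready = probe.getAsBoolean();
            } catch (RuntimeException e) {
                // Assumption: a failing probe (e.g. connection refused) means "not ready yet".
                ready = false;
            }
            if (ready) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A caller would pass a probe that returns true only once the CC answers a real request, which is the point made earlier in the thread: a running process is not the same as a ready one.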
