I'm happy to contribute, but I may not be able to in the near term because
I'm tied up with other tasks; so feel free to rework it, or I'll refactor it
when I have free time. I'm happy to see how it evolves.

On Friday, 2 December 2016, Edward Capriolo <[email protected]> wrote:

> On Fri, Dec 2, 2016 at 3:15 AM, Chia-Hung Lin <[email protected]> wrote:
>
> > Shameless plug for code I wrote a long time ago. I never found a chance to
> > modularize it, but feel free to use it, as it's licensed under Apache 2.
> >
> > [1].
> > https://github.com/apache/hama/tree/master/core/src/main/java/org/apache/hama/monitor/fd
> >
> > On Friday, 2 December 2016, P. Taylor Goetz <[email protected]> wrote:
> >
> > > There's not a lot of code there. Could it be reimplemented in gossip
> > > without infringing on any copyrights?
> > >
> > > -Taylor
> > >
> > > > On Dec 1, 2016, at 6:21 PM, Edward Capriolo <[email protected]> wrote:
> > > >
> > > > I reached out to the initial author of the failure library to see if
> > > > they would consider contributing it. I have not heard back.
> > > >
> > > > The library itself comprises two functions, with no unit testing, and
> > > > those functions lean heavily on commons-math. I think the signatures
> > > > and the return types are not set up in a way that is natural for us to
> > > > leverage. I think it is best we simply write the code to execute the
> > > > failure detector logic ourselves. We can give it the method signature
> > > > we want and provide our own direct testing.
> > > >
> > > > If anyone sees an alternative library let me know. Remember the
> > > > algorithm itself is essentially a one-liner on top of commons-math
> > > > parts.
> > > >
> > > > Thanks,
> > > > Edward
> > > >
> > > > On Thu, Nov 17, 2016 at 1:49 PM, chandresh pancholi <
> > > > [email protected]> wrote:
> > > >
> > > >> https://github.com/apache/incubator-gossip/compare/master...edwardcapriolo:GOSSIP-22?expand=1
> > > >> Try the whole URL.
> > > >>
> > > >> Thanks
> > > >>
> > > >> On Thu, Nov 17, 2016 at 11:15 PM, Sandeep More <[email protected]>
> > > >> wrote:
> > > >>
> > > >>> Hello Edward,
> > > >>>
> > > >>> Sorry for jumping in late. I tried to look at the URL you gave, but it
> > > >>> says "There isn’t anything to compare."
> > > >>>
> > > >>> BTW https://github.com/arosien/failure looks great!
> > > >>>
> > > >>> Best,
> > > >>> Sandeep
> > > >>>
> > > >>>
> > > >>> On Thu, Nov 17, 2016 at 11:52 AM, Edward Capriolo <[email protected]>
> > > >>> wrote:
> > > >>>
> > > >>>> If someone gets a chance, please review. It turned out to be a little
> > > >>>> easier than I thought:
> > > >>>>
> > > >>>> https://github.com/apache/incubator-gossip/compare/master...edwardcapriolo:GOSSIP-22?expand=1
> > > >>>>
> > > >>>> Leveraging the code here:
> > > >>>>
> > > >>>> https://github.com/arosien/failure
> > > >>>>
> > > >>>> I attempted to contact the author of failure (ASF V2) to see if he
> > > >>>> wants to contribute the code (it is not in Maven). We have other
> > > >>>> options, like fork and package, etc.
> > > >>>>
> > > >>>> Let's hold off on merging this until after the release.
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Edward
> > > >>>>
> > > >>>> On Tue, Nov 15, 2016 at 10:42 PM, chandresh pancholi <
> > > >>>> [email protected]> wrote:
> > > >>>>
> > > >>>>> I will also look into it.
> > > >>>>>
> > > >>>>> On Wed, Nov 16, 2016 at 5:53 AM, Edward Capriolo <[email protected]>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> This seems interesting and low bar to entry:
> > > >>>>>>
> > > >>>>>> https://github.com/arosien/failure
> > > >>>>>>
> > > >>>>>> On Tue, Nov 15, 2016 at 4:01 PM, Edward Capriolo <[email protected]>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>>> I was doing some load testing and I found that the current gating
> > > >>>>>>> factor for max instances running in the same JVM is the JMX-based
> > > >>>>>>> notification system the failure detector uses.
> > > >>>>>>>
> > > >>>>>>> Currently a cluster of N requires N * (N-1) JMX notification
> > > >>>>>>> threads. I started attempting to remove this limit without going
> > > >>>>>>> into building the accrual failure detector (22), but there were
> > > >>>>>>> some nuanced bugs and I backed off because it did not seem worth
> > > >>>>>>> the change.
> > > >>>>>>>
> > > >>>>>>> If anyone has any literature to contribute about building a
> > > >>>>>>> consensus-based failure detector, please discuss. Once we cut this
> > > >>>>>>> release, that is likely where I will spend my attention.
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> Edward
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> --
> > > >>>>> Chandresh Pancholi
> > > >>>>> Senior Software Engineer
> > > >>>>> Flipkart.com
> > > >>>>> Email-id:[email protected]
> > > >>>>> Contact:08951803660
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Chandresh Pancholi
> > > >> Senior Software Engineer
> > > >> Flipkart.com
> > > >> Email-id:[email protected]
> > > >> Contact:08951803660
> > > >>
> > >
> >
>
> > There's not a lot of code there. Could it be reimplemented in gossip
> > without infringing on any copyrights?
>
> Yes. The paper details the algorithm (it is basically a one-liner).
>
> > https://github.com/apache/hama/tree/master/core/src/main/java/org/apache/hama/monitor/fd
>
> This is interesting. The "math" parts are similar in both projects.
>
> Hama seems like a solid implementation. Some things I see as a challenge:
> The FD code is coupled to the network code, and for our purposes we only
> want the logic.
> In the future we probably want to track some kind of removed state: UP,
> DOWN, REMOVED.
>
> It is really nice that it is done using concurrent type collections instead
> of sync blocks.
>
> @Chia-Hung looking this over I see some interesting bits:
>
> I like how you can choose to be notified only about specific hosts, and how
> the notification is done with a callback.
> https://github.com/apache/hama/blob/master/core/src/main/java/org/apache/hama/monitor/fd/NodeEventListener.java
>
> This is more feature rich than our current notifications, where you can only
> register a single listener and you cannot pick which hosts to listen for.
>
> Obviously Hama's implementation is stable, but once we have a solid release
> or two under our belts maybe we can see if Hama users are comfortable with
> leveraging what we are building.
>
> Good stuff!
>
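
To make the "one-liner" remark above concrete, here is a rough sketch of an accrual-style calculation. It is not the arosien/failure or Hama code, and the class and method names are hypothetical. As a simplifying assumption it models heartbeat inter-arrival times as exponential instead of fitting a distribution with commons-math, so phi = -log10(P(next heartbeat arrives later than the elapsed time)) reduces to elapsed / (mean * ln 10).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch only; not the Gossip, Hama, or arosien/failure API. */
class PhiAccrualSketch {
    private final Deque<Long> intervals = new ArrayDeque<>(); // recent inter-arrival times (ms)
    private final int windowSize;                             // max samples kept
    private long intervalSum = 0;
    private long lastHeartbeatMillis = -1;                    // -1 means no heartbeat seen yet

    PhiAccrualSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    /** Record a heartbeat and update the sliding window of intervals. */
    void heartbeat(long nowMillis) {
        if (lastHeartbeatMillis >= 0) {
            long interval = nowMillis - lastHeartbeatMillis;
            intervals.addLast(interval);
            intervalSum += interval;
            if (intervals.size() > windowSize) {
                intervalSum -= intervals.removeFirst();
            }
        }
        lastHeartbeatMillis = nowMillis;
    }

    /** Suspicion level; grows the longer the node stays silent. */
    double phi(long nowMillis) {
        if (intervals.isEmpty()) {
            return 0.0; // not enough samples to estimate anything
        }
        double mean = (double) intervalSum / intervals.size();
        double elapsed = nowMillis - lastHeartbeatMillis;
        if (mean <= 0.0) {
            return elapsed > 0 ? Double.POSITIVE_INFINITY : 0.0;
        }
        // Assumed exponential model: P(later) = exp(-elapsed / mean),
        // so -log10(P(later)) = elapsed / (mean * ln 10).
        return elapsed / (mean * Math.log(10.0));
    }
}
```

A caller would feed heartbeat(...) on every received message and mark a node as suspect once phi(now) crosses a threshold of its choosing; the threshold value is left open here.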

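On the notification points above (per-host listeners, callbacks, an UP/DOWN/REMOVED state, and concurrent collections instead of sync blocks), a minimal sketch could look like the following. Every name here is hypothetical; it is not Hama's NodeEventListener or the current Gossip listener API, just one way the pieces might fit together.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch only; these names are not the Gossip or Hama API. */
class NodeStateNotifierSketch {
    enum NodeState { UP, DOWN, REMOVED }

    /** Callback invoked when a watched host changes state. */
    interface Listener {
        void onChange(String host, NodeState state);
    }

    // Concurrent collections rather than synchronized blocks.
    private final Map<String, List<Listener>> listenersByHost = new ConcurrentHashMap<>();
    private final Map<String, NodeState> states = new ConcurrentHashMap<>();

    /** Register interest in one specific host; several listeners per host are allowed. */
    void register(String host, Listener listener) {
        listenersByHost.computeIfAbsent(host, h -> new CopyOnWriteArrayList<>()).add(listener);
    }

    /** Record a state and notify only the listeners registered for that host. */
    void update(String host, NodeState newState) {
        NodeState previous = states.put(host, newState);
        if (previous == newState) {
            return; // no transition, nothing to report
        }
        for (Listener l : listenersByHost.getOrDefault(host, Collections.emptyList())) {
            l.onChange(host, newState);
        }
    }
}
```

Keeping this separate from any network code would let the failure detector logic call update(...) without knowing who is listening.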