I'm happy to contribute, but I may not have time in the near future because I'm tied up with other tasks. Feel free to rework it, or I'll refactor it when I have free time. I'm happy to see how it evolves.
On Friday, 2 December 2016, Edward Capriolo <[email protected]> wrote:
> On Fri, Dec 2, 2016 at 3:15 AM, Chia-Hung Lin <[email protected]> wrote:
>
> > Shameless plug for code written a long time ago. I didn't find a chance
> > to modularize it, but feel free to use it, as it's licensed under
> > Apache 2.
> >
> > [1]. https://github.com/apache/hama/tree/master/core/src/main/java/org/apache/hama/monitor/fd
> >
> > On Friday, 2 December 2016, P. Taylor Goetz <[email protected]> wrote:
> >
> > > There's not a lot of code there. Could it be reimplemented in gossip
> > > without infringing on any copyrights?
> > >
> > > -Taylor
> > >
> > > > On Dec 1, 2016, at 6:21 PM, Edward Capriolo <[email protected]> wrote:
> > > >
> > > > I reached out to the initial author of the failure library to see if
> > > > they would consider contributing it, and I have not heard back.
> > > >
> > > > The library itself consists of two functions, with no unit testing,
> > > > and those functions lean heavily on commons-math. I think the
> > > > signatures and the return types are not set up in a way that is
> > > > natural for us to leverage. I think it is best we simply write the
> > > > code to execute the failure detector logic ourselves. We can make it
> > > > with the method signature we want and provide our own direct testing.
> > > >
> > > > If anyone sees an alternative library, let me know. Remember, the
> > > > algorithm itself is essentially a one-liner on top of commons-math
> > > > parts.
> > > >
> > > > Thanks,
> > > > Edward
> > > >
> > > > On Thu, Nov 17, 2016 at 1:49 PM, chandresh pancholi
> > > > <[email protected]> wrote:
> > > >
> > > >> https://github.com/apache/incubator-gossip/compare/master...edwardcapriolo:GOSSIP-22?expand=1
> > > >> Try the whole URL.
> > > >>
> > > >> Thanks
> > > >>
> > > >> On Thu, Nov 17, 2016 at 11:15 PM, Sandeep More <[email protected]>
> > > >> wrote:
> > > >>
> > > >>> Hello Edward,
> > > >>>
> > > >>> Sorry for jumping in late. I tried to look at the URL you gave; it
> > > >>> says "There isn’t anything to compare."
> > > >>>
> > > >>> BTW, https://github.com/arosien/failure looks great!
> > > >>>
> > > >>> Best,
> > > >>> Sandeep
> > > >>>
> > > >>> On Thu, Nov 17, 2016 at 11:52 AM, Edward Capriolo <[email protected]>
> > > >>> wrote:
> > > >>>
> > > >>>> If someone gets a chance, please review. It turned out to be a
> > > >>>> little easier than I thought:
> > > >>>>
> > > >>>> https://github.com/apache/incubator-gossip/compare/master...edwardcapriolo:GOSSIP-22?expand=1
> > > >>>>
> > > >>>> Leveraging the code here:
> > > >>>>
> > > >>>> https://github.com/arosien/failure
> > > >>>>
> > > >>>> I attempted to contact the author of failure (ASF V2) to see if he
> > > >>>> wants to contribute the code (it is not in Maven). We have other
> > > >>>> options, like fork and package, etc.
> > > >>>>
> > > >>>> Let's hold off on merging this until after the release.
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Edward
> > > >>>>
> > > >>>> On Tue, Nov 15, 2016 at 10:42 PM, chandresh pancholi
> > > >>>> <[email protected]> wrote:
> > > >>>>
> > > >>>>> I will also look into it.
> > > >>>>>
> > > >>>>> On Wed, Nov 16, 2016 at 5:53 AM, Edward Capriolo
> > > >>>>> <[email protected]> wrote:
> > > >>>>>
> > > >>>>>> This seems interesting and has a low bar to entry:
> > > >>>>>>
> > > >>>>>> https://github.com/arosien/failure
> > > >>>>>>
> > > >>>>>> On Tue, Nov 15, 2016 at 4:01 PM, Edward Capriolo
> > > >>>>>> <[email protected]> wrote:
> > > >>>>>>
> > > >>>>>>> I was doing some load testing and found that the current gating
> > > >>>>>>> factor for the maximum number of instances running in the same
> > > >>>>>>> JVM is the JMX-based notification system the failure detector
> > > >>>>>>> uses.
> > > >>>>>>>
> > > >>>>>>> Currently a cluster of N requires N * (N-1) JMX notification
> > > >>>>>>> threads. I started attempting to remove this limit without
> > > >>>>>>> going into building the accrual failure detector (22), but
> > > >>>>>>> there were some nuanced bugs and I backed off because it did
> > > >>>>>>> not seem worth the change.
> > > >>>>>>>
> > > >>>>>>> If anyone has any literature to contribute about building a
> > > >>>>>>> consensus-based failure detector, please discuss. Once we cut
> > > >>>>>>> this release, that is likely where I will spend my attention.
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> Edward
> > > >>>>>
> > > >>>>> --
> > > >>>>> Chandresh Pancholi
> > > >>>>> Senior Software Engineer
> > > >>>>> Flipkart.com
> > > >>>>> Email-id: [email protected]
> > > >>>>> Contact: 08951803660
>
> > There's not a lot of code there. Could it be reimplemented in gossip
> > without infringing on any copyrights?
>
> Yes. The paper details the algorithm (it is basically a one-liner).
>
> >> https://github.com/apache/hama/tree/master/core/src/main/java/org/apache/hama/monitor/fd
>
> This is interesting. The "math" parts are similar in both projects.
>
> Hama seems like a solid implementation. Some things I see as a challenge:
> the FD code is coupled into the network code, and for our purposes we
> only want the logic.
> In the future we probably want to track some kind of removed state: UP,
> DOWN, REMOVED.
>
> It is really nice that it is done using concurrent collections instead
> of sync blocks.
>
> @Chia-Hung, looking this over I see some interesting bits:
>
> I like how you can choose to be notified only about specific hosts, and
> how the notification is done with a callback.
> https://github.com/apache/hama/blob/master/core/src/main/java/org/apache/hama/monitor/fd/NodeEventListener.java
>
> This is more feature-rich than our current notifications, where you can
> only register a single listener and cannot pick which hosts to listen
> to.
>
> Obviously Hama's implementation is stable, but once we have a solid
> release or two under us, maybe we can see if Hama users are comfortable
> with leveraging what we are building.
>
> Good stuff!
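
For reference, the "one-liner" accrual logic discussed in the thread might look roughly like the sketch below. It is only an illustration under stated assumptions: the class and method names (PhiAccrualSketch, heartbeat, phi) are hypothetical and are not taken from Gossip, Hama, or arosien/failure, and to stay free of dependencies it swaps in an exponential-distribution simplification (phi = elapsed / (mean interval * ln 10)) instead of a commons-math distribution.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of an accrual-style failure detector for one peer.
 * Assumption: heartbeat intervals are modeled as exponentially distributed,
 * so P(next heartbeat arrives later than t) = exp(-t / mean) and
 * phi = -log10(P) = t / (mean * ln 10).
 */
class PhiAccrualSketch {
    private final int windowSize;                 // max number of intervals kept
    private final Deque<Long> intervals = new ArrayDeque<>();
    private long intervalSum = 0;                 // running sum of kept intervals
    private long lastHeartbeatMillis = -1;        // -1 means no heartbeat seen yet

    PhiAccrualSketch(int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be >= 1");
        }
        this.windowSize = windowSize;
    }

    /** Record a heartbeat from the peer at the given timestamp. */
    void heartbeat(long nowMillis) {
        if (lastHeartbeatMillis >= 0) {
            long interval = nowMillis - lastHeartbeatMillis;
            intervals.addLast(interval);
            intervalSum += interval;
            if (intervals.size() > windowSize) {
                intervalSum -= intervals.removeFirst();   // slide the window
            }
        }
        lastHeartbeatMillis = nowMillis;
    }

    /** Suspicion level: 0.0 until there is history, then grows with silence. */
    double phi(long nowMillis) {
        if (intervals.isEmpty() || intervalSum <= 0) {
            return 0.0;                           // not enough data to suspect anyone
        }
        double mean = (double) intervalSum / intervals.size();
        double elapsed = nowMillis - lastHeartbeatMillis;
        return elapsed / (mean * Math.log(10.0)); // the "one-liner"
    }
}
```

A caller would compare phi(now) against a threshold of its own choosing and mark the peer DOWN once the value exceeds it; the detector holds only the logic and no network code, which is the separation asked for above.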
