On Fri, Dec 2, 2016 at 3:15 AM, Chia-Hung Lin <[email protected]> wrote:
> Shameless plug the code written long time ago. Didn't find a chance to
> modulize that. But feel free to use it as it's licensed in Apache 2.
>
> [1].
> https://github.com/apache/hama/tree/master/core/src/main/java/org/apache/hama/monitor/fd
>
> On Friday, 2 December 2016, P. Taylor Goetz <[email protected]> wrote:
>
> > There's not a lot of code there. Could it be reimplemented in gossip
> > without infringing on any copyrights?
> >
> > -Taylor
> >
> > > On Dec 1, 2016, at 6:21 PM, Edward Capriolo <[email protected]> wrote:
> > >
> > > I reached out to the initial author of the failure library to see if they
> > > would consider contributing it and I. I have not heard back.
> > >
> > > The library itself is comprised of two functions, with no unit testing, and
> > > those functions lean heavily on commons-math. I think the signatures and
> > > the return types are not setup in a way that is natural for us to leverage.
> > > I think it is best we simply write the code to execute the failure detector
> > > logic ourselves. We can make with a method signature we want and provide
> > > our own direct testing.
> > >
> > > If anyone sees an alternative library let me know. Remember the algorithm
> > > itself is essentially a one-liner on top of common-math parts.
> > >
> > > Thanks,
> > > Edward
> > >
> > > On Thu, Nov 17, 2016 at 1:49 PM, chandresh pancholi <[email protected]> wrote:
> > >
> > >> https://github.com/apache/incubator-gossip/compare/master...edwardcapriolo:GOSSIP-22?expand=1
> > >> Try the whole URL.
> > >>
> > >> Thanks
> > >>
> > >> On Thu, Nov 17, 2016 at 11:15 PM, Sandeep More <[email protected]> wrote:
> > >>
> > >>> Hello Edward,
> > >>>
> > >>> Sorry for jumping in late, I tried to look at the URL you gave, it says
> > >>> "There isn’t anything to compare."
> > >>>
> > >>> BTW https://github.com/arosien/failure looks great !
> > >>>
> > >>> Best,
> > >>> Sandeep
> > >>>
> > >>> On Thu, Nov 17, 2016 at 11:52 AM, Edward Capriolo <[email protected]> wrote:
> > >>>
> > >>>> If someone gets a chance please review. It turned out to be a little easier
> > >>>> then i thought:
> > >>>>
> > >>>> https://github.com/apache/incubator-gossip/compare/master...edwardcapriolo:GOSSIP-22?expand=1
> > >>>>
> > >>>> Leveraging the code here:
> > >>>>
> > >>>> https://github.com/arosien/failure
> > >>>>
> > >>>> I attempted to contact the author of failure (ASF V2) to see if he wants to
> > >>>> contribute the code. (not in maven) We have other options like fork and
> > >>>> package etc.
> > >>>>
> > >>>> Lets hold off the merge of this until after the release.
> > >>>>
> > >>>> Thanks,
> > >>>> Edward
> > >>>>
> > >>>> On Tue, Nov 15, 2016 at 10:42 PM, chandresh pancholi <[email protected]> wrote:
> > >>>>
> > >>>>> I will also look into it.
> > >>>>>
> > >>>>> On Wed, Nov 16, 2016 at 5:53 AM, Edward Capriolo <[email protected]> wrote:
> > >>>>>
> > >>>>>> This seems interesting and low bar to entry:
> > >>>>>>
> > >>>>>> https://github.com/arosien/failure
> > >>>>>>
> > >>>>>> On Tue, Nov 15, 2016 at 4:01 PM, Edward Capriolo <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> I was doing some load testing and I found the the current gating factor
> > >>>>>>> for max instances running in the same JVM is limited by the JMX based
> > >>>>>>> notification system the failure detector uses.
> > >>>>>>>
> > >>>>>>> Currently a cluster of N requires N * (N-1) JMX notification threads. I
> > >>>>>>> started attempting to remove this limit without going into building the
> > >>>>>>> accrual failure detector (22) but there were some nuanced bugs and I backed
> > >>>>>>> off because it did not seem worth the change.
> > >>>>>>>
> > >>>>>>> If anyone has an literature to contribute about building a consensus based
> > >>>>>>> failure detector please discuss. Once we cut this release that is likely
> > >>>>>>> were I will spent my attention.
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Edward
> > >>>>>
> > >>>>> --
> > >>>>> Chandresh Pancholi
> > >>>>> Senior Software Engineer
> > >>>>> Flipkart.com
> > >>>>> Email-id:[email protected]
> > >>>>> Contact:08951803660
> > >>
> > >> --
> > >> Chandresh Pancholi
> > >> Senior Software Engineer
> > >> Flipkart.com
> > >> Email-id:[email protected]
> > >> Contact:08951803660

> > There's not a lot of code there. Could it be reimplemented in gossip
> > without infringing on any copyrights?

Yes. The paper details the algorithm, and it is basically a one-liner.

>> https://github.com/apache/hama/tree/master/core/src/main/java/org/apache/hama/monitor/fd

This is interesting. The "math" parts are similar in both projects. Hama
seems like a solid implementation. Some things I see as a challenge:

The FD code is coupled to the network code, and for our purposes we only
want the logic.

In the future we probably want to track some kind of removed state:
UP, DOWN, REMOVED.

It is really nice that it is done using concurrent collections instead of
synchronized blocks.

@Chia-Hung looking this over I see some interesting bits: I like how you can
choose to be notified only about specific hosts, and how the notification is
done with a callback.

https://github.com/apache/hama/blob/master/core/src/main/java/org/apache/hama/monitor/fd/NodeEventListener.java

This is more feature-rich than our current notifications, where you can
register only a single listener and cannot pick which hosts to listen for.

Obviously Hama's implementation is stable, but once we have a solid release
or two behind us, maybe we can see whether Hama users are comfortable
leveraging what we are building.

Good stuff!
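
To make the "one-liner" remark concrete, here is a rough sketch of an accrual-style suspicion value for a single peer. It is not the code from the failure library or from Hama. The class and method names are made up for illustration, and to keep it dependency-free it assumes exponentially distributed heartbeat intervals rather than the normal-distribution model one would build on commons-math. Under that assumption, phi = -log10(P(no heartbeat for this long)) reduces to elapsed / (mean * ln 10).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of an accrual failure detector for one peer.
 * Assumption: heartbeat intervals follow an exponential distribution,
 * so P(interval > t) = exp(-t / mean). Not Gossip's or Hama's actual API.
 */
class PhiAccrualSketch {
    private final int windowSize;                       // max intervals remembered
    private final Deque<Long> intervals = new ArrayDeque<>();
    private long intervalSum = 0;                       // running sum of the window
    private long lastHeartbeatMs = -1;                  // -1 means "none seen yet"

    PhiAccrualSketch(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    /** Record a heartbeat that arrived at the given time. */
    synchronized void heartbeat(long nowMs) {
        if (lastHeartbeatMs >= 0) {
            long interval = nowMs - lastHeartbeatMs;
            intervals.addLast(interval);
            intervalSum += interval;
            if (intervals.size() > windowSize) {
                intervalSum -= intervals.removeFirst(); // drop the oldest sample
            }
        }
        lastHeartbeatMs = nowMs;
    }

    /** Suspicion level; 0.0 until at least two heartbeats have been seen. */
    synchronized double phi(long nowMs) {
        if (intervals.isEmpty()) {
            return 0.0;
        }
        // Clamp the mean to 1 ms to avoid dividing by zero (an arbitrary choice).
        double mean = Math.max(1.0, (double) intervalSum / intervals.size());
        double elapsed = Math.max(0L, nowMs - lastHeartbeatMs);
        // The "one-liner": -log10(exp(-elapsed / mean))
        return elapsed / (mean * Math.log(10.0));
    }

    /** True when phi exceeds the caller-chosen threshold. */
    boolean isSuspect(long nowMs, double threshold) {
        return phi(nowMs) > threshold;
    }
}
```

The caller feeds `heartbeat(...)` whenever gossip from the peer arrives and polls `isSuspect(...)` with a threshold of its choosing. Because the detector is just arithmetic over timestamps, it needs no thread per peer and can be unit tested directly with fixed times.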

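The listener features discussed above (callbacks, per-host registration, an UP/DOWN/REMOVED state, concurrent collections instead of synchronized blocks) could be combined roughly as follows. This is only a sketch under assumptions: every name here is hypothetical and none of it is taken from Hama's NodeEventListener or from the current Gossip code.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical registry of node states with global and per-host callbacks. */
class NodeStateRegistrySketch {
    enum State { UP, DOWN, REMOVED }

    /** Callback; oldState is null the first time a host is observed. */
    interface Listener {
        void onChange(String host, State oldState, State newState);
    }

    // Concurrent collections, so no explicit synchronized blocks are needed.
    private final Map<String, State> states = new ConcurrentHashMap<>();
    private final Map<String, List<Listener>> perHost = new ConcurrentHashMap<>();
    private final List<Listener> global = new CopyOnWriteArrayList<>();

    /** Register a listener that hears about every host. */
    void listenAll(Listener listener) {
        global.add(listener);
    }

    /** Register a listener that hears about one specific host only. */
    void listen(String host, Listener listener) {
        perHost.computeIfAbsent(host, h -> new CopyOnWriteArrayList<>()).add(listener);
    }

    /** Record a state and notify listeners only if it actually changed. */
    void update(String host, State newState) {
        State old = states.put(host, newState);
        if (old == newState) {
            return;
        }
        for (Listener l : global) {
            l.onChange(host, old, newState);
        }
        for (Listener l : perHost.getOrDefault(host, Collections.<Listener>emptyList())) {
            l.onChange(host, old, newState);
        }
    }

    /** Last recorded state, or null if the host has never been seen. */
    State stateOf(String host) {
        return states.get(host);
    }
}
```

Callbacks run on the thread that calls `update(...)`, so no notification threads are created per pair of nodes. Concurrent updates for the same host may deliver callbacks out of order; a real implementation would need to decide whether that matters.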