Re: Vision letter, request for discussion

Edward Capriolo Mon, 17 Jul 2017 14:31:54 -0700

On Mon, Jul 17, 2017 at 3:57 PM, Gary Dusbabek <[email protected]> wrote:


> Sorry for the late reply. I am on holiday.
>
> I think part of the problem is that the community is so small. It's
> difficult right now to get PRs merged for lack of reviewers. And in cases
> where participants disagree, and there is no consensus, no real work can
> get done.
>
> For example, I would love to push through a big refactoring that improves
> the coupling problem in the code base. It is near impossible to write good
> unit tests currently. And it's difficult to write features if you cannot
> easily test them. However, I don't feel like there is support for this kind
> of change.
>
> So in short, when there are competing visions, and not a small community,
> it will be difficult to make headway.
>
> As for the CRDTs, etc. I don't think there is any need for them right now,
> personally. They are a scratch with no itch. :)
>
> Gary.
>
>
> On Tue, Jul 11, 2017 at 11:58 AM, Edward Capriolo <[email protected]>
> wrote:
>
> > On Tue, Jul 11, 2017 at 11:15 AM, Русак Максим <[email protected]>
> > wrote:
> >
> > > Hello, Gossip community.
> > > Today I want to discuss our vision of Gossip project, its purpose and
> > > future steps.
> > > I think the main problem is that even I do not have a clear vision of
> > > our goals and future steps. I think all members of our small community
> > > (5-10 members) have their own unique vision, which is silly.
> > > Are we just an implementation of Gossip? Or do we want to implement many
> > > more algorithms and solve more problems? If yes, what problems?
> > > Who is our user in both cases?
> > > I think that without this understanding, and without obtaining first
> > > users quickly, the community can fall apart.
> > >
> > > I think our goals now are:
> > > 1. Formulate the goals and principles of Apache Gossip
> > > 2. After that we'll understand who our exemplary user is and which
> > > problems we can solve for him
> > > 3. Then we'll understand the shortest path to real adoption. We'll get
> > > one real user and do everything to make Gossip decent for him.
> > >
> > > I'm GSoC participant, I have a lot of time now to work and I want to
> move
> > > Gossip to the new level. My tasks are CRDTs, SWIM and Consensus.
> > > For example, I can't understand which of these tasks will lead us to
> > users
> > > and to what kind of users?
> > > CRDT umbrella task (GOSSIP-67) has a lot of CRDTs, I implemented almost
> > > all of them, two remaining Crdts are so rare and complicated that I
> think
> > > there is no need to implement them. I think even some of already
> > > implemented are not necessary.
> > > The same situation with SWIM. We have some algorithm now, but the
> system
> > > in general is not usable by anybody, we can't understand is this
> > algorithm
> > > good or not? We mustn't fabricate needs of our users, we should analyze
> > > problems of real users.
> > > The same with Consensus. Is it in our plan and does it correspond to
> our
> > > vision? Is there anybody who is interested in it?
> > >
> > > "Features for features" is not our goal. "Features for solving users'
> > > pain" have sense.
> > > I want to bring you one example of strong community and successful
> > > company. It's Hashicorp. They have SWIM implemented and running in
> > > production on thousands of machines. And they not just implement the
> most
> > > modern algorithms. They do research and innovations. And it's not only
> > due
> > > to their passion to algorithms, it's due to pain of their users, their
> > > clear vision and desire to solve users' problems.
> > > It's the only way to build big robust community (and company like
> > > Hashicorp) - formulate purpose and aim on obtaining users.
> > > So let's think about our understanding of Apache Gossip and decide
> > whether
> > > SWIM or Consensus is highest priority to obtain first users or not?
> > > If we decide that it is, I'll do it with pleasure. If not, let's
> compose
> > > plan to first users and I'll bring them.
> > >
> > > Thanks, Maxim Rusak
> > >
> >
> > Maxim,
> >
> > Apache follows a "Scratch an itch" philosophy.
> > https://commons.apache.org/volunteering.html. This is different from a
> > traditional software product or consulting company. We do not need to
> > make a "road map" or decide who our "users" are. You and I are both
> > volunteers to the Gossip effort.
> >
> > If you say that the other two CRDT types we have tickets for are rare
> > and complicated, we can close them as WONT_FIX, or we can leave them open
> > in case someone else wants to work on them. That is a discussion we
> > should have, possibly on a case-by-case basis, possibly inside the ticket.
> >
> > Implicitly we understand that for Gossip to be successful, people
> > have to use it. A key part of that is having features that matter to
> > people.
> >
> > "So let's think about our understanding of Apache Gossip and decide
> whether
> > SWIM or Consensus is highest priority to obtain first users or not?"
> >
> > I am not a business analyst. These are things I know:
> >
> > 1) Riak has CRDT support
> > 2) Spark uses a gossip layer
> > 3) Cassandra Uses a gossip layer
> > 4) zookeeper has watchers (close to our event listeners)
> > 5) hashicorp has a product you mentioned
> > 6) akka has crdt support
> >
> > We outlined some possible end-goal use cases for Gossip when we
> > proposed it to the incubator. I have also worked with some other apache
> > projects looking for possible implementations of Gossip such as:
> > https://issues.apache.org/jira/browse/IGNITE-4837.
> >
> > I do not understand YOUR confusion about YOUR GSOC proposal for SWIM. The
> > ticket is self-explanatory: We want to implement SWIM, so that gossip can
> > scale to larger numbers of nodes. We do NOT need to do it expressly to
> > "find users" because Gossip is not a for-profit company. YOU are working
> > on the ticket because YOU find it interesting, the committers agreed it
> > was interesting enough to make a GSOC proposal for it, and GSOC found it
> > interesting enough not to reject it as spam.
> >
> > Gossip is not a for-profit company, but that does not mean we should not
> > attempt to solve problems of real users, have a road map, or get the
> > software into many people's hands. How do we do that?
> >
> > There is no simple answer. I think the primary vehicle is blogging and
> > community. For example I asked everyone to write up their GSOC work into
> > blogs:
> >
> >
> >    - Wrote a blog regarding Data Change Event Listeners
> >       - https://medium.com/@mirage20/listening-to-data-change-events-in-apache-gossip-a0f0a4ea4c21
> >    - Wrote a blog regarding Data Replication Control
> >       - https://medium.com/@mirage20/data-replication-control-in-apache-gossip-35777771e2bb
> >
> > I can tweet out these blogs. Some people follow me and might re-tweet,
> > and by word of mouth we get users who try the software or committers
> > interested in scratching their own itch.
> >
> > I suggest some reading about the Apache-Way:
> > https://www.apache.org/foundation/how-it-works.html .
> >
> > Also I suggest starting to fill out details of your tickets and creating
> > specific threads on the message board, i.e. what are you researching
> > about SWIM? What were the conclusions? What are other alternatives? The
> > ticket is basically empty: https://issues.apache.org/jira/browse/GOSSIP-51.
> >
>

The CRDTs are important for downstream applications. For example, the CRDT
types are going to make it much easier to do ...anything. Zookeeper has
features like writing EPHEMERAL_SEQUENTIAL nodes so you can mix and match
writes and reads with different semantics and glue together a lock, or a
leader election, etc.

Shared and per-node data provide only a put(x,y) and get(x). Because
Gossip replication happens lazily, the scheme to acquire a lock or elect a
leader might be something like a structure that spans a number of keys.
CRDTs give us the key building block to manipulate complex types in a
masterless way.
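
To make "masterless" concrete, here is a rough Python sketch of one of the
simplest CRDTs, a grow-only counter. This is only an illustration, not the
Apache Gossip API: the GCounter class and its method names are made up for
the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter (hypothetical sketch, not the Apache Gossip API)."""

    # One slot per node id; each node only ever increases its own slot.
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, node_id: str, amount: int = 1) -> None:
        # Purely local update: no lock, no leader, no round trip.
        self.counts[node_id] = self.counts.get(node_id, 0) + amount

    def merge(self, other: "GCounter") -> "GCounter":
        # Per-node max is commutative, associative and idempotent, so
        # replicas converge no matter how lazily or in what order the
        # gossiped state arrives.
        merged = dict(self.counts)
        for node_id, count in other.counts.items():
            merged[node_id] = max(merged.get(node_id, 0), count)
        return GCounter(merged)

    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter(), GCounter()
a.increment("node-a", 2)
b.increment("node-b")
print(a.merge(b).value(), b.merge(a).value())  # prints "3 3"
```

Both replicas end up with the same value whichever one merges first, and
that property is what lets richer types be built on top of lazy
replication.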

I.e., if I am writing "storm" I need a place to store topology. Great, I can
denormalize that to key/value and store it in shared data. Next, I need a
way that 10-100 storm nodes can agree on who is doing what topology. With
the CRDTs and the voting (Mirage) in flight we will have that.
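
As a small sketch of what "denormalize to key/value" could mean here (the
topology layout, key naming scheme, and denormalize helper are all invented
for illustration and are not part of Gossip or Storm):

```python
from typing import Any, Dict


def denormalize(prefix: str, value: Any) -> Dict[str, Any]:
    """Flatten a nested dict into flat '/'-joined keys (hypothetical helper)."""
    if isinstance(value, dict):
        flat: Dict[str, Any] = {}
        for key, child in value.items():
            flat.update(denormalize(f"{prefix}/{key}", child))
        return flat
    # Leaf value: this is what would be handed to put(key, value).
    return {prefix: value}


# Made-up topology description, only to have something to flatten.
topology = {
    "spouts": {"reader": {"parallelism": 2}},
    "bolts": {"counter": {"parallelism": 4}},
}

flat = denormalize("topology/wordcount", topology)
for key, value in flat.items():
    print(key, value)
```

Each resulting pair is small enough to be written with a plain put(x,y)
and read back with get(x).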

"For example, I would love to push through a big refactoring that improves
the coupling problem in the code base. It is near impossible to write good
unit tests currently. And it's difficult to write features if you cannot
easily test them. However, I don't feel like there is support for this kind
of change."

From my perspective, the refactoring tickets are a slight bit hard to track
mentally. They either tend to be a series of small ones that eat up a lot of
admin bandwidth or a bulky one that starts small and gets large. I do not
have a problem with refactor tickets specifically, but I would rather see
them in the scope of features. For example, with the change to support both
SWIM and our current Gossiper, we are forced to think about the problem
differently and have two concrete cases, so that we can design the correct
API. Sorry if something was hanging out there that you feel was un-acked.

"So in short, when there are competing visions, and not a small community,
it will be difficult to make headway."

I am not sure I agree. Early on in Gossip I was approaching things like the
larger (apache) projects I worked on. I was kinda used to "hey committers,
here is a patch": someone would roll around, review it, and tell me what to
fix; repeat a few times, then merge.

We just do not have the bodies for that. If you want to make a change (as
a committer) you do not really have to wait around for consensus. For a
committer there is an implicit "WILL MERGE IN 2 DAYS IF NO COMMENT". If you
are not a committer (or want to wait for my blessing) you are probably
going to have to send me a singing telegram or two. Doing the apache
releases takes cycles, mentoring the GSOC proposals takes cycles, and so do
life, jobs, etc.

As for Gossip having a direction, I don't want Gossip to follow the lead of
other "owned" apache projects. "Hey we are 'EdTech' the commercial
consulting/solutions arm in the engine room of 'apache gossip' we have all
the committers and we have a ROAD MAP and our CTO KNOWS WHAT TO DO BASED ON
WHAT OUR INVESTORS WANT TO HEAR and if you're working on something else
......tough crap."  :)

I am trying to move at a pace that others can follow/play along. I _COULD_
have implemented all the CRDTs, but I did one and left the rest open. This
leaves a door open for others to make meaningful contributions. That is
essentially what I am trying to do: guide. If I had more time (job, gossip
(reviews, releases, gsoc), other apache pmc roles, 2 year old) I would
probably do more outreach like meetups and blogs. There are fewer bodies on
deck than I expected at this phase, but such is life. I see some projects
are in the incubator for 3-4 years. I am not trying to go that long, but
not trying to rush either.
