Thank you, Jan, and everyone else who took a look!

On Mon, Mar 25, 2019 at 5:56 AM Jan Lehnardt <j...@apache.org> wrote:
> Hi Nick,
>
> despite https://twitter.com/janl/status/1108741392975777795, I left one
> typo comment ;)
>
> Great job everyone!
>
> Jan
> —
>
> > On 21. Mar 2019, at 15:44, Jan Lehnardt <j...@apache.org> wrote:
> >
> > Hi Nick,
> >
> > On first glance, this looks all great and like an exemplary PR that is
> > easy to follow. And bonus props for the nice docs. I'll have more time
> > for a thorough review over the weekend.
> >
> > Cheers
> > Jan
> > —
> >
> >> On 18. Mar 2019, at 19:42, Nick Vatamaniuc <vatam...@gmail.com> wrote:
> >>
> >> Hello everyone,
> >>
> >> Thank you all (Joan, Jan, Mike, Ilya, Adam) who contributed to the API
> >> discussion. There is now a PR open:
> >> https://github.com/apache/couchdb/pull/1972 . If you get a chance, I
> >> would appreciate any reviews, feedback or comments.
> >>
> >> The PR message explains how the commits are organized and references
> >> the RFC. Basically it starts with preparatory work, ensuring all the
> >> existing components know how to deal with split shards. Then, some
> >> lower level bits are implemented, like bulk copy, internal replicator
> >> updates, etc., followed by the individual job implementation and the
> >> job manager which stitches everything together. At the end is the HTTP
> >> API implementation along with a suite of unit and Elixir integration
> >> tests.
> >>
> >> There is also a README_reshard.md file in src/mem3 that tries to
> >> provide a more in-depth technical description of how everything fits
> >> together:
> >> https://github.com/apache/couchdb/pull/1972/files#diff-5ac7b51ec4e03e068bf271f34ecf88df
> >> (notice this URL might change after a rebase).
> >>
> >> Also special thanks to Paul (job module implementation, get_ring
> >> function, a lot of architectural and implementation advice), Eric
> >> (finding many bugs, fixes for the bugs, and writing bulk copy and
> >> change feed tests), and Jay (testing and a thorough code review).
> >>
> >> Cheers,
> >> -Nick
> >>
> >>> On Sun, Feb 17, 2019 at 2:32 AM Jan Lehnardt <m...@jan.io> wrote:
> >>>
> >>> Heya Nick,
> >>>
> >>> Nicely done. I think even though the majority of the discussion had
> >>> already happened here, the RFC nicely pulled together the various
> >>> discussion threads into a coherent whole.
> >>>
> >>> I would imagine the discussion on GH would be similarly fruitful.
> >>>
> >>> I gave it my +1, and as I said at the outset: I'm very excited about
> >>> this feature!
> >>>
> >>> Best
> >>> Jan
> >>> —
> >>>
> >>>> On 15. Feb 2019, at 23:45, Nick Vatamaniuc <vatam...@gmail.com> wrote:
> >>>>
> >>>> Decided to kick the tires on the new RFC proposal issue type and
> >>>> created one for shard splitting:
> >>>>
> >>>> https://github.com/apache/couchdb/issues/1920
> >>>>
> >>>> Let's see how it goes. Since it's the first one, let me know if I
> >>>> missed anything obvious.
> >>>>
> >>>> Also I'd like to thank everyone who contributed to the discussion.
> >>>> The API is looking more solid and is much improved from where it
> >>>> started.
> >>>>
> >>>> Cheers,
> >>>> -Nick
> >>>>
> >>>>> On Wed, Feb 13, 2019 at 12:03 PM Nick Vatamaniuc <vatam...@gmail.com> wrote:
> >>>>>
> >>>>>> On Wed, Feb 13, 2019 at 11:52 AM Jan Lehnardt <j...@apache.org> wrote:
> >>>>>>
> >>>>>>> On 13. Feb 2019, at 17:12, Nick Vatamaniuc <vatam...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Jan,
> >>>>>>>
> >>>>>>> Thanks for taking a look!
> >>>>>>>
> >>>>>>>> On Wed, Feb 13, 2019 at 6:28 AM Jan Lehnardt <j...@apache.org> wrote:
> >>>>>>>>
> >>>>>>>> Nick, this is great, I have a few tiny nits left, apologies I only
> >>>>>>>> now got to it.
> >>>>>>>>
> >>>>>>>>> On 12. Feb 2019, at 18:08, Nick Vatamaniuc <vatam...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Shard Splitting API Proposal
> >>>>>>>>>
> >>>>>>>>> I'd like to thank everyone who contributed to the API discussion.
> >>>>>>>>> As a result we have a much better and more consistent API than
> >>>>>>>>> what we started with.
> >>>>>>>>>
> >>>>>>>>> Before continuing I wanted to summarize to see what we ended up
> >>>>>>>>> with. The main changes since the initial proposal were switching
> >>>>>>>>> to using /_reshard as the main endpoint and having a detailed
> >>>>>>>>> state transition history for jobs.
> >>>>>>>>>
> >>>>>>>>> * GET /_reshard
> >>>>>>>>>
> >>>>>>>>> Top level summary. Besides the new _reshard endpoint, there is
> >>>>>>>>> now a `reason` and the stats are more detailed.
> >>>>>>>>>
> >>>>>>>>> Returns
> >>>>>>>>>
> >>>>>>>>> {
> >>>>>>>>>     "completed": 3,
> >>>>>>>>>     "failed": 4,
> >>>>>>>>>     "running": 0,
> >>>>>>>>>     "state": "stopped",
> >>>>>>>>>     "state_reason": "Manual rebalancing",
> >>>>>>>>>     "stopped": 0,
> >>>>>>>>>     "total": 7
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> * PUT /_reshard/state
> >>>>>>>>>
> >>>>>>>>> Start or stop global rebalancing.
> >>>>>>>>>
> >>>>>>>>> Body
> >>>>>>>>>
> >>>>>>>>> {
> >>>>>>>>>     "state": "stopped",
> >>>>>>>>>     "reason": "Manual rebalancing"
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> Returns
> >>>>>>>>>
> >>>>>>>>> {
> >>>>>>>>>     "ok": true
> >>>>>>>>> }
> >>>>>>>>>
> >>>>>>>>> * GET /_reshard/state
> >>>>>>>>>
> >>>>>>>>> Return global resharding state and reason.
> >>>>>>>>>
> >>>>>>>>> {
> >>>>>>>>>     "reason": "Manual rebalancing",
> >>>>>>>>>     "state": "stopped"
> >>>>>>>>> }
> >>>>>>>>
> >>>>>>>> More a note than a change request, but `state` is a very generic
> >>>>>>>> term that often confuses folks when they are new to something. If
> >>>>>>>> the set of possible states is `started` and `stopped`, how about
> >>>>>>>> making this endpoint a boolean?
> >>>>>>>>
> >>>>>>>> /_reshard/enabled
> >>>>>>>>
> >>>>>>>> {
> >>>>>>>>     "enabled": true|false,
> >>>>>>>>     "reason": "Manual rebalancing"
> >>>>>>>> }
> >>>>>>>>
> >>>>>>> I thought of that as well. However, _reshard/state seemed consistent
> >>>>>>> with _reshard/jobs/$jobid/state. Setting "state":"stopped" on
> >>>>>>> _reshard/state will lead to all individual running jobs' states
> >>>>>>> becoming "stopped" as well. And "running" will make jobs that are
> >>>>>>> not individually stopped become "running" again. In other words,
> >>>>>>> since it directly toggles the jobs' state (with a job's own stopped
> >>>>>>> state being able to override it), I like that it had the same
> >>>>>>> arguments.
> >>>>>>
> >>>>>> Got it, makes perfect sense.
> >>>>>>
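For illustration, here is roughly how the cluster-level toggle Nick describes would interact with an individual job under the proposed endpoints. This is only a sketch based on the proposal above; the $jobid placeholder and the exact response bodies are illustrative, not confirmed output.

    PUT /_reshard/state
    {"state": "stopped", "reason": "Manual rebalancing"}
    Returns {"ok": true}

    GET /_reshard/jobs/$jobid/state
    Returns {"state": "stopped", "reason": "Manual rebalancing"}

    PUT /_reshard/state
    {"state": "running"}
    Returns {"ok": true}

    GET /_reshard/jobs/$jobid/state
    Returns {"state": "running"}
    (a job that was stopped individually via its own state endpoint would stay "stopped")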
> >>>>>>>>> * GET /_reshard/jobs
> >>>>>>>>>
> >>>>>>>>> Get the state of all the resharding jobs on the cluster. Now we
> >>>>>>>>> have a detailed state transition history which looks similar to
> >>>>>>>>> what _scheduler/jobs has.
> >>>>>>>>>
> >>>>>>>>> {
> >>>>>>>>>     "jobs": [
> >>>>>>>>>         {
> >>>>>>>>>             "history": [
> >>>>>>>>>                 {
> >>>>>>>>>                     "detail": null,
> >>>>>>>>>                     "timestamp": "2019-02-06T22:28:06Z",
> >>>>>>>>>                     "type": "new"
> >>>>>>>>>                 },
> >>>>>>>>>                 ...
> >>>>>>>>>                 {
> >>>>>>>>>                     "detail": null,
> >>>>>>>>>                     "timestamp": "2019-02-06T22:28:10Z",
> >>>>>>>>>                     "type": "completed"
> >>>>>>>>>                 }
> >>>>>>>>>             ],
> >>>>>>>>>             "id": "001-0a308ef9f7bd24bd4887d6e619682a6d3bb3d0fd94625866c5216ec1167b4e23",
> >>>>>>>>>             "job_state": "completed",
> >>>>>>>>>             "node": "node1@127.0.0.1",
> >>>>>>>>>             "source": "shards/00000000-ffffffff/db1.1549492084",
> >>>>>>>>>             "split_state": "completed",
> >>>>>>>>>             "start_time": "2019-02-06T22:28:06Z",
> >>>>>>>>>             "state_info": {},
> >>>>>>>>>             "targets": [
> >>>>>>>>>                 "shards/00000000-7fffffff/db1.1549492084",
> >>>>>>>>>                 "shards/80000000-ffffffff/db1.1549492084"
> >>>>>>>>>             ],
> >>>>>>>>
> >>>>>>>> Since we went from /_split to /_reshard to prepare for merging
> >>>>>>>> shards, we should reconsider source (singular) and targets
> >>>>>>>> (plural). Either a merge job (in the future) uses sources (plural)
> >>>>>>>> and target (singular) and the job schema is intentionally
> >>>>>>>> different, or we unify things to, maybe, singular source/target,
> >>>>>>>> which would have the nice property of being analogous to our
> >>>>>>>> replication job schema. The type definition then is source:String
> >>>>>>>> and target:Array(2) for split jobs and source:Array(2)
> >>>>>>>> target:String for (future) merge jobs.
> >>>>>>>>
> >>>>>>> Joan suggested adding a "type" field to both the job creation POST
> >>>>>>> body and the response when we inspect the job(s) state. So
> >>>>>>> "type":"split" would toggle the schema. It could be "merge" in the
> >>>>>>> future, or even something like "rebalance" where it would merge
> >>>>>>> some and split others perhaps :-) and since we have a type it would
> >>>>>>> be easier to differentiate between the merge and split jobs. But if
> >>>>>>> there is a consensus from others about switching targets to target,
> >>>>>>> that's easy to do as well.
> >>>>>>
> >>>>>> Ah, I'm less concerned here about not being able to tell whether
> >>>>>> it's a split or a merge, and more about having an indiscriminate
> >>>>>> plural form (sourceS/targetS) depending on the type. It's just an
> >>>>>> easy thing to get wrong.
> >>>>>>
> >>>>>> In addition, we already have source/target in CouchDB replication,
> >>>>>> which people already use successfully, so making a similar thing
> >>>>>> that behaves slightly differently doesn't sit quite right with me.
> >>>>>>
> >>>>>> I understand that I'm arguing to remove an 's' for very nitpicky
> >>>>>> reasons, but these are the kind of nitpick discussions we've done a
> >>>>>> lot of in the early days, which resulted in a by and large decent
> >>>>>> API that has served us well, and it's something I'd like to see
> >>>>>> taken forward. Apologies if this all sounds very strict ;)
> >>>>>>
> >>>>> Thanks for the longer explanation. I understand now and agree, let's
> >>>>> make it target. No worries about sounding nitpicky; we should be
> >>>>> nitpicky about APIs!
> >>>>>
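To make the agreed convention concrete, here is a sketch of the two job shapes with a "type" field and singular source/target, following Jan's type definitions above. The merge variant is hypothetical, since merge jobs do not exist yet, and fields other than type, source and target are omitted for brevity:

    Split job (source:String, target:Array(2)):

    {
        "type": "split",
        "source": "shards/00000000-ffffffff/db1.1549492084",
        "target": [
            "shards/00000000-7fffffff/db1.1549492084",
            "shards/80000000-ffffffff/db1.1549492084"
        ]
    }

    A future merge job would flip the shapes (source:Array(2), target:String):

    {
        "type": "merge",
        "source": [
            "shards/00000000-7fffffff/db1.1549492084",
            "shards/80000000-ffffffff/db1.1549492084"
        ],
        "target": "shards/00000000-ffffffff/db1.1549492084"
    }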
> >>>>>>>> And just another question, sorry if I missed this elsewhere:
> >>>>>>>> would we ever consider adding a split/merge ratio different from
> >>>>>>>> 1:2, say 1:4, or will folks have to run 1:2, 1:2, 1:2 to get to
> >>>>>>>> the same result? I'm fine with either, and if 1:2 fixed makes
> >>>>>>>> things simpler, I'm all for it ;)
> >>>>>>>>
> >>>>>>> Good point. It's actually already implemented that way :-) Right
> >>>>>>> below the API surface it has a split=2 parameter and it just
> >>>>>>> creates the targets based on that. It could be 2, 3, 4, ... 10,
> >>>>>>> etc. However, I was thinking of keeping it hard-coded at 2 to keep
> >>>>>>> the behavior simpler at first, and opening that parameter up to be
> >>>>>>> user facing in a later release based on user feedback.
> >>>>>>
> >>>>>> Ace, again, fully on board with shipping 1:2 first and maybe
> >>>>>> offering other options later.
> >>>>>>
> >>>>>> Best
> >>>>>> Jan
> >>>>>> —
> >>>>>>
> >>>>>>> Cheers,
> >>>>>>> -Nick
>
> --
> Professional Support for Apache CouchDB:
> https://neighbourhood.ie/couchdb-support/
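As a concrete illustration of the split factor discussed above: with the internal split=2 parameter, the source shard from the earlier example is divided into two equal hash ranges, and a hypothetical split=4 (not user facing in the first release) would divide it into four. The split=4 ranges below are simply the quarter points of the 32-bit hash range and are shown for illustration only:

    split=2:  shards/00000000-ffffffff/db1.1549492084
              -> shards/00000000-7fffffff/db1.1549492084
              -> shards/80000000-ffffffff/db1.1549492084

    split=4:  shards/00000000-ffffffff/db1.1549492084
              -> shards/00000000-3fffffff/db1.1549492084
              -> shards/40000000-7fffffff/db1.1549492084
              -> shards/80000000-bfffffff/db1.1549492084
              -> shards/c0000000-ffffffff/db1.1549492084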