Re: Shard Splitting API Proposal

Jan Lehnardt Mon, 25 Mar 2019 02:56:59 -0700

Hi Nick,

despite https://twitter.com/janl/status/1108741392975777795 I left one typo
comment ;)


Great job everyone!

Jan
—


> On 21. Mar 2019, at 15:44, Jan Lehnardt <j...@apache.org> wrote:
> 
> Hi Nick,
> 
> On first glance, this all looks great and like an exemplary PR that is easy
> to follow. And bonus props for the nice docs. I'll have more time for a
> thorough review over the weekend.
> 
> Cheers
> Jan
> —
> 
>> On 18. Mar 2019, at 19:42, Nick Vatamaniuc <vatam...@gmail.com> wrote:
>> 
>> Hello everyone,
>> 
>> Thank you all (Joan, Jan, Mike, Ilya, Adam) who contributed to the API
>> discussion. There is now a PR open
>> https://github.com/apache/couchdb/pull/1972 . If you get a chance, I would
>> appreciate any reviews, feedback or comments.
>> 
>> The PR message explains how the commits are organized and references the
>> RFC. Basically it starts with preparatory work, ensuring all the existing
>> components know how to deal with split shards. Then, some lower level bits
>> are implemented, like bulk copy, internal replicator updates, etc.,
>> followed by the individual job implementation and the job manager which
>> stitches everything together. At the end is the HTTP API implementation
>> along with a suite of unit and Elixir integration tests.
>> 
>> There is also a README_reshard.md file in src/mem3 that tries to provide a
>> more in-depth technical description of how everything fits together.
>> https://github.com/apache/couchdb/pull/1972/files#diff-5ac7b51ec4e03e068bf271f34ecf88df
>> (note that this URL might change after a rebase).
>> 
>> Also special thanks to Paul (job module implementation, get_ring function,
>> a lot of architectural and implementation advice), Eric (finding many bugs,
>> fixes for the bugs, and writing bulk copy and change feed tests), and Jay
>> (testing and a thorough code review).
>> 
>> Cheers,
>> -Nick
>> 
>>> On Sun, Feb 17, 2019 at 2:32 AM Jan Lehnardt <m...@jan.io> wrote:
>>> 
>>> Heya Nick,
>>> 
>>> Nicely done. I think even though the majority of the discussion had
>>> already happened here, the RFC nicely pulled together the various
>>> discussion threads into a coherent whole.
>>> 
>>> I would imagine the discussion on GH would be similarly fruitful.
>>> 
>>> I gave it my +1, and as I said on the outset: I'm very excited about this
>>> feature!
>>> 
>>> Best
>>> Jan
>>> —
>>> 
>>>> On 15. Feb 2019, at 23:45, Nick Vatamaniuc <vatam...@gmail.com> wrote:
>>>> 
>>>> Decided to kick the tires on the new RFC proposal issue type and created
>>>> one for shard splitting:
>>>> 
>>>> https://github.com/apache/couchdb/issues/1920
>>>> 
>>>> Let's see how it goes. Since it's the first one, let me know if I missed
>>>> anything obvious.
>>>> 
>>>> Also I'd like to thank everyone who contributed to the discussion. The
>>>> API is looking more solid and is much improved from where it started.
>>>> 
>>>> Cheers,
>>>> -Nick
>>>> 
>>>> 
>>>> 
>>>>> On Wed, Feb 13, 2019 at 12:03 PM Nick Vatamaniuc <vatam...@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Wed, Feb 13, 2019 at 11:52 AM Jan Lehnardt <j...@apache.org> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 13. Feb 2019, at 17:12, Nick Vatamaniuc <vatam...@gmail.com> wrote:
>>>>>>> 
>>>>>>> Hi Jan,
>>>>>>> 
>>>>>>> Thanks for taking a look!
>>>>>>> 
>>>>>>>> On Wed, Feb 13, 2019 at 6:28 AM Jan Lehnardt <j...@apache.org> wrote:
>>>>>>>> 
>>>>>>>> Nick, this is great, I have a few tiny nits left, apologies I only
>>>>>>>> now got to it.
>>>>>>>> 
>>>>>>>>> On 12. Feb 2019, at 18:08, Nick Vatamaniuc <vatam...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Shard Splitting API Proposal
>>>>>>>>> 
>>>>>>>>> I'd like to thank everyone who contributed to the API discussion. As a
>>>>>>>>> result we have a much better and more consistent API than what we
>>>>>>>>> started with.
>>>>>>>>> 
>>>>>>>>> Before continuing I wanted to summarize to see what we ended up with.
>>>>>>>>> The main changes since the initial proposal were switching to using
>>>>>>>>> /_reshard as the main endpoint and having a detailed state transition
>>>>>>>>> history for jobs.
>>>>>>>>> 
>>>>>>>>> * GET /_reshard
>>>>>>>>> 
>>>>>>>>> Top level summary. Besides the new _reshard endpoint, there is now a
>>>>>>>>> `reason` and the stats are more detailed.
>>>>>>>>> 
>>>>>>>>> Returns
>>>>>>>>> 
>>>>>>>>> {
>>>>>>>>> "completed": 3,
>>>>>>>>> "failed": 4,
>>>>>>>>> "running": 0,
>>>>>>>>> "state": "stopped",
>>>>>>>>> "state_reason": "Manual rebalancing",
>>>>>>>>> "stopped": 0,
>>>>>>>>> "total": 7
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> * PUT /_reshard/state
>>>>>>>>> 
>>>>>>>>> Start or stop global rebalancing.
>>>>>>>>> 
>>>>>>>>> Body
>>>>>>>>> 
>>>>>>>>> {
>>>>>>>>> "state": "stopped",
>>>>>>>>> "reason": "Manual rebalancing"
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> Returns
>>>>>>>>> 
>>>>>>>>> {
>>>>>>>>> "ok": true
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> * GET /_reshard/state
>>>>>>>>> 
>>>>>>>>> Return global resharding state and reason.
>>>>>>>>> 
>>>>>>>>> {
>>>>>>>>> "reason": "Manual rebalancing",
>>>>>>>>> "state": "stopped"
>>>>>>>>> }
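
The state endpoints above can be exercised with a short client sketch. This is only an illustration, not part of the proposal: the base URL http://127.0.0.1:5984 and the helper names reshard_state_request and counts_add_up are assumptions, and the request is only constructed, never sent. It also assumes that "total" in the summary is the sum of the per-state counters, which happens to hold for the sample response.

```python
import json
import urllib.request

# Assumption: a CouchDB node listening locally; adjust for a real cluster.
BASE_URL = "http://127.0.0.1:5984"


def reshard_state_request(state, reason=None, base_url=BASE_URL):
    """Build (but do not send) a PUT /_reshard/state request."""
    body = {"state": state}
    if reason is not None:
        body["reason"] = reason
    return urllib.request.Request(
        url=base_url + "/_reshard/state",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def counts_add_up(summary):
    """Check that the per-state counters of a GET /_reshard summary sum to total.

    Assumption: "total" equals completed + failed + running + stopped.
    """
    parts = ("completed", "failed", "running", "stopped")
    return sum(summary[p] for p in parts) == summary["total"]


# The sample summary quoted in the proposal.
summary = {
    "completed": 3, "failed": 4, "running": 0,
    "state": "stopped", "state_reason": "Manual rebalancing",
    "stopped": 0, "total": 7,
}

req = reshard_state_request("stopped", "Manual rebalancing")
print(req.get_method(), req.full_url)
print(counts_add_up(summary))
```

Sending the request would be a matter of passing req to urllib.request.urlopen, with whatever authentication the cluster requires.
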
>>>>>>>> 
>>>>>>>> More a note than a change request, but `state` is a very generic term
>>>>>>>> that often confuses folks when they are new to something. If the set of
>>>>>>>> possible states is `started` and `stopped`, how about making this
>>>>>>>> endpoint a boolean?
>>>>>>>> 
>>>>>>>> /_reshard/enabled
>>>>>>>> 
>>>>>>>> {
>>>>>>>> "enabled": true|false,
>>>>>>>> "reason": "Manual rebalancing"
>>>>>>>> }
>>>>>>>> 
>>>>>>>> 
>>>>>>> I thought of that as well. However _reshard/state seemed consistent
>>>>>>> with _reshard/jobs/$jobid/state. Setting "state":"stopped" on
>>>>>>> _reshard/state will lead to the state of all individual running jobs
>>>>>>> becoming "stopped" as well. And "running" will make jobs that are not
>>>>>>> individually stopped also become "running". In other words, since it
>>>>>>> directly toggles the jobs' state (with a job being able to override the
>>>>>>> stopped state) I like that it had the same arguments
>>>>>> 
>>>>>> Got it, makes perfect sense.
>>>>>> 
>>>>>>> and": true|false
>>>>>>> 
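
The interaction between the global state and the per-job state described above can be modeled in a few lines. This is only a sketch of the semantics as worded in this thread, not the mem3 implementation: the function name effective_state and the individually_stopped flag are hypothetical, and jobs that have already completed or failed are left out of the model.

```python
def effective_state(global_state, individually_stopped):
    """Return the state a not-yet-finished job ends up in.

    global_state: "running" or "stopped", as set via PUT /_reshard/state.
    individually_stopped: True if the job itself was stopped via
        PUT /_reshard/jobs/$jobid/state (hypothetical flag for this sketch).
    """
    if global_state not in ("running", "stopped"):
        raise ValueError("unexpected global state: %r" % (global_state,))
    # A global stop wins over everything; a global "running" only resumes
    # jobs that were not stopped individually.
    if global_state == "stopped" or individually_stopped:
        return "stopped"
    return "running"


for g in ("running", "stopped"):
    for stopped in (False, True):
        print(g, stopped, "->", effective_state(g, stopped))
```
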
>>>>>>> 
>>>>>>>> 
>>>>>>>>> * GET /_reshard/jobs
>>>>>>>>> 
>>>>>>>>> Get the state of all the resharding jobs on the cluster. Now we have a
>>>>>>>>> detailed state transition history which looks similar to what
>>>>>>>>> _scheduler/jobs has.
>>>>>>>>> 
>>>>>>>>> {
>>>>>>>>> "jobs": [
>>>>>>>>>    {
>>>>>>>>>        "history": [
>>>>>>>>>            {
>>>>>>>>>                "detail": null,
>>>>>>>>>                "timestamp": "2019-02-06T22:28:06Z",
>>>>>>>>>                "type": "new"
>>>>>>>>>            },
>>>>>>>>>            ...
>>>>>>>>>            {
>>>>>>>>>                "detail": null,
>>>>>>>>>                "timestamp": "2019-02-06T22:28:10Z",
>>>>>>>>>                "type": "completed"
>>>>>>>>>            }
>>>>>>>>>        ],
>>>>>>>>>        "id": "001-0a308ef9f7bd24bd4887d6e619682a6d3bb3d0fd94625866c5216ec1167b4e23",
>>>>>>>>>        "job_state": "completed",
>>>>>>>>>        "node": "node1@127.0.0.1",
>>>>>>>>>        "source": "shards/00000000-ffffffff/db1.1549492084",
>>>>>>>>>        "split_state": "completed",
>>>>>>>>>        "start_time": "2019-02-06T22:28:06Z",
>>>>>>>>>        "state_info": {},
>>>>>>>>>        "targets": [
>>>>>>>>>            "shards/00000000-7fffffff/db1.1549492084",
>>>>>>>>>            "shards/80000000-ffffffff/db1.1549492084"
>>>>>>>>>        ],
>>>>>>>> 
>>>>>>>> Since we went from /_split to /_reshard to prepare for merging shards,
>>>>>>>> we should reconsider source (singular) and targets (plural). Either a
>>>>>>>> merge job (in the future) uses sources (plural) and target (singular)
>>>>>>>> and the job schema is intentionally different, or we unify things to,
>>>>>>>> maybe singular: source/target, which would have the nice property of
>>>>>>>> being analogous to our replication job schema. The type definition then
>>>>>>>> is source:String and target:Array(2) for split jobs and source:Array(2)
>>>>>>>> target:String for (future) merge jobs.
>>>>>>>> 
>>>>>>>> 
>>>>>>> Joan suggested adding a "type" field to both the job creation POST body
>>>>>>> and also returning it when we inspect the job(s) state. So the
>>>>>>> "type":"split" would toggle the schema. It could be "merge" in the
>>>>>>> future, or even something like "rebalance" where it would merge some and
>>>>>>> split others perhaps :-) and since we have a type it would be easier to
>>>>>>> differentiate between the merge and split jobs. But if there is a
>>>>>>> consensus from others about switching targets to target that's easy as
>>>>>>> well.
>>>>>> 
>>>>>> Ah, I’m less concerned here about not being able to tell whether it’s a
>>>>>> split or a merge, and more about having a plural form (sourceS/targetS)
>>>>>> that changes depending on the type. It’s just an easy thing to get wrong.
>>>>>> 
>>>>>> In addition, we already have source/target in CouchDB replication,
>>>>>> which people already use successfully, so making a similar thing that
>>>>>> behaves slightly differently doesn’t sit quite right with me.
>>>>>> 
>>>>>> I understand that I’m arguing to remove an ’s’ for very nitpicky
>>>>>> reasons, but these are the kind of nitpick discussions we’ve had a lot in
>>>>>> the early days which resulted in a by and large decent API that
>>>>>> has served us well, and it’s something I’d like to see taken forward.
>>>>>> Apologies if this all sounds very strict ;)
>>>>>> 
>>>>>> 
>>>>> Thanks for the longer explanation. I understand now and agree, let's
>>>>> make it target. No worries about sounding nitpicky, we should be nitpicky
>>>>> about APIs!
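
The singular source/target schema discussed above, keyed on a "type" field, could be checked along these lines. This is a rough sketch under stated assumptions: the helper names is_shard_pair and validate_job are made up for illustration, "merge" is only a possible future type, and the real job documents carry more fields than the three inspected here.

```python
def is_shard_pair(value):
    """True for a list of exactly two strings, i.e. Array(2) of shard names."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and all(isinstance(v, str) for v in value)
    )


def validate_job(job):
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    job_type = job.get("type")
    source, target = job.get("source"), job.get("target")
    if job_type == "split":
        # split: source is one shard, target is the pair it is split into
        if not isinstance(source, str):
            problems.append("split: source must be a string")
        if not is_shard_pair(target):
            problems.append("split: target must be a list of two strings")
    elif job_type == "merge":
        # merge (hypothetical future type): the mirror image of split
        if not is_shard_pair(source):
            problems.append("merge: source must be a list of two strings")
        if not isinstance(target, str):
            problems.append("merge: target must be a string")
    else:
        problems.append("unknown type: %r" % (job_type,))
    return problems


split_job = {
    "type": "split",
    "source": "shards/00000000-ffffffff/db1.1549492084",
    "target": [
        "shards/00000000-7fffffff/db1.1549492084",
        "shards/80000000-ffffffff/db1.1549492084",
    ],
}
print(validate_job(split_job))
```
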
>>>>> 
>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> And just another question, sorry if I missed this elsewhere, would we
>>>>>>>> ever consider adding a split/merge ratio different from 1:2, say 1:4, or
>>>>>>>> will folks have to run 1:2, 1:2, 1:2 to get to the same result? I’m fine
>>>>>>>> with either and if 1:2 fixed makes things simpler, I’m all for it ;)
>>>>>>>> 
>>>>>>>> 
>>>>>>> Good point. Actually it's already implemented that way :-) Right
>>>>>>> below the API surface it has a split=2 parameter and it just creates the
>>>>>>> targets based on that. It could be 2, 3, 4, ... 10 etc. However I was
>>>>>>> thinking of keeping it hard coded at 2 at first to keep the behavior
>>>>>>> simpler and open that parameter to be user facing in a later
>>>>>>> release based on user feedback.
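
How a split=N parameter might turn one shard range into N target ranges can be sketched as follows. This is an assumption-laden illustration, not the mem3 code: the function name split_range is hypothetical, ranges are taken to be inclusive 8-digit hex bounds as in the job example earlier in the thread, and any remainder is simply spread over the first targets.

```python
def split_range(source, n=2):
    """Split an inclusive hex range like "00000000-ffffffff" into n parts."""
    begin_hex, end_hex = source.split("-")
    begin, end = int(begin_hex, 16), int(end_hex, 16)
    size = end - begin + 1
    if n < 2 or n > size:
        raise ValueError("cannot split %s into %d parts" % (source, n))
    step, extra = divmod(size, n)
    targets = []
    start = begin
    for i in range(n):
        # the first `extra` targets absorb one leftover key each
        length = step + (1 if i < extra else 0)
        stop = start + length - 1
        targets.append("%08x-%08x" % (start, stop))
        start = stop + 1
    return targets


print(split_range("00000000-ffffffff"))      # the 1:2 case from the example job
print(split_range("00000000-ffffffff", 4))   # a hypothetical 1:4 split

# Running 1:2 on the source and then 1:2 on each target gives the same
# four ranges as a single 1:4 split in this model.
twice = [r for half in split_range("00000000-ffffffff") for r in split_range(half)]
print(twice == split_range("00000000-ffffffff", 4))
```
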
>>>>>> 
>>>>>> Ace, again, fully on board with shipping 1:2 first and maybe offering
>>>>>> other
>>>>>> options later.
>>>>>> 
>>>>>> Best
>>>>>> Jan
>>>>>> —
>>>>>> 
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> 
>>>>>>> -Nick
>>>>>> 
>>>>>> 
>>> 
>>> 
> 

-- 
Professional Support for Apache CouchDB:
https://neighbourhood.ie/couchdb-support/
