Hi Joan,

Thank you for taking a look!



> > * `GET /_shard_splits`
>
> As a result I'm concerned: would we then have duplicate endpoints
> for /_shard_merges? Or would a unified /_reshard endpoint make
> more sense here?
>

Good idea. Let's go with _reshard; it's more general and allows for adding
shard merging later.


> I presume that if you've disabled shard manipulation on the
> cluster, the status changes to "disabled" and the value is the
> reason provided by the operator?
>
>
Currently it's PUT /_reshard/state with a body of
{"state": "running" | "stopped", "reason": ...}. This will be shown at the
top level in the GET /_reshard response.
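
For illustration, roughly what I imagine the exchange looking like (the
exact field names here are a sketch, not final):

    PUT /_reshard/state
    {"state": "stopped", "reason": "cluster maintenance"}

    GET /_reshard
    {
        "state": "stopped",
        "reason": "cluster maintenance",
        ...
    }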


> > Get a summary of shard splitting for the whole cluster.
>
> What happens if every node in the cluster is restarted while a shard
> split operation is occurring? Is the job persisted somewhere, i.e. in
> special docs in _dbs, or would this kill the entire operation? I'm
> considering rolling cluster upgrades here.
>
>
The job checkpoints as it goes through its various steps; the checkpoint is
saved in a _local document in the shard dbs. So if a node is restarted, the
job will resume from the last checkpoint it reached.
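
Roughly, the checkpoint could be a _local doc along these lines (the id and
fields below are hypothetical, just to show the idea, not the final format):

    _local/shard-split-<jobid>
    {
        "job_state": "running",
        "split_state": "copy_docs",
        "source": "shards/00000000-FFFFFFFF/username/dbname.$timestamp",
        "targets": [
            "shards/00000000-7FFFFFFF/username/dbname.$timestamp",
            "shards/80000000-FFFFFFFF/username/dbname.$timestamp"
        ],
        "update_seq": 1234
    }

The main point is that each step's completion gets recorded, so a restarted
node can pick up where it left off instead of starting over.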


>
> > * `PUT /_shard_splits`
>
> Same comment as above about whether this is /_shard_splits or something
> that could expand to shard merging in the future as well.
>
> If you persist the state of the shard splitting operation when disabling,
> this could be used as a prerequisite to a rolling cluster upgrade
> (i.e., an important documentation update).
>
>
I think after discussing with other participants this became PUT
/_shard_splits/state (now PUT /_reshard/state). The disable state is also
persisted on a per-node basis.

An interesting thing to think about is that if a node is down when shard
splitting is stopped or started, it won't find out about it. So I think we
might have to do some kind of querying of neighboring nodes to detect
whether a node that has just rejoined missed a recent change to the global
state.


> > * `POST /_shard_splits/jobs`
> >
> > Start a shard splitting job.
> >
> > Request body:
> >
> > {
> >     "node": "dbc...@db1.sandbox001.cloudant.net",
> >     "shard": "shards/00000000-FFFFFFFF/username/dbname.$timestamp"
> > }
>
>
> 1. Agree with earlier comments that having to specify this per-node is
> a nice to have, but really an end user wants to specify a *database*,
> and have the API create the q*n jobs needed. It would then return an
> array of jobs in the format you describe.
>

Ok, I think that's doable if we switch the response to be an array of
job ids. Then we might also have to think about various failure modes, such
as what happens if one of the nodes where a copy lives is not up. Should
that be a failure, or do we continue and split just the 2 copies that are
available?
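
Sketching that out (the shape of the request and response is just my
assumption of how it could look):

    POST /_reshard/jobs
    {"db": "username/dbname"}

    => returns one job per shard copy, so q*n entries:
    [
        {"id": "job-001", "node": "node1@127.0.0.1",
         "shard": "shards/00000000-1FFFFFFF/username/dbname.$timestamp"},
        {"id": "job-002", "node": "node2@127.0.0.1",
         "shard": "shards/00000000-1FFFFFFF/username/dbname.$timestamp"},
        ...
    ]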

>
> 2. Same comment as above; why not add a new field for "type":"split" or
> "merge" to make this expandable in the future?
>
That makes sense, I can add a type field if we have _reshard as the top
level endpoint.
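
So, just as a sketch, the request body from above might grow a type field:

    POST /_reshard/jobs
    {
        "type": "split",
        "node": "dbc...@db1.sandbox001.cloudant.net",
        "shard": "shards/00000000-FFFFFFFF/username/dbname.$timestamp"
    }

A merge job would then presumably use "type": "merge" with its own set of
parameters.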


> -Joan
>
