Hi Bernard, all
Thanks for the comments. A few answers are inline below.
For a while now I have been experimenting with an older version
of Crail, the one used in Pocket. I changed and extended it
and gained more experience with what I believe would be
good to have. I also believe that adding such functionality
natively to Apache Crail (instead of "around" Crail, as in Pocket)
would help make Crail a viable storage choice in serverless
environments. Furthermore, having these mechanisms natively
in Crail does not prevent running Crail the classical way.
More concretely, I suggest adding the following:
- Add mechanisms that let datanodes leave gracefully
(with the namenode's help)
  - Mechanism 1: A datanode leaves once no more blocks are
    allocated on it (as in Pocket)
  - Mechanism 2: The namenode helps move blocks from the leaving
    datanode to a remaining datanode. "Helps" does not mean
    that the namenode performs the actual data copying;
    it only finds new blocks and updates the file block lists.
...but the data copying would still have to be done somehow.
So the datanode would execute it? That would add
client code to the datanode..?
Yes, that is at least one possible approach I had in mind.
I don't think the full client code is necessary:
if the namenode provides a replacement block for each
allocated block of the datanode that requested to leave,
the datanode only needs to copy each of its blocks to the
corresponding new one. So I believe it would be a rather
simple mechanism.
Since the datanode is leaving anyway, I believe it is OK
for it to do some additional work. I know the design goal is
that datanodes do not take any active role; however,
in this special case it is probably best if the leaving node
performs the copying itself, instead of shifting this additional
work to the namenode or to the receiving datanode.
But we can discuss that.
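A rough Java sketch of how Mechanism 2 could be split between the namenode and the leaving datanode. All names (`BlockRef`, `DrainPlanner`, `LeavingDatanode`) are hypothetical and not part of Crail's API, and the storage of the whole cluster is simulated as one in-memory map. One assumption goes beyond the description above: the namenode only rewrites the file block lists (`commit`) after the datanode reports that copying is done, so no reader is pointed at a replacement block before it holds the data.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical block address: datanode id plus block index on that datanode. */
final class BlockRef {
    final String datanode;
    final int index;

    BlockRef(String datanode, int index) {
        this.datanode = datanode;
        this.index = index;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof BlockRef)) return false;
        BlockRef b = (BlockRef) o;
        return datanode.equals(b.datanode) && index == b.index;
    }

    @Override public int hashCode() { return Objects.hash(datanode, index); }
}

/** Namenode side of Mechanism 2: it picks replacement blocks, it does not copy data. */
final class DrainPlanner {
    private final Map<String, Deque<Integer>> freeBlocks = new LinkedHashMap<>();
    final Map<String, List<BlockRef>> fileBlocks = new LinkedHashMap<>(); // file -> block list

    void addFreeBlocks(String datanode, int... indices) {
        Deque<Integer> q = freeBlocks.computeIfAbsent(datanode, d -> new ArrayDeque<>());
        for (int i : indices) q.add(i);
    }

    /** Step 1: one replacement block on another datanode per block held by the leaving node. */
    Map<BlockRef, BlockRef> plan(String leaving) {
        Map<BlockRef, BlockRef> plan = new LinkedHashMap<>();
        for (List<BlockRef> blocks : fileBlocks.values()) {
            for (BlockRef b : blocks) {
                if (b.datanode.equals(leaving) && !plan.containsKey(b)) {
                    plan.put(b, takeFreeBlockNotOn(leaving));
                }
            }
        }
        return plan;
    }

    /** Step 3: after the copy is reported complete, point the file block lists at the new blocks. */
    void commit(Map<BlockRef, BlockRef> plan) {
        for (List<BlockRef> blocks : fileBlocks.values()) {
            blocks.replaceAll(b -> plan.getOrDefault(b, b));
        }
    }

    private BlockRef takeFreeBlockNotOn(String leaving) {
        for (Map.Entry<String, Deque<Integer>> e : freeBlocks.entrySet()) {
            if (!e.getKey().equals(leaving) && !e.getValue().isEmpty()) {
                return new BlockRef(e.getKey(), e.getValue().poll());
            }
        }
        throw new IllegalStateException("not enough free blocks on the remaining datanodes");
    }
}

/** Leaving datanode, step 2: copy every local block to its assigned replacement. */
final class LeavingDatanode {
    // 'cluster' simulates all datanode storage; a real datanode would write dst remotely.
    static void copyAll(Map<BlockRef, BlockRef> plan, Map<BlockRef, byte[]> cluster) {
        plan.forEach((src, dst) -> cluster.put(dst, cluster.get(src).clone()));
    }
}
```

The datanode only needs the old-to-new mapping and a way to write a block, which matches the point that full client code should not be necessary. Failure handling (a copy that breaks halfway) is deliberately left out of this sketch.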
It would probably be good if, as a first step, the simple
mechanism (an empty datanode leaves) became available?
Yes, this will definitely be the first step.
- Allow datanodes to express the wish to leave
(i.e., ask the namenode to initiate the process) by sending
a message to the namenode.
The other way around would be to let the namenode tell
the datanode to go off. I think you have good reasons
to do it the way you are proposing.
Yes, it should definitely be possible to tell the datanode to go off.
The other direction would come in addition: a datanode should also
be able to ask the namenode to leave.
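A minimal Java sketch, with hypothetical names, of the namenode-side state for Mechanism 1, where both directions (the datanode asking to leave, or the namenode telling it to go off) end up in the same `requestLeave` call. Assumptions: a leaving datanode receives no new blocks, it may shut down once its allocated-block count reaches zero, and it learns about the `REMOVED` state through some existing channel such as its heartbeat. Placing each new block on the first active datanode is a simplification, not Crail's actual placement policy.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical datanode lifecycle as tracked by the namenode. */
enum NodeState { ACTIVE, LEAVING, REMOVED }

/** Namenode-side sketch of Mechanism 1: stop allocating, then let the node go once empty. */
final class MembershipTable {
    private final Map<String, NodeState> state = new LinkedHashMap<>();
    private final Map<String, Integer> allocated = new HashMap<>();

    void register(String dn) {
        state.put(dn, NodeState.ACTIVE);
        allocated.put(dn, 0);
    }

    /** Single entry point, whether the datanode asked to leave or was told to go. */
    void requestLeave(String dn) {
        if (state.get(dn) == NodeState.ACTIVE) {
            state.put(dn, NodeState.LEAVING);
            maybeRemove(dn);
        }
    }

    /** New blocks go only to ACTIVE datanodes (first match, for simplicity). */
    String allocateBlock() {
        for (Map.Entry<String, NodeState> e : state.entrySet()) {
            if (e.getValue() == NodeState.ACTIVE) {
                allocated.merge(e.getKey(), 1, Integer::sum);
                return e.getKey();
            }
        }
        throw new IllegalStateException("no active datanode");
    }

    /** Called when a block on 'dn' is freed, e.g. because its file was deleted. */
    void freeBlock(String dn) {
        allocated.merge(dn, -1, Integer::sum);
        maybeRemove(dn);
    }

    NodeState stateOf(String dn) { return state.get(dn); }

    // A LEAVING datanode that holds no allocated blocks can shut down without data loss.
    private void maybeRemove(String dn) {
        if (state.get(dn) == NodeState.LEAVING && allocated.get(dn) == 0) {
            state.put(dn, NodeState.REMOVED);
        }
    }
}
```

Routing both directions through one call keeps the namenode logic identical; only the trigger (a message from the datanode, or an operator action) differs.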
There is a reason for this proposal. Thinking towards serverless,
one interesting option would be to run Crail as a Knative
service and let the Knative autoscaler scale it. When it
scales in, Knative would notify the pod, via a hook, that it
will be deleted soon. The pod then gets a certain amount of time
to perform cleanup actions before finally being removed from
the cluster.
By implementing this hook, the datanode gets a chance to tell
the namenode that it is about to leave and to copy its
remaining blocks to datanodes that are still running.
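A Java sketch of the datanode side of such a hook, assuming the termination notice reaches the JVM as SIGTERM (Kubernetes sends SIGTERM to the container after running any `preStop` hook and waits up to `terminationGracePeriodSeconds` before killing it). `LeaveChannel` and `GracefulLeave` are hypothetical; `LeaveChannel` stands in for both the leave message to the namenode and the block copying, and the time budget is assumed to be chosen below the pod's grace period.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical datanode-to-namenode channel for the leave protocol. */
interface LeaveChannel {
    void requestLeave();  // tell the namenode this datanode is about to leave
    boolean drained();    // true once no blocks remain on this datanode
}

/** Runs on termination: announce the leave, then wait (bounded) until drained. */
final class GracefulLeave implements Runnable {
    private final LeaveChannel namenode;
    private final Duration budget;  // keep this below the pod's grace period
    volatile boolean completed;

    GracefulLeave(LeaveChannel namenode, Duration budget) {
        this.namenode = namenode;
        this.budget = budget;
    }

    @Override public void run() {
        namenode.requestLeave();
        Instant deadline = Instant.now().plus(budget);
        while (!namenode.drained() && Instant.now().isBefore(deadline)) {
            try {
                Thread.sleep(100);  // poll interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        completed = namenode.drained();
    }

    /** The JVM runs registered shutdown hooks when it receives SIGTERM. */
    static void install(LeaveChannel namenode, Duration budget) {
        Runtime.getRuntime().addShutdownHook(
            new Thread(new GracefulLeave(namenode, budget), "crail-graceful-leave"));
    }
}
```

Bounding the wait matters because the platform kills the pod when the grace period runs out, whether or not the blocks have been moved.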
This additional mechanism is basically an "enabler":
it does not need to be used, especially not when Crail
is deployed the regular way. So I don't think that merely
having the capability built in would negatively
affect performance.
Thanks
Adrian