>
> One thing that running the draining on the local bookie doesn't
> cover is that, if the bookie is down and unrecoverable, it will
> never be drained, so the data on the bookie would remain
> underreplicated.


> Perhaps this is a different case, and needs to be handled differently,
> but it could also be handled by a mechanism similar to the data
> integrity check. There could be an auditor-like process that scans all
> ledger metadata to find ledger segments where any of the bookies are
> missing ("draining" could be considered another missing state). When
> it finds a ledger with unreplicated data, it selects a bookie from the
> remaining bookies to take the data. This bookie is then marked as an
> extra replica in the metadata. From this point the mechanism is the
> data integrity mechanism. A bookie periodically checks if it has all
> the data it is supposed to have (taking the extra replica metadata
> into account), and copies anything that is missing.


The data integrity check seems to be a replacement for the replication
workers; auto recovery would then just need the auditor to maintain the
invariants of the cluster's metadata, and let the bookies fetch data
directly as needed? It seems like a great enhancement to reduce the
complexity and the load during auto recovery; I'm really looking forward
to this feature!
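
To check my own understanding, here is a very rough sketch of what the
auditor side could look like (simplified, made-up types and names, not
the actual BookKeeper API):

    import java.util.*;

    // Hypothetical sketch only: scan all ledger metadata, find ensembles
    // that reference a missing or draining bookie, and record a
    // replacement bookie as an "extra replica" so that bookie's own
    // integrity check copies the entries later.
    class AuditorSketch {

        // Simplified stand-in for ledger metadata: one ensemble per ledger.
        static class LedgerMeta {
            final long ledgerId;
            final List<String> ensemble;                 // bookie addresses
            final Set<String> extraReplicas = new HashSet<>();
            LedgerMeta(long ledgerId, List<String> ensemble) {
                this.ledgerId = ledgerId;
                this.ensemble = new ArrayList<>(ensemble);
            }
        }

        static void scan(Collection<LedgerMeta> ledgers,
                         Set<String> availableBookies,
                         Set<String> drainingBookies) {
            for (LedgerMeta meta : ledgers) {
                for (String bookie : meta.ensemble) {
                    // "Draining" is treated as just another missing state.
                    boolean missing = !availableBookies.contains(bookie)
                            || drainingBookies.contains(bookie);
                    if (!missing) {
                        continue;
                    }
                    // Pick a replacement that doesn't already hold this
                    // ledger and mark it in the metadata; the auditor
                    // never moves any data itself.
                    availableBookies.stream()
                            .filter(b -> !drainingBookies.contains(b))
                            .filter(b -> !meta.ensemble.contains(b))
                            .filter(b -> !meta.extraReplicas.contains(b))
                            .findFirst()
                            .ifPresent(meta.extraReplicas::add);
                }
            }
        }
    }

I'm assuming here that the auditor only writes the extra-replica marker
into the metadata and that all actual copying is left to the bookies.
Is that roughly what you have in mind?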

Best regards,
Yang Yang


Ivan Kelly <iv...@apache.org> wrote on Wednesday, September 8, 2021 at 5:10 PM:

> > I am not very familiar with bookkeeper and auditor history (so please let
> > me know if this understanding doesn't work), but it seems to me that the
> > process responsible for draining the bookie could be local to the bookie
> > itself to limit network hops.
>
> This is a very good point. One of the reasons we did the data
> integrity work was that if there were entries missing from a bookie,
> they would have to be copied to a replicator process and then copied
> to the destination. The data integrity checker (which I promise we
> will push upstream soon) runs on the bookie and only does one copy,
> from another node to the local node.
>
> One thing that running the draining on the local bookie doesn't
> cover is that, if the bookie is down and unrecoverable, it will
> never be drained, so the data on the bookie would remain
> underreplicated.
>
> Perhaps this is a different case, and needs to be handled differently,
> but it could also be handled by a mechanism similar to the data
> integrity check. There could be an auditor-like process that scans all
> ledger metadata to find ledger segments where any of the bookies are
> missing ("draining" could be considered another missing state). When
> it finds a ledger with unreplicated data, it selects a bookie from the
> remaining bookies to take the data. This bookie is then marked as an
> extra replica in the metadata. From this point the mechanism is the
> data integrity mechanism. A bookie periodically checks if it has all
> the data it is supposed to have (taking the extra replica metadata
> into account), and copies anything that is missing.
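>
> Very roughly, the bookie-side check could look something like this (a
> sketch with made-up names, not the actual data integrity
> implementation, and ignoring striping, so assume every listed bookie
> should hold every entry):
>
>     import java.util.*;
>
>     class IntegrityCheckSketch {
>
>         static class LedgerInfo {
>             long ledgerId;
>             long lastEntryId;
>             Set<String> ensemble = new HashSet<>();       // regular replicas
>             Set<String> extraReplicas = new HashSet<>();  // marked by the auditor
>         }
>
>         interface LocalStore {                            // this bookie's storage
>             boolean contains(long ledgerId, long entryId);
>             void put(long ledgerId, long entryId, byte[] data);
>         }
>
>         interface PeerReader {                            // read from another replica
>             byte[] read(long ledgerId, long entryId);
>         }
>
>         // Runs periodically on each bookie: for every ledger that lists
>         // this bookie (in the ensemble or as an extra replica), copy any
>         // entry we are missing directly from a peer that has it. Only
>         // one copy is made: remote bookie -> local bookie.
>         static void checkAndRepair(String me, Collection<LedgerInfo> ledgers,
>                                    LocalStore local, PeerReader peers) {
>             for (LedgerInfo l : ledgers) {
>                 if (!l.ensemble.contains(me) && !l.extraReplicas.contains(me)) {
>                     continue;                             // none of this ledger is ours
>                 }
>                 for (long entryId = 0; entryId <= l.lastEntryId; entryId++) {
>                     if (!local.contains(l.ledgerId, entryId)) {
>                         local.put(l.ledgerId, entryId, peers.read(l.ledgerId, entryId));
>                     }
>                 }
>             }
>         }
>     }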
>
> > Do you see the bookkeeper tiered storage being used in every case?
>
> No, I doubt it. For us, the thinking is that we want to use tiered
> storage anyhow. So if it's there, we may as well use it for
> autoscaling, and not spend too many cycles on another mechanism.
>
> Another aspect of this is cost. Without tiered storage, the variable
> that decides the number of nodes you need is
> (throughput)*(retention). So, assuming that throughput doesn't change
> much, or is cyclical, there'll be very few autoscaling decisions taken
> (nodes need to stick around for retention).
> If you use tiered storage, the number of nodes needed is purely based
> on throughput. You'll have fewer bookies, but autoscaling will need to
> respond more frequently to variations in throughput.
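>
> To make that concrete (illustrative numbers only, ignoring
> replication): say a bookie can hold 10 TB and absorb 100 MB/s of
> writes. Without tiered storage, 100 MB/s with 7 days of retention
> means storing roughly 100 MB/s * 604800 s ~= 60 TB, i.e. about 6
> bookies, and that count barely moves unless throughput or retention
> changes. With tiered storage the same workload only needs enough
> bookies for the 100 MB/s write path, so one or two, but a spike to
> 300 MB/s immediately calls for scaling up.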
>
> -Ivan
>
