We have this use case too. I believe introducing "pools" is the right
approach for this.
Pools are a very high-level abstraction: the pools are treated simply as
separate clusters, but wrapped into one.

Some high-level thoughts:

* Pools are a top-level abstraction.
* A pool is assigned at the time of ledger creation (based on some criteria
at the client).
* Ensemble changes and replication happen only within that pool of bookies.
* Stats and storage capacity are tracked at the pool level.
* A capacity add (capadd) goes to a particular pool.
* Each pool of bookies may run with a different server configuration.
* Client configuration should accommodate pools too, with different
configuration values under different pools.
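
To make the placement part concrete, here is a rough sketch in Java of how
ensemble selection and bookie replacement could be confined to one pool.
Everything in it is made up for illustration: PoolPlacementSketch, addBookie,
newEnsemble and replaceBookie are hypothetical names, not existing BookKeeper
APIs, and a real policy would also have to consider racks, capacity and
bookie health.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch only; none of these names exist in BookKeeper. */
public class PoolPlacementSketch {
    // Assumed registry: pool name -> bookie addresses that belong to that pool.
    private final Map<String, List<String>> bookiesByPool = new HashMap<>();

    /** Registers a bookie in a pool (what a capacity add to a pool would do). */
    public void addBookie(String pool, String bookie) {
        bookiesByPool.computeIfAbsent(pool, p -> new ArrayList<>()).add(bookie);
    }

    /** Picks the ensemble for a new ledger using only bookies of the given pool. */
    public List<String> newEnsemble(String pool, int ensembleSize) {
        List<String> candidates =
                bookiesByPool.getOrDefault(pool, Collections.emptyList());
        if (candidates.size() < ensembleSize) {
            throw new IllegalStateException("Not enough bookies in pool " + pool);
        }
        // Simplest possible choice: the first N registered bookies of the pool.
        return new ArrayList<>(candidates.subList(0, ensembleSize));
    }

    /** Finds a replacement for a failed bookie without leaving the pool. */
    public String replaceBookie(String pool, List<String> ensemble, String failed) {
        for (String candidate :
                bookiesByPool.getOrDefault(pool, Collections.emptyList())) {
            if (!candidate.equals(failed) && !ensemble.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("No spare bookie left in pool " + pool);
    }
}
```

The point of the sketch is that the pool name is fixed once, at ledger
creation, and every later decision (ensemble change, re-replication) looks
only at the bookies registered under that name.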

JV

On Tue, May 16, 2017 at 7:49 AM, Bobby Evans <ev...@yahoo-inc.com.invalid>
wrote:

> OK so I am not keen on the idea of labels.  Probably because when I have
> seen it done in the past (YARN) it just felt like a hack that was trying to
> avoid fixing the real underlying problem. YARN wanted to schedule for
> arbitrary resources but that is hard so they went with Node Labels
> instead.  Node labels have evolved in YARN and are now used for
> partitioning a cluster for isolation as well (although it really is because
> network scheduling/isolation is hard).
>
> Now that I am done with my YARN node label rant I want to add that HBase
> put in an option for isolating table groups from each other on different
> region servers that has worked really well for a multi-tenant setup, so I
> am not completely opposed to the idea; I just want to be sure we do it right.
>
> In my opinion if this is a feature to isolate different groups from each
> other to avoid one bad actor impacting everyone else I would prefer to see
> something with quotas for clients and/or users and nodes reporting their
> capabilities + current usage instead.  If you want some kind of affinity
> because you bought hardware to handle longer term vs shorter term storage
> then I would prefer to see that called out explicitly when the ledger is
> created instead of having arbitrary labels.  That way a long lived ledger
> could be placed on a node with lots of free capacity and short lived
> ledgers can go anywhere.  A client could either set it when they create a
> ledger or fall back to a default in the config if it is not specified.
>
> If we do go with labels I want to be sure that we stress that users should
> keep their matching rules as simple as possible.
> Hard partitioning of a cluster on labels provides a lot of possibility to
> shoot yourself in the foot and not notice it.
> They need to make sure that they have ways to easily monitor bookies
> grouped in the same way their client rules do.  They need to make sure that
> when doing a rolling upgrade that they take the client rules into account
> when deciding what to take out and upgrade to avoid making a group of
> clients completely unusable.
>
> - Bobby
>
> On Tuesday, May 16, 2017, 6:05:21 AM CDT, Enrico Olivelli <
> eolive...@gmail.com> wrote:
>
> Hi bookkeepers,
> I'm using BookKeeper for several projects, every project has its own
> workload characteristics and I would like to be able to assign bookies
> depending on the client type. It is quite common to share a BookKeeper
> cluster between different applications.
>
> For instance I am using Bookies to store Database logs, Task Brokers
> logs and recently I have started to use BookKeeper as data storage.
>
> Within the cluster I would like to use specific Bookies for mid-term
> storage, some bookies for logs...and so on, but current placement
> policies are not able to "distinguish" bookies.
>
> Actually I can achieve my goal by using a custom policy + custom
> metadata + out of band bookie metadata.
>
> I would like to introduce a first step, following the work on
> "Resource aware data placement" [1], and introduce a list of "labels"
> to be assigned to every bookie.
>
> For instance: bookies for long term storage will have label
> "long-term", bookies for transaction logs may have label "wals".
>
> Another use case is to be able to request BookKeeper to write ledger
> data on specific sets of bookies depending on the "customer" who is
> the owner of data (I have customers already grouped by labels/tags)
>
> I would like to have a simple "standard" policy which uses some
> "standard" metadata to select bookies.
>
> Things to add:
> - a set of "labels" configurable for bookies
> - enrich the API (getBookieInfo) to query for labels, and have the
> BookKeeper client keep a local cache of label-to-bookie assignments
> - add a standard "custom metadata field" which is a list of labels to
> use to select bookies; a bookie would be used only if it currently
> "has" all of the labels requested
>
>
> [1] https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-2+-+Resource+aware+data+placement
>
> All comments are welcome
>
> -- Enrico
>



-- 
Jvrao
---
First they ignore you, then they laugh at you, then they fight you, then
you win. - Mahatma Gandhi
