On Tue, May 16, 2017 at 11:39 PM, Venkateswara Rao Jujjuri <jujj...@gmail.com> wrote:
> We have this use case too. I believe introducing "pools" is the right
> approach for this.
> Pools are a very high-level abstraction; they are treated as simply two
> different clusters, but wrapped into one.
>
> Some of the high-level thoughts:
>
> * Pools are a top-level abstraction.
> * A pool is assigned at the time of ledger creation (based on some
>   criteria at the client).
> * Ensemble changes and replication happen only within that pool of
>   bookies.
> * Stats and storage capacity are tracked at the pool level.
> * Capacity additions (capadd) go to a particular pool.
> * Each pool of bookies may run with a different server configuration.
> * Client configuration should accommodate pools too, with different
>   configuration values under different pools.
>

JV, can you come up with more details about the pool thing? Have you
considered using pools for the 128-bit ledger id?

- Sijie

>
> JV
>
> On Tue, May 16, 2017 at 7:49 AM, Bobby Evans <ev...@yahoo-inc.com.invalid>
> wrote:
>
> > OK so I am not keen on the idea of labels. Probably because when I
> > have seen it done in the past (YARN) it just felt like a hack that was
> > trying to avoid fixing the real underlying problem. YARN wanted to
> > schedule for arbitrary resources but that is hard, so they went with
> > Node Labels instead. Node labels have evolved in YARN and are now used
> > for partitioning a cluster for isolation as well (although it really
> > is because network scheduling/isolation is hard).
> >
> > Now that I am done with my YARN node label rant I want to add that
> > HBase put in an option for isolating table groups from each other on
> > different region servers that has worked really well for a
> > multi-tenant setup, so I am not completely opposed to the idea; I just
> > want to be sure we do it right.
> >
> > In my opinion, if this is a feature to isolate different groups from
> > each other to avoid one bad actor impacting everyone else, I would
> > prefer to see something with quotas for clients and/or users, and
> > nodes reporting their capabilities + current usage instead. If you
> > want some kind of affinity because you bought hardware to handle
> > longer-term vs shorter-term storage, then I would prefer to see that
> > called out explicitly when the ledger is created instead of having
> > arbitrary labels. That way a long-lived ledger could be placed on a
> > node with lots of free capacity and short-lived ledgers can go
> > anywhere. A client could either set it when they create a ledger or
> > have a default in the config if it is not specified.
> >
> > If we do go with labels I want to be sure that we stress that users
> > should keep their matching rules as simple as possible.
> > Hard partitioning of a cluster on labels provides a lot of
> > possibility to shoot yourself in the foot and not notice it.
> > They need to make sure that they have ways to easily monitor bookies
> > grouped in the same way their client rules do. They need to make sure
> > that when doing a rolling upgrade they take the client rules into
> > account when deciding what to take out and upgrade, to avoid making a
> > group of clients completely unusable.
> >
> > - Bobby
> >
> > On Tuesday, May 16, 2017, 6:05:21 AM CDT, Enrico Olivelli
> > <eolive...@gmail.com> wrote:
> >
> > Hi bookkeepers,
> > I'm using BookKeeper for several projects; every project has its own
> > workload characteristics and I would like to be able to assign bookies
> > depending on the client type. It is quite common to share a BookKeeper
> > cluster between different applications.
> >
> > For instance I am using bookies to store database logs and task broker
> > logs, and recently I have started to use BookKeeper as data storage.
> >
> > Within the cluster I would like to use specific bookies for mid-term
> > storage, some bookies for logs...and so on, but current placement
> > policies are not able to "distinguish" bookies.
> >
> > Actually I can achieve my goal by using a custom policy + custom
> > metadata + out-of-band bookie metadata.
> >
> > I would like to introduce a first step, following the work on
> > "Resource aware data placement" [1], and introduce a list of "labels"
> > to be assigned to every bookie.
> >
> > For instance: bookies for long-term storage will have the label
> > "long-term", bookies for transaction logs may have the label "wals".
> >
> > Another use case is to be able to request BookKeeper to write ledger
> > data on specific sets of bookies depending on the "customer" who is
> > the owner of the data (I have customers already grouped by
> > labels/tags).
> >
> > I would like to have a simple "standard" policy which uses some
> > "standard" metadata to select bookies.
> >
> > Things to add:
> > - a set of "labels" configurable for bookies
> > - enrich the API (getBookieInfo) to query for labels, and have the
> >   BookKeeper client keep a local cache of label-to-bookie assignments
> > - add a standard "custom metadata field" which is a list of labels to
> >   use to select bookies; a bookie would be used only if it currently
> >   "has" all of the labels requested
> >
> >
> > [1] https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-2+-+Resource+aware+data+placement
> >
> > All comments are welcome
> >
> > -- Enrico
> >
>
>
> --
> Jvrao
> ---
> First they ignore you, then they laugh at you, then they fight you, then
> you win. - Mahatma Gandhi
>
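
For reference, the matching rule in Enrico's proposal (a bookie is used only if it currently has all of the requested labels) could be sketched roughly as below. This is only an illustration under assumptions: the class name, the method name, the bookie addresses, and the in-memory map standing in for the client's local cache of label-to-bookie assignments are all hypothetical and are not part of the BookKeeper API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical sketch of label-based bookie selection. None of these names
 * are BookKeeper APIs; they only illustrate the "has all labels" rule.
 */
public class LabelPlacementSketch {

    /**
     * Returns the bookies that carry ALL requested labels, sorted by address.
     * An empty request places no restriction, so it matches every bookie.
     */
    public static List<String> selectBookies(Map<String, Set<String>> labelsByBookie,
                                             Set<String> requestedLabels) {
        List<String> eligible = new ArrayList<>();
        // Copy into a TreeMap so the result order is deterministic.
        for (Map.Entry<String, Set<String>> e : new TreeMap<>(labelsByBookie).entrySet()) {
            if (e.getValue().containsAll(requestedLabels)) {
                eligible.add(e.getKey());
            }
        }
        return eligible;
    }

    public static void main(String[] args) {
        // Stand-in for the client's local cache (assumed shape: address -> labels).
        Map<String, Set<String>> cache = new TreeMap<>();
        cache.put("bookie-1:3181", Set.of("long-term", "customer-a"));
        cache.put("bookie-2:3181", Set.of("wals"));
        cache.put("bookie-3:3181", Set.of("long-term"));

        System.out.println(selectBookies(cache, Set.of("long-term")));
        System.out.println(selectBookies(cache, Set.of("long-term", "customer-a")));
    }
}
```

With the sample data above, a request for "long-term" matches bookie-1 and bookie-3, while a request for both "long-term" and "customer-a" matches only bookie-1. A real placement policy would presumably also have to handle the case Bobby warns about, where the eligible set is too small to form an ensemble.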