>
> Kudu users are assured that their deployments can keep growing as
> Kudu continues to mature.


That's a good point. AFAICT we haven't seen many users complaining about
scalability in their clusters other than maybe slow start times. It's hard
to tell how near or far over the horizon these issues will be, but for now,
keeping an eye out for them should be enough to guide future
decision-making.

> How about using the flag to define an explicit metadata directory,
> with the default being empty ("use the WAL directory")?


That sounds reasonable. Having the flag be a string also grants us a bit
more flexibility as the metadata story evolves in the future.
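
For illustration, here's a minimal sketch of the lookup order proposed
below: an explicit flag value wins, an empty value means "use the WAL
directory", and if no metadata is found there we fall back to the first
data directory. It's written in Python rather than Kudu's C++, and the
names (resolve_metadata_dir, metadata_dir_flag, has_metadata) are
hypothetical, not real Kudu code or flags. It also assumes that if no
existing metadata is found anywhere, the server is a fresh deployment and
should write to the configured location.

```python
from typing import Callable, Sequence


def resolve_metadata_dir(
    metadata_dir_flag: str,
    wal_dir: str,
    data_dirs: Sequence[str],
    has_metadata: Callable[[str], bool],
) -> str:
    """Pick the directory that should hold server metadata (sketch only).

    metadata_dir_flag: value of a hypothetical metadata-directory flag;
        an empty string means "use the WAL directory".
    has_metadata: hypothetical predicate reporting whether a directory
        already contains metadata (e.g. written by an older release).
    """
    # An explicit flag value wins; an empty value colocates with the WALs.
    candidate = metadata_dir_flag or wal_dir

    # Metadata already present in the configured location: use it.
    if has_metadata(candidate):
        return candidate

    # Fallback from the thread: older deployments kept metadata in the
    # first data directory, so look there before giving up.
    if data_dirs and has_metadata(data_dirs[0]):
        return data_dirs[0]

    # Assumption: nothing found anywhere means a fresh deployment, so new
    # metadata goes to the configured (or WAL) location.
    return candidate
```

Passing has_metadata in as a predicate keeps the sketch independent of any
real on-disk layout. For example, an upgraded server whose metadata still
lives in "/data/1" resolves to "/data/1" with the flag left blank, while a
brand-new server with the same settings resolves to its WAL directory.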

On Wed, Jan 10, 2018 at 11:57 AM, Adar Lieber-Dembo wrote:

> Thanks for starting a discussion on this topic.
>
> > There've been a few discussions recently regarding changes to Kudu's
> > metadata storage. There are a number of areas that could benefit from
> > improving this layer, and I've been coalescing some of these ideas to
> > lock down what changes make sense in the near future. Here's a list of
> > a few considerations:
> >
> > https://docs.google.com/document/d/1jXFqIZvLwkkmSjLC0wy-mDA0l0GmVuZJUMUB_7q5QUE/edit?usp=sharing
>
> I left a few comments in the gdoc.
>
> > Empirically, scalability is not as big an adoption bottleneck as it
> > was before. Additionally, it's not clear that the listed scalability
> > issues are the biggest bottlenecks to larger data volumes. Moving
> > forward, we should keep track of user stories that would benefit from
> > such improvements.
>
> I don't really view scalability as an adoption bottleneck; I think
> it's more of a barrier to growing existing clusters. As such, I think
> it's important that we constantly iterate on scalability, so that
> every release is a little bit more scalable than the one before. That
> way Kudu users are assured that their deployments can keep growing as
> Kudu continues to mature.
>
> As far as specific bottlenecks are concerned, the spreading of LBM
> metadata across multiple files is a main contributor to our long
> startup times, and that's one of the biggest scalability bottlenecks
> AFAIK.
>
> > With these points in mind, it seems the reasonable path forward is to
> > go with option 1 and introduce a flag for users to colocate WALs and
> > metadata.
>
> Makes sense. Will colocation be enabled by default for new clusters?
> How about using the flag to define an explicit metadata directory,
> with the default being empty ("use the WAL directory")? That'd make it
> similar to fs_data_dirs, which is nice. If the metadata directory
> can't be found in the directory specified by this gflag (or in the WAL
> directory, if blank), we can fall back to looking in the first data
> directory.
>
