Thanks, Steven, for the link.
Your suggestion of storing only the single-valued columns is a good one.
It might be OK to have some of the count(*) queries run a little slower, as
reading the cache itself is taking way too long.  I'm also looking at
squashing the column datatype info, since there is a lot of redundancy there.
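
For concreteness, here's a rough sketch of the kind of lean per-row-group
entry I have in mind (class and field names are made up, not Drill's actual
Metadata classes): keep only the path, the row count, and a name -> value
map of the single-valued columns, which is all that partition pruning needs.

  import com.fasterxml.jackson.annotation.JsonCreator;
  import com.fasterxml.jackson.annotation.JsonProperty;

  import java.util.Map;

  // Hypothetical lean cache entry: only what value-based partition pruning
  // needs, instead of full per-column statistics for every row group.
  public class LeanRowGroupMetadata {
    private final String path;
    private final long rowCount;
    // Column name -> its single value; only columns that have exactly one
    // value in this row group appear here.
    private final Map<String, Object> singleValuedColumns;

    @JsonCreator
    public LeanRowGroupMetadata(
        @JsonProperty("path") String path,
        @JsonProperty("rowCount") long rowCount,
        @JsonProperty("singleValuedColumns") Map<String, Object> singleValuedColumns) {
      this.path = path;
      this.rowCount = rowCount;
      this.singleValuedColumns = singleValuedColumns;
    }

    @JsonProperty("path")
    public String getPath() { return path; }

    @JsonProperty("rowCount")
    public long getRowCount() { return rowCount; }

    @JsonProperty("singleValuedColumns")
    public Map<String, Object> getSingleValuedColumns() { return singleValuedColumns; }
  }

The column datatype info could then live once per table in a shared schema
section instead of being repeated for every row group.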




On Fri, Oct 30, 2015 at 3:22 PM, Steven Phillips <[email protected]> wrote:

> My view on storing it in some other format is that, yes, it will probably
> reduce the size of the file, but if we gzip the JSON file, it should be
> pretty compact. As for deserialization cost, other formats would be faster,
> but not dramatically faster, and certainly not the order of magnitude
> faster that we really need. We chose JSON because it is readable and
> easier to deal with.
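
(If we do end up just gzipping the JSON, the read side is a thin wrapper; a
sketch only, and in practice the cache would come off DFS rather than a
local FileInputStream:)

  import com.fasterxml.jackson.databind.ObjectMapper;

  import java.io.FileInputStream;
  import java.io.IOException;
  import java.io.InputStream;
  import java.util.zip.GZIPInputStream;

  public class GzippedCacheReader {
    // Jackson deserializes from the decompressed stream exactly as it would
    // from a plain JSON file, so only the I/O layer changes.
    public static <T> T read(String path, Class<T> type) throws IOException {
      ObjectMapper mapper = new ObjectMapper();
      try (InputStream in = new GZIPInputStream(new FileInputStream(path))) {
        return mapper.readValue(in, type);
      }
    }
  }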
>
> As for the old code, I can point you at a branch, but it's probably not
> very helpful. Unless we want to essentially disable value-based partition
> pruning when using the cache, the old code will not work.
>
> My recommendation would be to come up with a new version of the format
> that stores only the name and value of the columns that are single-valued
> for each file or row group. This will allow partition pruning to work, but
> some count queries may not be as fast any more, because the cache won't
> have column value counts on a per-row-group basis.
>
> Anyway, here is the link to the original branch.
>
> https://github.com/StevenMPhillips/drill/tree/meta
>
> On Fri, Oct 30, 2015 at 3:01 PM, Parth Chandra <[email protected]> wrote:
>
> > Hey Jacques, Steven,
> >
> >   Do we have a branch somewhere which has the initial prototype code? I'd
> > like to prune the file a bit as it looks like reducing the size of the
> > metadata cache file might yield the best results.
> >
> >   Also, did we have a particular reason for going with JSON as opposed to
> > a more compact binary format? Are there any arguments against saving this
> > as a protobuf/BSON/Parquet file?
> >
> > Parth
> >
> > On Mon, Oct 26, 2015 at 2:42 PM, Jacques Nadeau <[email protected]>
> > wrote:
> >
> > > My first thought is we've gotten too generous in what we're storing in
> > > the Parquet metadata file. Early implementations were very lean and it
> > > seems far larger today. For example, early implementations didn't keep
> > > statistics and ignored row groups (files, schema and block locations
> > > only). If we need multiple levels of information, we may want to stagger
> > > (or normalize) them in the file. Also, we may think about what is the
> > > minimum that must be done in planning. We could do the file pruning at
> > > execution time rather than single-tracking these things (makes stats
> > > harder though).
> > >
> > > I also think we should be cautious about jumping to conclusions until
> > > DRILL-3973 provides more insight.
> > >
> > > In terms of caching, I'd be more inclined to rely on file system caching
> > > and make sure serialization/deserialization is as efficient as possible,
> > > as opposed to implementing an application-level cache. (We already have
> > > enough problems managing memory without having to figure out when we
> > > should drop a metadata cache :D).
> > >
> > > Aside, I always liked this post for entertainment and the thoughts on
> > > virtual memory: https://www.varnish-cache.org/trac/wiki/ArchitectNotes
> > >
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Mon, Oct 26, 2015 at 2:25 PM, Hanifi Gunes <[email protected]> wrote:
> > >
> > > > One more thing: for workloads running queries over subsets of the same
> > > > parquet files, we can consider maintaining an in-memory cache as well,
> > > > assuming the metadata memory footprint per file is low and the parquet
> > > > files are static, so we would not need to invalidate the cache often.
> > > >
> > > > H+
> > > >
> > > > On Mon, Oct 26, 2015 at 2:10 PM, Hanifi Gunes <[email protected]> wrote:
> > > >
> > > > > I am not familiar with the contents of the metadata stored, but if the
> > > > > deserialization workload fits any of afterburner's claimed improvement
> > > > > points [1], it could well be worth trying, given that the claimed gain
> > > > > in throughput is substantial.
> > > > >
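
(For reference, wiring Afterburner in is a one-line change on the
ObjectMapper; this sketch assumes the jackson-module-afterburner dependency
is on the classpath:)

  import com.fasterxml.jackson.databind.ObjectMapper;
  import com.fasterxml.jackson.module.afterburner.AfterburnerModule;

  public class CacheMapperFactory {
    public static ObjectMapper newMapper() {
      // Afterburner replaces reflection-based property access with generated
      // bytecode, which is where the claimed deserialization speedup comes from.
      return new ObjectMapper().registerModule(new AfterburnerModule());
    }
  }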
> > > > > It could also be a good idea to partition caching over a number of
> > > > > files for better parallelization, given that the number of cache files
> > > > > generated is *significantly* smaller than the number of parquet files.
> > > > > Maintaining global statistics seems like an improvement point too.
> > > > >
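
(A rough sketch of what a parallel read could look like if the metadata were
split into N cache shards; the shard layout and entry type are made up for
illustration:)

  import com.fasterxml.jackson.databind.ObjectMapper;

  import java.io.File;
  import java.util.ArrayList;
  import java.util.List;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.Future;

  public class ParallelCacheReader {
    // Each shard is an independent JSON cache file, so deserialization can
    // scale with the thread count instead of being single-threaded.
    public static <T> List<T> readAll(List<File> shards, Class<T> type, int threads)
        throws Exception {
      ObjectMapper mapper = new ObjectMapper();
      ExecutorService pool = Executors.newFixedThreadPool(threads);
      try {
        List<Future<T>> futures = new ArrayList<>();
        for (File shard : shards) {
          futures.add(pool.submit(() -> mapper.readValue(shard, type)));
        }
        List<T> results = new ArrayList<>();
        for (Future<T> f : futures) {
          results.add(f.get());
        }
        return results;
      } finally {
        pool.shutdown();
      }
    }
  }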
> > > > >
> > > > > -H+
> > > > >
> > > > > 1: https://github.com/FasterXML/jackson-module-afterburner#what-is-optimized
> > > > >
> > > > > On Sun, Oct 25, 2015 at 9:33 AM, Aman Sinha <[email protected]> wrote:
> > > > >
> > > > >> Forgot to include the link for Jackson's AfterBurner module:
> > > > >>   https://github.com/FasterXML/jackson-module-afterburner
> > > > >>
> > > > >> On Sun, Oct 25, 2015 at 9:28 AM, Aman Sinha <[email protected]> wrote:
> > > > >>
> > > > >> > I was going to file an enhancement JIRA but thought I would discuss
> > > > >> > here first:
> > > > >> >
> > > > >> > The parquet metadata cache file is a JSON file that contains a
> > > > >> > subset of the metadata extracted from the parquet files.  The cache
> > > > >> > file can get really large: a few GBs for a few hundred thousand files.
> > > > >> > I have filed a separate JIRA, DRILL-3973, for profiling the various
> > > > >> > aspects of planning, including metadata operations.  In the meantime,
> > > > >> > the timestamps in the drillbit.log output indicate a large chunk of
> > > > >> > time spent in creating the drill table to begin with, which points to
> > > > >> > a bottleneck in reading the metadata.  (I can provide performance
> > > > >> > numbers later once we confirm through profiling.)
> > > > >> >
> > > > >> > A few thoughts around improvements:
> > > > >> >  - The Jackson deserialization of the JSON file is very slow.  Can
> > > > >> > this be sped up?  For instance, the AfterBurner module of Jackson
> > > > >> > claims to improve performance by 30-40% by avoiding the use of
> > > > >> > reflection.
> > > > >> >  - The cache file read is a single-threaded process.  If we were
> > > > >> > reading directly from the parquet files, we would use a default of 16
> > > > >> > threads.  What can be done to parallelize the read?
> > > > >> >  - Is there any operation that can be done one time during the
> > > > >> > REFRESH METADATA command?  For instance, examining the min/max values
> > > > >> > to determine whether a partition column is single-valued could be
> > > > >> > eliminated if we did this computation during the REFRESH METADATA
> > > > >> > command and stored the summary one time (see the sketch after this
> > > > >> > list).
> > > > >> >
> > > > >> >  - A pertinent question is: should the cache file be stored in a
> > > > >> > more efficient format, such as Parquet, instead of JSON?
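
(Re the REFRESH METADATA point above, a minimal sketch of the one-time
check, assuming per-column min/max and null-count statistics are available
from the parquet footers; names are illustrative:)

  // Hypothetical helper run once at REFRESH TABLE METADATA time: a column is
  // single-valued in a row group when all values are null, or when there are
  // no nulls and min equals max.  Storing the result in the cache means
  // planning never has to re-examine min/max values.
  public class SingleValueCheck {
    public static <T extends Comparable<T>> boolean isSingleValued(
        T min, T max, long nullCount, long rowCount) {
      if (nullCount == rowCount) {
        return true;  // every value is null, so the single value is null
      }
      return nullCount == 0 && min != null && min.compareTo(max) == 0;
    }
  }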
> > > > >> >
> > > > >> > Aman
