Hi Quanlong,

You're right. The catalog needs to handle metadata at a finer granularity.
We are actively looking into the options you mentioned as well as other
related changes (see IMPALA-3234 and IMPALA-3127) to improve the
performance and scalability of metadata management.

Thanks
Dimitris

On Mon, Sep 11, 2017 at 8:51 PM, Quanlong Huang <[email protected]>
wrote:

> Hi all,
>
>
> Currently, if a "describe" statement hits an incomplete table, the impalad
> sends an RPC request to the catalogd to load the metadata of this table.
> This takes a long time for tables with many partitions and many files.
> However, to serve the "describe" statement, we only need the metadata
> stored in the Hive MetaStore. In my experiments (with
> load_catalog_in_background=false), it takes hours to describe a large
> table. This statement is pretty cheap in Hive or Presto, so users may
> worry about whether Impala is set up correctly.
>
>
> Can we add a finer-grained strategy for loading the metadata? For
> queries that hit only one partition of a huge table, we don't need to load
> all the file descriptors either. For example, we could have more levels
> that trigger a metadata load:
> Level1. Load metadata from Hive MetaStore
> Level2. Load file descriptors of given partitions
> Level3. Load all file descriptors
>
>
> Then we can serve the following scenario better:
> 1. describe a large table
> 2. run a query on one or several partitions of this table (each partition
> has few files)
>
>
> Has this been discussed before?
>
>
> Thanks
> Quanlong
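
For illustration only, the three load levels proposed in the quoted message could be sketched roughly as below. This is a minimal sketch and not Impala's catalog code: the names LoadLevel, required_level and TableMetadata are hypothetical, and the actual fetching of file descriptors is replaced by simple bookkeeping of which partitions would need to be loaded.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional, Set


class LoadLevel(IntEnum):
    """Hypothetical load levels, mirroring Level1..Level3 in the proposal."""
    NONE = 0             # incomplete table, nothing loaded yet
    HMS_ONLY = 1         # Level1: metadata from the Hive MetaStore only
    PARTITION_FILES = 2  # Level2: file descriptors of the given partitions
    ALL_FILES = 3        # Level3: file descriptors of all partitions


def required_level(statement: str,
                   partitions: Optional[Set[str]] = None) -> LoadLevel:
    """Pick the cheapest level that can serve a statement (assumed policy)."""
    if statement == "describe":
        return LoadLevel.HMS_ONLY
    if partitions:
        return LoadLevel.PARTITION_FILES
    return LoadLevel.ALL_FILES


@dataclass
class TableMetadata:
    """Tracks how much of a table's metadata has been loaded so far."""
    all_partitions: Set[str]
    level: LoadLevel = LoadLevel.NONE
    partitions_with_files: Set[str] = field(default_factory=set)

    def ensure(self, level: LoadLevel,
               partitions: Optional[Set[str]] = None) -> Set[str]:
        """Raise the load level and return the partitions whose file
        descriptors would still have to be fetched."""
        if level <= LoadLevel.HMS_ONLY:
            # Level1 needs no file descriptors at all.
            self.level = max(self.level, level)
            return set()
        if level == LoadLevel.ALL_FILES:
            wanted = set(self.all_partitions)
        else:
            wanted = set(partitions or ()) & self.all_partitions
        # Only partitions not loaded by an earlier request are fetched.
        missing = wanted - self.partitions_with_files
        self.partitions_with_files |= missing
        self.level = max(self.level, level)
        return missing
```

With this bookkeeping, a "describe" only raises the table to HMS_ONLY and requests no file descriptors, while a later query on one partition fetches just that partition's files.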
