Thanks for the feedback, Quanlong. We plan on addressing many of these
catalog issues in the immediate future.

Dimitris

On Mon, Sep 11, 2017 at 10:21 PM, Quanlong Huang <[email protected]>
wrote:

> Hi Dimitris,
>
> Thanks for your quick reply!
>
> IMPALA-3127 is a great ticket, but it still shows no progress and has no
> assignee. Is it tracked in your internal Jira?
>
> I hope this can be done soon, since some users may choose Presto instead of
> Impala because of these usability issues.
>
> Thanks
> Quanlong
>
>
> At 2017-09-12 12:17:23, "Dimitris Tsirogiannis" <[email protected]> 
> wrote:
> >Hi Quanlong,
> >
> >You're right. The catalog needs to handle metadata at a finer granularity.
> >We are actively looking into the options you mentioned as well as other
> >related changes (see IMPALA-3234 and IMPALA-3127) to improve the
> >performance and scalability of metadata management.
> >
> >Thanks
> >Dimitris
> >
> >On Mon, Sep 11, 2017 at 8:51 PM, Quanlong Huang <[email protected]>
> >wrote:
> >
> >> Hi all,
> >>
> >>
> >> Currently, if a "describe" statement hits an incomplete table, the impalad
> >> sends an RPC request to the catalogd to load the metadata of this
> >> table. This takes a long time for tables with many partitions and many
> >> files. However, to serve the "describe" statement, we only need the
> >> metadata in the Hive MetaStore. In my experiments (with
> >> load_catalog_in_background=false), it takes hours to describe a large
> >> table. This statement is pretty cheap in Hive or Presto. Users may worry
> >> about whether Impala is set up correctly.
> >>
> >>
> >> Can we add a finer-grained strategy for loading the metadata? For
> >> queries that hit just one partition of a huge table, we don't need to load
> >> all the file descriptors either. For example, there could be more levels
> >> that trigger a metadata load:
> >> Level1. Load metadata from Hive MetaStore
> >> Level2. Load file descriptors of given partitions
> >> Level3. Load all file descriptors
> >>
> >>
> >> Then we can serve the following scenario better:
> >> 1. Describe a large table.
> >> 2. Run a query on one or several partitions of this table (each partition
> >> has only a few files).
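
The three load levels and the two-step scenario above can be sketched as a small model. This is only an illustration under stated assumptions, not Impala's catalog code: `LoadLevel`, `TableMetadata`, `ensure`, and `required_level` are hypothetical names, and the metastore call and file listing are replaced by placeholders. It shows that a "describe" followed by a single-partition query triggers one file listing instead of one per partition.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class LoadLevel(IntEnum):
    """Hypothetical load levels mirroring Level1 to Level3 in the proposal."""
    UNLOADED = 0
    HMS_ONLY = 1         # Level1: metadata from Hive MetaStore only
    PARTITION_FILES = 2  # Level2: file descriptors of the given partitions
    ALL_FILES = 3        # Level3: file descriptors of every partition


@dataclass
class TableMetadata:
    """Toy stand-in for a cached table entry (not a real Impala class)."""
    partitions: list
    schema_loaded: bool = False
    files: dict = field(default_factory=dict)  # partition name -> descriptors
    listings: int = 0  # number of (expensive) per-partition file listings

    def ensure(self, level, wanted=()):
        """Load only what the requested level needs; skip what is cached."""
        if level >= LoadLevel.HMS_ONLY and not self.schema_loaded:
            self.schema_loaded = True  # placeholder for one metastore call
        if level == LoadLevel.PARTITION_FILES:
            targets = wanted
        elif level == LoadLevel.ALL_FILES:
            targets = self.partitions
        else:
            targets = ()
        for name in targets:
            if name not in self.files:
                self.files[name] = [name + "/file0"]  # placeholder listing
                self.listings += 1


def required_level(statement_kind):
    """Assumed mapping: "describe" needs Level1, other statements Level2."""
    if statement_kind == "describe":
        return LoadLevel.HMS_ONLY
    return LoadLevel.PARTITION_FILES


# Scenario from the message: describe a large table, then query one partition.
table = TableMetadata(partitions=["day=%d" % d for d in range(1000)])
table.ensure(required_level("describe"))
table.ensure(required_level("query"), wanted=["day=7"])
print(table.schema_loaded, table.listings)
```

Tracking the level per request rather than per table is a design choice made for this sketch; a real catalog would also have to handle invalidation and concurrent loads, which are left out here.
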
> >>
> >>
> >> Has this been discussed before?
> >>
> >>
> >> Thanks
> >> Quanlong
>
>
>
>
>
