Thanks for the feedback, Quanlong. We plan to address many of these catalog issues in the immediate future.
Dimitris

On Mon, Sep 11, 2017 at 10:21 PM, Quanlong Huang <[email protected]> wrote:

> Hi Dimitris,
>
> Thanks for your quick reply!
>
> IMPALA-3127 is a great ticket, but it still has no progress and no
> assignee. Is it tracked in your internal Jira?
>
> I hope this can be done soon, since some users may choose Presto instead
> of Impala because of these usability issues.
>
> Thanks
> Quanlong
>
>
> At 2017-09-12 12:17:23, "Dimitris Tsirogiannis" <[email protected]> wrote:
> >Hi Quanlong,
> >
> >You're right. The catalog needs to handle metadata at a finer granularity.
> >We are actively looking into the options you mentioned as well as other
> >related changes (see IMPALA-3234 and IMPALA-3127) to improve the
> >performance and scalability of metadata management.
> >
> >Thanks
> >Dimitris
> >
> >On Mon, Sep 11, 2017 at 8:51 PM, Quanlong Huang <[email protected]> wrote:
> >
> >> Hi all,
> >>
> >> Currently, if a "describe" statement hits an incomplete table, the
> >> impalad sends an RPC request to the catalogd to load that table's
> >> metadata. This can take a long time for tables with many partitions and
> >> many files. However, to serve the "describe" statement we only need the
> >> metadata in the Hive MetaStore. In my experiments (with
> >> load_catalog_in_background=false), it takes hours to describe a large
> >> table. The same statement is cheap in Hive or Presto, so users may
> >> wonder whether Impala is set up correctly.
> >>
> >> Can we add a finer-grained strategy for loading metadata? For queries
> >> that hit only one partition of a huge table, we don't need to load all
> >> the file descriptors either. For example, metadata loading could be
> >> triggered at several levels:
> >> Level 1. Load metadata from the Hive MetaStore
> >> Level 2. Load file descriptors of the given partitions
> >> Level 3. Load all file descriptors
> >>
> >> Then we can serve the following scenarios better:
> >> 1. describe a large table
> >> 2. run a query on one or several partitions of this table (each
> >>    partition has few files)
> >>
> >> Has this been discussed before?
> >>
> >> Thanks
> >> Quanlong
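For readers skimming the thread, the three-level loading strategy Quanlong proposes can be sketched roughly as follows. This is a minimal, hypothetical Python model (Impala's actual catalog is implemented in Java, and the class and method names here are invented for illustration): each table tracks how much metadata has been loaded, a DESCRIBE only forces Level 1, and a partition-pruned query forces Level 2 for just the partitions it touches.

```python
from enum import IntEnum


class LoadLevel(IntEnum):
    NONE = 0           # table known by name only (an "incomplete" table)
    HMS_ONLY = 1       # Level 1: schema/partition info from the Hive MetaStore
    PARTIAL_FILES = 2  # Level 2: file descriptors for requested partitions
    ALL_FILES = 3      # Level 3: file descriptors for every partition


class Table:
    """Toy model of per-table lazy metadata loading (names are hypothetical)."""

    def __init__(self, name, partitions):
        self.name = name
        self.partitions = list(partitions)
        self.level = LoadLevel.NONE
        self.loaded_partitions = set()

    def ensure_hms_metadata(self):
        # One cheap round trip to the Hive MetaStore; enough to serve DESCRIBE.
        if self.level < LoadLevel.HMS_ONLY:
            self.level = LoadLevel.HMS_ONLY

    def ensure_partitions(self, wanted):
        # Load file descriptors only for the partitions a query actually hits.
        self.ensure_hms_metadata()
        missing = set(wanted) - self.loaded_partitions
        self.loaded_partitions |= missing  # stand-in for listing files on HDFS
        if self.level < LoadLevel.PARTIAL_FILES:
            self.level = LoadLevel.PARTIAL_FILES

    def ensure_all_files(self):
        # Full load, as today's catalogd effectively does on first access.
        self.ensure_partitions(self.partitions)
        self.level = LoadLevel.ALL_FILES


t = Table("big_table", ["p1", "p2", "p3"])
t.ensure_hms_metadata()       # DESCRIBE: no file listings needed
t.ensure_partitions(["p2"])   # query pruned to one partition loads only p2
```

The point of the sketch is the scenario from the thread: DESCRIBE never pays for file listings, and a query that prunes to one partition lists files for that partition alone rather than the whole table.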
