Hi Dimitris,

Thanks for your quick reply!


IMPALA-3127 is a great ticket, but it still shows no progress and has no
assignee. Is it being tracked in your internal Jira?


I hope this can be done soon, since some users may choose Presto over
Impala because of these usability issues.


Thanks
Quanlong

At 2017-09-12 12:17:23, "Dimitris Tsirogiannis" <[email protected]> 
wrote:
>Hi Quanlong,
>
>You're right. The catalog needs to handle metadata at a finer granularity.
>We are actively looking into the options you mentioned as well as other
>related changes (see IMPALA-3234 and IMPALA-3127) to improve the
>performance and scalability of metadata management.
>
>Thanks
>Dimitris
>
>On Mon, Sep 11, 2017 at 8:51 PM, Quanlong Huang <[email protected]>
>wrote:
>
>> Hi all,
>>
>>
>> Currently, if a "describe" statement hits an incomplete table, the impalad
>> sends an RPC request to the catalogd to load the metadata of this
>> table. This takes a long time for tables with many partitions and many
>> files. However, to serve the "describe" statement, we just need the
>> metadata in Hive MetaStore. In my experiments (with
>> load_catalog_in_background=false), it takes hours to describe a large
>> table. This statement is pretty cheap in Hive or Presto. Users may worry
>> about whether Impala is set up correctly.
>>
>>
>> Can we add a more fine-grained strategy for loading the metadata? For
>> queries that hit just one partition of a huge table, we don't need to load
>> all the file descriptors either. For example, metadata loading could be
>> triggered at more levels:
>> Level1. Load metadata from Hive MetaStore
>> Level2. Load file descriptors of given partitions
>> Level3. Load all file descriptors
>>
>>
>> Then we can better serve the following scenario:
>> 1. Describe a large table.
>> 2. Run a query on one or several partitions of this table (each partition
>> has few files).
>>
>>
>> Has this been discussed before?
>>
>>
>> Thanks
>> Quanlong
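
The three load levels proposed above can be sketched roughly as follows. This is only an illustration of the idea, not Impala's catalog code: LoadLevel, TableMetadata, ensure_loaded and required_level are hypothetical names, and the actual Hive MetaStore fetch and file listing are replaced by simple bookkeeping.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class LoadLevel(IntEnum):
    """Hypothetical load levels mirroring Level1..Level3 from the proposal."""
    NOT_LOADED = 0
    HMS_ONLY = 1         # Level1: metadata from the Hive MetaStore only
    PARTITION_FILES = 2  # Level2: file descriptors of the given partitions
    ALL_FILES = 3        # Level3: file descriptors of all partitions


@dataclass
class TableMetadata:
    name: str
    partitions: list
    hms_loaded: bool = False
    files_loaded: set = field(default_factory=set)

    def ensure_loaded(self, level, partitions=()):
        """Load only what `level` needs; return partitions listed by this call."""
        if level >= LoadLevel.HMS_ONLY:
            self.hms_loaded = True  # stand-in for one Hive MetaStore fetch
        if level == LoadLevel.PARTITION_FILES:
            wanted = set(partitions)
        elif level == LoadLevel.ALL_FILES:
            wanted = set(self.partitions)
        else:
            wanted = set()
        newly_listed = wanted - self.files_loaded
        self.files_loaded |= newly_listed  # stand-in for listing files
        return sorted(newly_listed)


def required_level(statement_kind, touches_all_partitions=False):
    """Assumed mapping from statement type to the cheapest sufficient level."""
    if statement_kind == "describe":
        return LoadLevel.HMS_ONLY
    if touches_all_partitions:
        return LoadLevel.ALL_FILES
    return LoadLevel.PARTITION_FILES


# Scenario from the mail: describe a large table, then query one partition.
table = TableMetadata("big_table", [f"day={d}" for d in range(1000)])
table.ensure_loaded(required_level("describe"))           # no file listing
table.ensure_loaded(required_level("select"), ["day=7"])  # lists one partition
```

With this split, the "describe" call never touches file descriptors, and the follow-up query lists files for a single partition instead of all 1000.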
