Thanks for the response.

Does that mean that before v2.12, if one uses the load_catalog_in_background
option to load the metadata (on demand) incrementally in the background, one
is unlikely to hit this issue?
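
(For reference, my understanding is that the 2 GB ceiling ultimately comes
from Java itself: Thrift's Java serializer returns a byte[], and Java arrays
are indexed by a signed 32-bit int, so no single serialized message can
exceed Integer.MAX_VALUE bytes. A minimal standalone illustration, nothing
Impala-specific:

  public class ArrayLimitDemo {
      public static void main(String[] args) {
          // Java arrays are indexed by a signed 32-bit int, so a byte[]
          // can never hold more than Integer.MAX_VALUE bytes (~2 GB).
          System.out.println("Hard ceiling: " + Integer.MAX_VALUE + " bytes");
          try {
              // Most JVMs reject this outright ("Requested array size
              // exceeds VM limit") or simply run out of heap.
              byte[] buf = new byte[Integer.MAX_VALUE];
              System.out.println("Allocated " + buf.length + " bytes");
          } catch (OutOfMemoryError e) {
              System.out.println("Allocation failed: " + e);
          }
      }
  }

So whether the limit bites per table or across tables depends only on what
ends up in one serialized message.)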

Also, I am guessing this has an impact on DDL performance: catalog updates
(say, a DDL statement or adding partitions) would take a bit longer with
bigger metadata, especially with IMPALA-5058, because there would be
increased lock contention (on getCatalogObjects) even if each individual
operation is only slightly slower?
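
To make the contention guess concrete, here is a toy sketch of the pattern I
have in mind (invented names, plain java.util.concurrent, not Impala's actual
code): a long serialization pass holds a catalog-wide lock, so even a
one-partition DDL has to wait for the whole pass:

  import java.util.concurrent.locks.ReentrantLock;

  // Toy model only: one lock guards the whole catalog, a periodic topic
  // update serializes every table while holding it, and a small DDL queues
  // behind the entire pass.
  public class CatalogContentionSketch {
      private static final ReentrantLock catalogLock = new ReentrantLock();

      // Stand-in for a bulk pass like getCatalogObjects(): walk all tables
      // while holding the lock.
      static void publishTopicUpdate(int tables) throws InterruptedException {
          catalogLock.lock();
          try {
              for (int i = 0; i < tables; i++) {
                  Thread.sleep(1); // pretend to serialize one table
              }
          } finally {
              catalogLock.unlock();
          }
      }

      // Stand-in for a small DDL, e.g. adding one partition.
      static long runDdlMillis() throws InterruptedException {
          long start = System.nanoTime();
          catalogLock.lock();
          try {
              Thread.sleep(1); // the DDL itself is cheap...
          } finally {
              catalogLock.unlock();
          }
          return (System.nanoTime() - start) / 1_000_000; // ...the wait is not
      }

      public static void main(String[] args) throws Exception {
          Thread updater = new Thread(() -> {
              try {
                  publishTopicUpdate(2000); // ~2 s of "serialization"
              } catch (InterruptedException ignored) {}
          });
          updater.start();
          Thread.sleep(50); // let the topic update grab the lock first
          System.out.println("DDL took ~" + runDdlMillis() + " ms, mostly lock wait");
          updater.join();
      }
  }

If that model is roughly right, the per-DDL slowdown grows with the amount of
metadata being serialized rather than with the cost of the DDL itself.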



On 3/22/18, 9:14 PM, "Dimitris Tsirogiannis" <[email protected]> wrote:

    Hi Antoni,
    
    First of all, let me say that indeed that statement is kind of confusing
    and in some cases wrong.
    
    The best way to think of the 2GB Java limit is with respect to the amount
    of serialized metadata that the catalog server needs to send at any point
    in time. There are operations that require a single table to be serialized,
    so in that case the 2GB limit is "applied" at the table level. However,
    there are other operations, such as sending a catalog topic update to the
    statestore. In that case, the catalog may serialize multiple tables
    (depending on whether there was a change in metadata since the last update)
    into a single Thrift message. Hence, in this case the limit is applied to
    the cumulative size of serialized table metadata. That said, in Impala
    v2.12 we're changing the behavior so that the 2GB limit is always applied
    at the level of a single table.
    
    Hope that helps.
    Dimitris
    
    On Thu, Mar 22, 2018 at 12:04 PM, Antoni Ivanov <[email protected]> wrote:
    
    > Hi,
    >
    > I was reading 
    > https://www.cloudera.com/documentation/enterprise/5-11-x/topics/impala_partitioning.html#partition_stats
    > And noticed this warning: “If this metadata for all tables combined
    > exceeds 2 GB, you might experience service downtime.”
    >
    > But I thought that this limit applies to a single table. Having a large
    > number of partitions per table can be an issue because each metadata
    > operation for this table would require the whole metadata to be updated
    > (as far as I understand, Impala doesn’t partially update the metadata for
    > only the changed partitions), and also because of the Java serialization
    > limitation where you cannot serialize more than 1 GB or 2 GB.
    > I am guessing this is related to
    > https://issues.apache.org/jira/browse/IMPALA-5058 - but that’s only if
    > you do frequent DDL operations, I suppose.
    >
    > Am I understanding things correctly so far? And why can there be service
    > downtime in this case?
    >
    > Thanks,
    > Antoni
    
