[ 
https://issues.apache.org/jira/browse/IMPALA-6536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035804#comment-17035804
 ] 

Tim Armstrong commented on IMPALA-6536:
---------------------------------------

[~vihangk1] [~stigahuang] this might be of interest. Do you know if it's still 
a problem?

> CREATE TABLE on S3 takes a very long time
> -----------------------------------------
>
>                 Key: IMPALA-6536
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6536
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog, Frontend
>    Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
>            Reporter: Alexander Behm
>            Priority: Critical
>              Labels: catalog, perfomance, s3
>
> *Summary*
> Creating a table that points to existing data in S3 can take an excessive 
> amount of time.
> *Reason*
> If the Hive Metastore is configured with "hive.stats.autogather=true" then 
> Hive lists the files of newly created tables to populate basic statistics 
> like file count and file byte sizes. Unfortunately, this listing operation 
> can take an excessive amount of time, particularly on S3.
> *Workaround*
> * Reconfigure the Hive Metastore with "hive.stats.autogather=false"
> * Note that TBLPROPERTIES("DO_NOT_UPDATE_STATS"="true") does not address the 
> issue due to a bug in Hive
> Related:
> https://issues.apache.org/jira/browse/HIVE-18743
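> For illustration, the first workaround might look like the following 
> hive-site.xml fragment. This is only a sketch: it assumes the Hive Metastore 
> reads its configuration from hive-site.xml and is restarted afterwards; 
> deployments managed by a cluster manager may set the property elsewhere.
> {code:xml}
> <!-- Assumed location: hive-site.xml of the Hive Metastore service -->
> <property>
>   <name>hive.stats.autogather</name>
>   <value>false</value>
> </property>
> {code}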
> *Example*
> {code}
> CREATE EXTERNAL TABLE tpch_lineitem_s3 (
>   l_orderkey BIGINT,
>   l_partkey BIGINT,
>   l_suppkey BIGINT,
>   l_linenumber BIGINT,
>   l_quantity DECIMAL(12,2),
>   l_extendedprice DECIMAL(12,2),
>   l_discount DECIMAL(12,2),
>   l_tax DECIMAL(12,2),
>   l_returnflag STRING,
>   l_linestatus STRING,
>   l_shipdate STRING,
>   l_commitdate STRING,
>   l_receiptdate STRING,
>   l_shipinstruct STRING,
>   l_shipmode STRING,
>   l_comment STRING
> )
> STORED AS PARQUET
> LOCATION "s3a://some_location/my_existing_data"
> {code}



