[ 
https://issues.apache.org/jira/browse/IMPALA-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16489473#comment-16489473
 ] 

ASF subversion and git services commented on IMPALA-1480:
---------------------------------------------------------

Commit c98c01c55d7f6af7e536347986c5b22841bc78e7 in impala's branch 
refs/heads/2.x from [~csringhofer]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=c98c01c ]

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.
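
A minimal sketch of that comparison, assuming both properties hold
Unix epoch seconds stored as decimal strings. The function name and
the property map passed to it are hypothetical and are not part of
Impala:

```python
from typing import Mapping, Optional

LAST_DDL_TIME = "transient_lastDdlTime"
LAST_COMPUTE_STATS_TIME = "impala.lastComputeStatsTime"


def ddl_ran_after_compute_stats(tbl_properties: Mapping[str, str]) -> Optional[bool]:
    """Compare the two table properties described in the commit message.

    Returns True if the last DDL is newer than the last COMPUTE STATS,
    False if it is not, and None if either property is missing
    (for example, stats were never computed on the table).
    Assumption: both values are epoch seconds encoded as strings.
    """
    ddl = tbl_properties.get(LAST_DDL_TIME)
    stats = tbl_properties.get(LAST_COMPUTE_STATS_TIME)
    if ddl is None or stats is None:
        return None
    return int(ddl) > int(stats)
```

A True result only suggests that the schema may have changed since
the statistics were computed; it does not say what changed.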

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.
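
The old "transient_lastDdlTime" rule from the first item above can be
sketched as follows. This follows the wording of the description only
and is not the actual catalog code. The function names are
hypothetical, and the simplified variant assumes the new logic just
stores the current time:

```python
def old_last_ddl_time(previous: int, now: int) -> int:
    """Old rule as described: if the stored value is already >= the
    current time, store current time + 1; otherwise store the current
    time. Both arguments are epoch seconds."""
    if previous >= now:
        return now + 1
    return now


def simplified_last_ddl_time(now: int) -> int:
    """Assumed simplified rule: store the current time without
    forcing the value to move forward."""
    return now
```

With the old rule, two DDL operations within the same second still
produced different values, which mattered only while lastDdlTime was
used to decide whether partition data needed reloading.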

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is run only in
exhaustive testing.

Change-Id: Ibda49725d3e76456f2d1b3edd1bf117b0174e234
Reviewed-on: http://gerrit.cloudera.org:8080/10484
Reviewed-by: Alex Behm <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Slow DDL statements for tables with large number of partitions
> --------------------------------------------------------------
>
>                 Key: IMPALA-1480
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1480
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Catalog
>    Affects Versions: Impala 2.0
>            Reporter: Dimitris Tsirogiannis
>            Assignee: Dimitris Tsirogiannis
>            Priority: Critical
>              Labels: impala, performance
>             Fix For: Impala 2.5.0
>
>
> Impala users sometimes report that DDL statements (e.g. alter table partition 
> set location...) take multiple seconds (>5) for partitioned tables with 
> a large number of partitions. The same operations are significantly faster in 
> Hive (sub-second response time). 
> Use case:
> * 2 node cluster
> * Single table (24 columns, 3 partition keys) with 2500 partitions
> * alter table foo partition (foo_i = i) set location 'hdfs://.....' takes 
> approximately 5-6 sec (0.2 sec in Hive)
> * 1 sec delay in the alter stmt is caused by 
> https://issues.apache.org/jira/browse/HIVE-5524



