Aman Sinha created IMPALA-10723:
-----------------------------------
Summary: Allow querying and computing stats on a materialized view
Key: IMPALA-10723
URL: https://issues.apache.org/jira/browse/IMPALA-10723
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Reporter: Aman Sinha
Assignee: Aman Sinha
Currently, in Impala, a Materialized View (MV) created via Hive is visible
through the metadata catalog and can be queried but the query is expanded to
its corresponding view definition. This is incorrect because a materialized
view is a regular (physical) table, not a view. Even though Impala does not
support either creating MV or automatic rewriting to use MV, querying an MV
directly should be allowed.
Here's the current behavior:
{noformat}
[localhost:21050] functional> explain select * from materialized_view;
Query: explain select * from materialized_view
+------------------------------------------------------------------------------------+
| Explain String
|
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=4.00MB Threads=2
|
| Per-Host Resource Estimates: Memory=10MB
|
| WARNING: The following tables are missing relevant table and/or column
statistics. |
| functional.insert_only_transactional_table
|
|
|
| PLAN-ROOT SINK
|
| |
|
| 01:EXCHANGE [UNPARTITIONED]
|
| |
|
| 00:SCAN HDFS [functional.insert_only_transactional_table]
|
| HDFS partitions=1/1 files=0 size=0B
|
| row-size=4B cardinality=0
|
+------------------------------------------------------------------------------------+
{noformat}
Note that the plan shows the scan of the underlying table instead of the
materialized_view table. We should only be scanning the MV (including applying
partition pruning, predicate pushdown etc.) and not treating this as a view.
This JIRA is to enhance the frontend to recognize a materialized view as a
table rather than a view. This will further allow commands such as COMPUTE
STATS, DROP STATS, SHOW [TABLE | COLUMN] STATS to be run on the MV. One
motivation for doing this is to allow an external frontend that supports
automatic query rewrites using MVs to leverage the statistics on MVs.
Since Impala is not creating the MV, we will need to block DML operations on
the MV. Further, special handling needs to be done for Ranger authorization
policies such that any column masking/row filtering policies defined on the
source tables of the MV are taken into consideration.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]