Re: [PR] DRILL-8543: Add Support for Materialized Views (drill)

via GitHub Thu, 05 Feb 2026 22:05:21 -0800


letian-jiang commented on code in PR #3036:
URL: https://github.com/apache/drill/pull/3036#discussion_r2772468931



##########
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java:
##########
@@ -351,6 +354,253 @@ public void dropView(String viewName) throws IOException {
       getFS().delete(getViewPath(viewName), false);
     }
 
+    private Path getMaterializedViewPath(String name) {
+      return DotDrillType.MATERIALIZED_VIEW.getPath(config.getLocation(), 
name);
+    }
+
+    private Path getMaterializedViewDataPath(String name) {
+      // Use _mv_data suffix to distinguish data directory from MV definition 
lookup
+      return new Path(config.getLocation(), name + "_mv_data");
+    }
+
+    @Override
+    public boolean createMaterializedView(MaterializedView materializedView) 
throws IOException {
+      String viewName = materializedView.getName();
+      Path viewPath = getMaterializedViewPath(viewName);
+      Path dataPath = getMaterializedViewDataPath(viewName);
+
+      boolean replaced = getFS().exists(viewPath);
+
+      // If replacing, first drop the old data
+      if (replaced) {
+        if (getFS().exists(dataPath)) {
+          getFS().delete(dataPath, true);
+        }
+      }
+
+      // Create the data directory for the materialized view
+      final FsPermission dirPerms = new FsPermission(
+          
schemaConfig.getOption(ExecConstants.NEW_VIEW_DEFAULT_PERMS_KEY).string_val);
+      getFS().mkdirs(dataPath, dirPerms);
+
+      // Set the data storage path in the materialized view
+      materializedView.setDataStoragePath(viewName);
+
+      // Write the materialized view definition file
+      final FsPermission viewPerms = new FsPermission(
+          
schemaConfig.getOption(ExecConstants.NEW_VIEW_DEFAULT_PERMS_KEY).string_val);
+      try (OutputStream stream = DrillFileSystem.create(getFS(), viewPath, 
viewPerms)) {
+        mapper.writeValue(stream, materializedView);
+      }
+
+      // Sync to metastore if enabled
+      syncMaterializedViewToMetastore(materializedView);
+
+      // Mark as complete (data will be populated by the handler via CTAS-like 
operation)
+      return replaced;
+    }
+
+    @Override
+    public void dropMaterializedView(String viewName) throws IOException {
+      Path viewPath = getMaterializedViewPath(viewName);
+      Path dataPath = getMaterializedViewDataPath(viewName);
+
+      // Delete the definition file
+      if (getFS().exists(viewPath)) {
+        getFS().delete(viewPath, false);
+      }
+
+      // Delete the data directory
+      if (getFS().exists(dataPath)) {
+        getFS().delete(dataPath, true);
+      }
+
+      // Remove from metastore if enabled
+      removeMaterializedViewFromMetastore(viewName);
+    }
+
+    @Override
+    public void refreshMaterializedView(String viewName) throws IOException {
+      // Read the existing materialized view definition
+      MaterializedView mv = getMaterializedView(viewName);
+      if (mv == null) {
+        throw UserException.validationError()
+            .message("Materialized view [%s] not found in schema [%s]", 
viewName, getFullSchemaName())
+            .build(logger);
+      }
+
+      Path dataPath = getMaterializedViewDataPath(viewName);
+
+      // Delete existing data
+      if (getFS().exists(dataPath)) {
+        getFS().delete(dataPath, true);
+      }
+
+      // Recreate the data directory
+      final FsPermission dirPerms = new FsPermission(
+          
schemaConfig.getOption(ExecConstants.NEW_VIEW_DEFAULT_PERMS_KEY).string_val);
+      getFS().mkdirs(dataPath, dirPerms);
+
+      // Update the materialized view with new refresh time
+      MaterializedView updatedMV = mv.withRefreshInfo(

Review Comment:
   mark `COMPLETE` and set `lastRefreshTime` after data files are fully 
refreshed? 



##########
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/MaterializedViewHandler.java:
##########
@@ -0,0 +1,312 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.planner.sql.handlers;
+
+import java.io.IOException;
+
+import org.apache.calcite.plan.RelTraitSet;
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.type.RelDataType;
+import org.apache.calcite.schema.Schema;
+import org.apache.calcite.schema.SchemaPlus;
+import org.apache.calcite.schema.Table;
+import org.apache.calcite.sql.SqlNode;
+import org.apache.calcite.tools.RelConversionException;
+import org.apache.calcite.tools.ValidationException;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.util.DrillStringUtils;
+import org.apache.drill.exec.dotdrill.MaterializedView;
+import org.apache.drill.exec.ops.QueryContext;
+import org.apache.drill.exec.physical.PhysicalPlan;
+import org.apache.drill.exec.physical.base.PhysicalOperator;
+import org.apache.drill.exec.planner.logical.CreateTableEntry;
+import org.apache.drill.exec.planner.logical.DrillRel;
+import org.apache.drill.exec.planner.logical.DrillScreenRel;
+import org.apache.drill.exec.planner.logical.DrillWriterRel;
+import org.apache.drill.exec.planner.physical.Prel;
+import org.apache.drill.exec.planner.sql.DirectPlan;
+import org.apache.drill.exec.planner.sql.SchemaUtilities;
+import org.apache.drill.exec.planner.sql.parser.SqlCreateMaterializedView;
+import org.apache.drill.exec.planner.sql.parser.SqlCreateType;
+import org.apache.drill.exec.planner.sql.parser.SqlDropMaterializedView;
+import org.apache.drill.exec.planner.sql.parser.SqlRefreshMaterializedView;
+import org.apache.drill.exec.store.AbstractSchema;
+import org.apache.drill.exec.work.foreman.ForemanSetupException;
+import org.apache.drill.exec.work.foreman.SqlUnsupportedException;
+
+/**
+ * Handlers for materialized view DDL commands: CREATE, DROP, and REFRESH 
MATERIALIZED VIEW.
+ * <p>
+ * CREATE and DROP return DirectPlan with ok/summary output.
+ * REFRESH executes the MV query and writes data to Parquet, returning write 
statistics.
+ */
+public abstract class MaterializedViewHandler extends DefaultSqlHandler {
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(MaterializedViewHandler.class);
+
+  protected QueryContext context;
+
+  public MaterializedViewHandler(SqlHandlerConfig config) {
+    super(config);
+    this.context = config.getContext();
+  }
+
+  /**
+   * Handler for CREATE MATERIALIZED VIEW DDL command.
+   * <p>
+   * Creates the MV definition file. The data will be materialized on first 
query
+   * or can be explicitly populated via REFRESH MATERIALIZED VIEW.
+   */
+  public static class CreateMaterializedView extends MaterializedViewHandler {
+
+    public CreateMaterializedView(SqlHandlerConfig config) {
+      super(config);
+    }
+
+    @Override
+    public PhysicalPlan getPlan(SqlNode sqlNode) throws ValidationException, 
RelConversionException,
+        IOException, ForemanSetupException {
+      SqlCreateMaterializedView createMV = unwrap(sqlNode, 
SqlCreateMaterializedView.class);
+
+      final String newViewName = 
DrillStringUtils.removeLeadingSlash(createMV.getName());
+
+      // Disallow temporary tables usage in materialized view definition
+      config.getConverter().disallowTemporaryTables();
+
+      // Store the SQL as the view definition
+      final String viewSql = createMV.getQuery().toSqlString(null, 
true).getSql();
+      final ConvertedRelNode convertedRelNode = 
validateAndConvert(createMV.getQuery());
+      final RelDataType validatedRowType = 
convertedRelNode.getValidatedRowType();
+      final RelNode queryRelNode = convertedRelNode.getConvertedNode();
+
+      final RelNode newViewRelNode = SqlHandlerUtil.resolveNewTableRel(true, 
createMV.getFieldNames(),
+          validatedRowType, queryRelNode);
+
+      final SchemaPlus defaultSchema = context.getNewDefaultSchema();
+      final AbstractSchema drillSchema = 
SchemaUtilities.resolveToMutableDrillSchema(defaultSchema,
+          createMV.getSchemaPath());
+
+      final String schemaPath = drillSchema.getFullSchemaName();
+
+      // Check view creation possibility
+      if (!checkMaterializedViewCreationPossibility(drillSchema, createMV, 
context)) {
+        return DirectPlan.createDirectPlan(context, false,
+            String.format("A table or view with given name [%s] already exists 
in schema [%s]",
+                newViewName, schemaPath));
+      }
+
+      // Create the materialized view definition
+      // Use the actual schema path where the MV is created (not the session's 
default schema)
+      final MaterializedView materializedView = new 
MaterializedView(newViewName, viewSql,
+          newViewRelNode.getRowType(), drillSchema.getSchemaPath());
+
+      // Create the materialized view definition file
+      final boolean replaced = 
drillSchema.createMaterializedView(materializedView);
+
+      String message = replaced
+          ? String.format("Materialized view '%s' replaced successfully in 
'%s' schema", newViewName, schemaPath)
+          : String.format("Materialized view '%s' created successfully in '%s' 
schema", newViewName, schemaPath);
+
+      logger.info("Created materialized view [{}] in schema [{}]", 
newViewName, schemaPath);
+      return DirectPlan.createDirectPlan(context, true, message);
+    }
+
+    /**
+     * Validates if materialized view can be created in indicated schema.
+     */
+    private boolean checkMaterializedViewCreationPossibility(AbstractSchema 
drillSchema,
+                                                              
SqlCreateMaterializedView createMV,
+                                                              QueryContext 
context) {
+      final String schemaPath = drillSchema.getFullSchemaName();
+      final String viewName = createMV.getName();
+      final Table table = SqlHandlerUtil.getTableFromSchema(drillSchema, 
viewName);
+
+      // Check if it's a materialized view
+      final boolean isMaterializedView = table != null &&
+          table.getJdbcTableType() == Schema.TableType.MATERIALIZED_VIEW;
+      final boolean isView = (table != null && table.getJdbcTableType() == 
Schema.TableType.VIEW);
+      // Regular table check excludes views and materialized views
+      final boolean isTable = (table != null
+          && table.getJdbcTableType() != Schema.TableType.VIEW
+          && table.getJdbcTableType() != Schema.TableType.MATERIALIZED_VIEW)
+          || context.getSession().isTemporaryTable(drillSchema, 
context.getConfig(), viewName);
+
+      SqlCreateType createType = createMV.getSqlCreateType();
+      switch (createType) {
+        case SIMPLE:
+          if (isTable) {
+            throw UserException.validationError()
+                .message("A non-view table with given name [%s] already exists 
in schema [%s]",
+                    viewName, schemaPath)
+                .build(logger);
+          } else if (isView) {
+            throw UserException.validationError()
+                .message("A view with given name [%s] already exists in schema 
[%s]", viewName, schemaPath)
+                .build(logger);
+          } else if (isMaterializedView) {
+            throw UserException.validationError()
+                .message("A materialized view with given name [%s] already 
exists in schema [%s]",
+                    viewName, schemaPath)
+                .build(logger);
+          }
+          break;
+        case OR_REPLACE:
+          if (isTable) {
+            throw UserException.validationError()
+                .message("A non-view table with given name [%s] already exists 
in schema [%s]",
+                    viewName, schemaPath)
+                .build(logger);
+          } else if (isView) {
+            throw UserException.validationError()
+                .message("A regular view with given name [%s] already exists 
in schema [%s]. " +
+                    "Cannot replace a regular view with a materialized view.", 
viewName, schemaPath)
+                .build(logger);
+          }
+          // Allow replacing existing materialized view
+          break;
+        case IF_NOT_EXISTS:
+          if (isTable || isView || isMaterializedView) {
+            return false;
+          }
+          break;
+      }
+      return true;
+    }
+  }
+
+  /**
+   * Handler for DROP MATERIALIZED VIEW DDL command.
+   */
+  public static class DropMaterializedView extends MaterializedViewHandler {
+
+    public DropMaterializedView(SqlHandlerConfig config) {
+      super(config);
+    }
+
+    @Override
+    public PhysicalPlan getPlan(SqlNode sqlNode) throws IOException, 
ForemanSetupException {
+      SqlDropMaterializedView dropMV = unwrap(sqlNode, 
SqlDropMaterializedView.class);
+      final String viewName = 
DrillStringUtils.removeLeadingSlash(dropMV.getName());
+      final AbstractSchema drillSchema = 
SchemaUtilities.resolveToMutableDrillSchema(
+          context.getNewDefaultSchema(), dropMV.getSchemaPath());
+
+      final String schemaPath = drillSchema.getFullSchemaName();
+
+      final Table viewToDrop = SqlHandlerUtil.getTableFromSchema(drillSchema, 
viewName);

Review Comment:
   better to explicitly check if this is a MV?  



##########
docs/dev/MaterializedViews.md:
##########
@@ -0,0 +1,388 @@
+# Materialized Views
+
+Materialized views in Apache Drill provide a mechanism to store pre-computed 
query results for improved query performance. Unlike regular views which are 
virtual and execute the underlying query each time they are accessed, 
materialized views persist the query results as physical data that can be 
queried directly.
+
+## Overview
+
+Materialized views are useful for:
+- Accelerating frequently executed queries with complex aggregations or joins
+- Reducing compute resources for repetitive analytical workloads
+- Providing consistent snapshots of data at a point in time
+
+Drill's materialized view implementation includes:
+- SQL syntax for creating, dropping, and refreshing materialized views
+- Automatic query rewriting using Calcite's SubstitutionVisitor
+- Integration with Drill Metastore for centralized metadata management
+- Parquet-based data storage for efficient columnar access
+
+## SQL Syntax
+
+### CREATE MATERIALIZED VIEW
+
+```sql
+CREATE MATERIALIZED VIEW [schema.]view_name AS select_statement
+CREATE OR REPLACE MATERIALIZED VIEW [schema.]view_name AS select_statement
+CREATE MATERIALIZED VIEW IF NOT EXISTS [schema.]view_name AS select_statement
+```
+
+Examples:
+
+```sql
+-- Create a materialized view with aggregations
+CREATE MATERIALIZED VIEW dfs.tmp.sales_summary AS
+SELECT region, product_category, SUM(amount) as total_sales, COUNT(*) as 
num_transactions
+FROM dfs.`/data/sales`
+GROUP BY region, product_category;
+
+-- Create or replace an existing materialized view
+CREATE OR REPLACE MATERIALIZED VIEW dfs.tmp.customer_stats AS
+SELECT customer_id, COUNT(*) as order_count, AVG(order_total) as avg_order
+FROM dfs.`/data/orders`
+GROUP BY customer_id;
+
+-- Create only if it doesn't exist
+CREATE MATERIALIZED VIEW IF NOT EXISTS dfs.tmp.daily_metrics AS
+SELECT date_col, SUM(value) as daily_total
+FROM dfs.`/data/metrics`
+GROUP BY date_col;
+```
+
+### DROP MATERIALIZED VIEW
+
+```sql
+DROP MATERIALIZED VIEW [schema.]view_name
+DROP MATERIALIZED VIEW IF EXISTS [schema.]view_name
+```
+
+Examples:
+
+```sql
+-- Drop a materialized view (error if not exists)
+DROP MATERIALIZED VIEW dfs.tmp.sales_summary;
+
+-- Drop only if it exists (no error if not exists)
+DROP MATERIALIZED VIEW IF EXISTS dfs.tmp.old_view;
+```
+
+### REFRESH MATERIALIZED VIEW
+
+```sql
+REFRESH MATERIALIZED VIEW [schema.]view_name
+```
+
+The REFRESH command re-executes the underlying query and replaces the stored 
data with fresh results.
+
+Example:
+
+```sql
+-- Refresh the materialized view with current data
+REFRESH MATERIALIZED VIEW dfs.tmp.sales_summary;
+```
+
+## Query Rewriting
+
+Drill supports automatic query rewriting where queries against base tables can 
be transparently rewritten to use materialized views when appropriate. This 
feature leverages Apache Calcite's SubstitutionVisitor for structural query 
matching.
+
+### Enabling Query Rewriting
+
+Query rewriting is controlled by the 
`planner.enable_materialized_view_rewrite` option:
+
+```sql
+-- Enable materialized view rewriting (enabled by default)
+SET `planner.enable_materialized_view_rewrite` = true;
+
+-- Disable materialized view rewriting
+SET `planner.enable_materialized_view_rewrite` = false;
+```
+
+### How Rewriting Works
+
+When query rewriting is enabled, Drill's query planner:
+
+1. Discovers all available materialized views in accessible schemas
+2. Filters candidates to those with COMPLETE refresh status
+3. For each candidate, parses the MV's defining SQL and converts it to a 
relational expression
+4. Uses Calcite's SubstitutionVisitor to check if the MV's query structure 
matches part or all of the user's query
+5. If a match is found, substitutes the matching portion with a scan of the 
materialized view data
+6. Selects the rewritten plan if it offers better performance characteristics
+
+### Rewriting Scenarios
+
+Query rewriting can apply in several scenarios:
+
+**Exact Match**: The user's query exactly matches the MV definition.
+
+```sql
+-- MV definition
+CREATE MATERIALIZED VIEW dfs.tmp.region_totals AS
+SELECT r_regionkey, COUNT(*) as cnt FROM cp.`region.json` GROUP BY r_regionkey;
+
+-- This query will use the MV
+SELECT r_regionkey, COUNT(*) as cnt FROM cp.`region.json` GROUP BY r_regionkey;
+```
+
+**Partial Match with Additional Filters**: The user's query adds filters on 
top of the MV.
+
+```sql
+-- This query may use the MV and apply the filter
+SELECT r_regionkey, cnt FROM dfs.tmp.region_totals WHERE cnt > 10;
+```
+
+**Aggregate Rollup**: Higher-level aggregations computed from MV aggregates.
+
+### Viewing the Execution Plan
+
+Use EXPLAIN to see if a materialized view is being used:
+
+```sql
+EXPLAIN PLAN FOR
+SELECT r_regionkey, COUNT(*) FROM cp.`region.json` GROUP BY r_regionkey;
+```
+
+If the MV is used, the plan will show a scan of the materialized view data 
location rather than the original table.
+
+## Storage Architecture
+
+### Definition Storage
+
+Materialized view definitions are stored as JSON files with the 
`.materialized_view.drill` extension in the workspace directory. This follows 
the same pattern as regular Drill views (`.view.drill` files).
+
+The definition file contains:
+- View name
+- Defining SQL statement
+- Field names and types
+- Workspace schema path
+- Data storage path
+- Last refresh timestamp
+- Refresh status (PENDING or COMPLETE)

Review Comment:
   I am confused about the status. I see `INCOMPLETE` in the code but it may be 
inequivalent to `PENDING`. 



##########
docs/dev/MaterializedViews.md:
##########
@@ -0,0 +1,388 @@
+# Materialized Views
+
+Materialized views in Apache Drill provide a mechanism to store pre-computed 
query results for improved query performance. Unlike regular views which are 
virtual and execute the underlying query each time they are accessed, 
materialized views persist the query results as physical data that can be 
queried directly.
+
+## Overview
+
+Materialized views are useful for:
+- Accelerating frequently executed queries with complex aggregations or joins
+- Reducing compute resources for repetitive analytical workloads
+- Providing consistent snapshots of data at a point in time
+
+Drill's materialized view implementation includes:
+- SQL syntax for creating, dropping, and refreshing materialized views
+- Automatic query rewriting using Calcite's SubstitutionVisitor
+- Integration with Drill Metastore for centralized metadata management
+- Parquet-based data storage for efficient columnar access
+
+## SQL Syntax
+
+### CREATE MATERIALIZED VIEW
+
+```sql
+CREATE MATERIALIZED VIEW [schema.]view_name AS select_statement
+CREATE OR REPLACE MATERIALIZED VIEW [schema.]view_name AS select_statement
+CREATE MATERIALIZED VIEW IF NOT EXISTS [schema.]view_name AS select_statement
+```
+
+Examples:
+
+```sql
+-- Create a materialized view with aggregations
+CREATE MATERIALIZED VIEW dfs.tmp.sales_summary AS
+SELECT region, product_category, SUM(amount) as total_sales, COUNT(*) as 
num_transactions
+FROM dfs.`/data/sales`
+GROUP BY region, product_category;
+
+-- Create or replace an existing materialized view
+CREATE OR REPLACE MATERIALIZED VIEW dfs.tmp.customer_stats AS
+SELECT customer_id, COUNT(*) as order_count, AVG(order_total) as avg_order
+FROM dfs.`/data/orders`
+GROUP BY customer_id;
+
+-- Create only if it doesn't exist
+CREATE MATERIALIZED VIEW IF NOT EXISTS dfs.tmp.daily_metrics AS
+SELECT date_col, SUM(value) as daily_total
+FROM dfs.`/data/metrics`
+GROUP BY date_col;
+```
+
+### DROP MATERIALIZED VIEW
+
+```sql
+DROP MATERIALIZED VIEW [schema.]view_name
+DROP MATERIALIZED VIEW IF EXISTS [schema.]view_name
+```
+
+Examples:
+
+```sql
+-- Drop a materialized view (error if not exists)
+DROP MATERIALIZED VIEW dfs.tmp.sales_summary;
+
+-- Drop only if it exists (no error if not exists)
+DROP MATERIALIZED VIEW IF EXISTS dfs.tmp.old_view;
+```
+
+### REFRESH MATERIALIZED VIEW
+
+```sql
+REFRESH MATERIALIZED VIEW [schema.]view_name
+```
+
+The REFRESH command re-executes the underlying query and replaces the stored 
data with fresh results.
+
+Example:
+
+```sql
+-- Refresh the materialized view with current data
+REFRESH MATERIALIZED VIEW dfs.tmp.sales_summary;
+```
+
+## Query Rewriting
+
+Drill supports automatic query rewriting where queries against base tables can 
be transparently rewritten to use materialized views when appropriate. This 
feature leverages Apache Calcite's SubstitutionVisitor for structural query 
matching.
+
+### Enabling Query Rewriting
+
+Query rewriting is controlled by the 
`planner.enable_materialized_view_rewrite` option:
+
+```sql
+-- Enable materialized view rewriting (enabled by default)
+SET `planner.enable_materialized_view_rewrite` = true;
+
+-- Disable materialized view rewriting
+SET `planner.enable_materialized_view_rewrite` = false;
+```
+
+### How Rewriting Works
+
+When query rewriting is enabled, Drill's query planner:
+
+1. Discovers all available materialized views in accessible schemas
+2. Filters candidates to those with COMPLETE refresh status
+3. For each candidate, parses the MV's defining SQL and converts it to a 
relational expression
+4. Uses Calcite's SubstitutionVisitor to check if the MV's query structure 
matches part or all of the user's query
+5. If a match is found, substitutes the matching portion with a scan of the 
materialized view data
+6. Selects the rewritten plan if it offers better performance characteristics
+
+### Rewriting Scenarios
+
+Query rewriting can apply in several scenarios:
+
+**Exact Match**: The user's query exactly matches the MV definition.
+
+```sql
+-- MV definition
+CREATE MATERIALIZED VIEW dfs.tmp.region_totals AS
+SELECT r_regionkey, COUNT(*) as cnt FROM cp.`region.json` GROUP BY r_regionkey;
+
+-- This query will use the MV
+SELECT r_regionkey, COUNT(*) as cnt FROM cp.`region.json` GROUP BY r_regionkey;
+```
+
+**Partial Match with Additional Filters**: The user's query adds filters on 
top of the MV.
+
+```sql
+-- This query may use the MV and apply the filter
+SELECT r_regionkey, cnt FROM dfs.tmp.region_totals WHERE cnt > 10;
+```
+
+**Aggregate Rollup**: Higher-level aggregations computed from MV aggregates.
+
+### Viewing the Execution Plan
+
+Use EXPLAIN to see if a materialized view is being used:
+
+```sql
+EXPLAIN PLAN FOR
+SELECT r_regionkey, COUNT(*) FROM cp.`region.json` GROUP BY r_regionkey;
+```
+
+If the MV is used, the plan will show a scan of the materialized view data 
location rather than the original table.
+
+## Storage Architecture
+
+### Definition Storage
+
+Materialized view definitions are stored as JSON files with the 
`.materialized_view.drill` extension in the workspace directory. This follows 
the same pattern as regular Drill views (`.view.drill` files).
+
+The definition file contains:
+- View name
+- Defining SQL statement
+- Field names and types
+- Workspace schema path
+- Data storage path
+- Last refresh timestamp
+- Refresh status (PENDING or COMPLETE)
+
+Example definition file structure:
+
+```json
+{
+  "name": "sales_summary",
+  "sql": "SELECT region, SUM(amount) as total FROM sales GROUP BY region",
+  "fields": [
+    {"name": "region", "type": "VARCHAR"},
+    {"name": "total", "type": "DOUBLE"}
+  ],
+  "workspaceSchemaPath": ["dfs", "tmp"],
+  "dataStoragePath": "sales_summary",
+  "lastRefreshTime": 1706900000000,
+  "refreshStatus": "COMPLETE"
+}
+```
+
+### Data Storage
+
+Materialized view data is stored as Parquet files in a directory named 
`{view_name}_mv_data` within the workspace. Parquet format provides:
+- Efficient columnar storage
+- Compression
+- Predicate pushdown support
+- Schema evolution capabilities
+
+For a materialized view named `sales_summary` in `dfs.tmp`, the storage 
structure would be:
+
+```
+/tmp/
+  sales_summary.materialized_view.drill    # Definition file
+  sales_summary_mv_data/                   # Data directory
+    0_0_0.parquet                          # Data files
+    0_0_1.parquet
+    ...

Review Comment:
   We could support customized formats in the future. Using a lake format (e.g. 
Iceberg) would naturally support mv partitioning and table statistics. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] DRILL-8543: Add Support for Materialized Views (drill)

Reply via email to