GitHub user flyrain edited a discussion: Apache Polaris Roadmap Proposal

Over the past months, we've collaborated with a wide range of 
stakeholders—companies, developers, and users—who are invested in the evolution 
of Apache Polaris. This roadmap consolidates those insights into a shared 
vision, ensuring that our efforts address the most impactful and widely 
supported improvements. We appreciate the valuable feedback and collaboration 
that have shaped this direction.

> **Note that the features can move in and out of the milestones based on 
> prioritization, and available resources.**

The roadmap items fall into the following broad categories:

1. Core Polaris functions  
2. Catalog Federation and Integrations  
3. Data Security, Data Governance and Compliance   
4. Observability and Reliability   
5. AI/ML

## **Feature Proposal List**

| Category | Feature | 0.9 | 1.0 | 1.1 | 1.2 | 1.3 | 1.4 | 1.5+ |
| :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
| Core Polaris | Iceberg REST Spec support (including view and multi-table transactions) | GA | GA | GA | GA | GA | GA | GA |
|  | Support for Delta Format as Generic Tables |  | Preview | X | X | X | X | X |
|  | [Policy Store](https://github.com/orgs/apache/projects/469) |  | GA | GA | GA | GA | GA | GA |
|  | [Table Maintenance Framework](https://github.com/orgs/apache/projects/435) |  |  | X | X | X | X | X |
|  | JDBC Persistence layer |  | GA | GA | GA | GA | GA | GA |
|  | NoSQL Persistence layer |  |  | X | X | X | X | X |
|  | Catalog Browser experience (UI) |  |  |  |  |  |  | X |
|  |  |  |  |  |  |  |  |  |
| Catalog Federation & Integrations | Catalog Federation |  | Preview | X | X | X | X | X |
|  | Catalog Migrator |  | X | X | X | X | X | X |
|  | Identity Federation and SSO |  | GA | GA | GA | GA | GA | GA |
| Data Security, Governance and Compliance | Role-based Access Control (RBAC) | GA | GA | GA | GA | GA | GA | GA |
|  | Credential Vending (including support for S3, ADLS and GCP) | GA | GA | GA | GA | GA | GA | GA |
|  | S3 Request Signing |  | X | X | X | X | X | X |
|  | Table Maintenance Policies |  | GA | GA | GA | GA | GA | GA |
|  | Row level and column level policies |  |  |  | X | X | X | X |
|  | Encryption Support |  |  |  |  | X | X | X |
|  | Audit and events interface |  |  | X | X | X | X | X |
|  | Data Lineage |  |  |  |  | X | X | X |
|  | Data tagging and classifications |  |  |  |  |  |  | X |
|  | Attribute-Based Access Control (ABAC) |  |  |  |  |  |  | X |
|  |  |  |  |  |  |  |  |  |
| Observability and Telemetry | Data lake operational metrics |  |  |  |  |  |  | X |
|  | Data health monitoring and alerts |  |  |  |  |  |  | X |
| AI/ML | Volumes/Directory Tables |  |  |  |  |  |  | X |

Legend:

* **X**: Tentatively Planned
* **GA**: Generally Available
* **Preview**: Experimental Feature

## **Core Polaris Functions**

### Generic Tables and Delta Format Support

Polaris enables support for non-Iceberg table formats through the concept of 
Generic Tables. These tables behave similarly to regular tables but include an 
additional format attribute that defines the table format type. This approach 
not only opens up flexibility to support additional formats but also sets the 
stage for enhancing Polaris’ capabilities.

For example, while Polaris currently supports the Iceberg REST catalog for 
managing and querying large datasets, incorporating support for additional 
formats—such as Delta—would further extend its capabilities. Delta format 
support would allow for enhanced governance, compliance, data management, 
disaster recovery, and migrations within the same Catalog. This enhancement 
involves generating Iceberg metadata to read Delta tables and enabling both 
Delta read and write operations from engines like Apache Spark.

More details here:

1. [Generic Table Support](https://docs.google.com/document/d/1_R9jBIwoH3CV9G7gSoRJPQVEcOsROp4IEiGoeYeQE8A/edit?tab=t.0#heading=h.1qpl30dg4lui)
2. [Generate Iceberg metadata to read Delta tables](https://docs.google.com/document/d/1H2StuZ26LroibuQni3IJlErlKgrV9fEvYLHHqN7HWfE/edit?tab=t.0#heading=h.b956txtpu769)

**Milestone: 1.1**
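To make the idea of a format attribute concrete, here is a minimal Python sketch of what a generic table entry might look like. It is an illustration only, not the Polaris data model or API: the `GenericTable` class, its `base_location` and `properties` fields, and the `needs_format_specific_reader` helper are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class GenericTable:
    """Hypothetical catalog entry for a generic table (not the Polaris API)."""

    namespace: str
    name: str
    format: str  # the extra attribute naming the table format, e.g. "delta"
    base_location: Optional[str] = None  # assumed field: where the table lives
    properties: Dict[str, str] = field(default_factory=dict)


def needs_format_specific_reader(table: GenericTable) -> bool:
    """Return True when the entry is not an Iceberg table."""
    return table.format.strip().lower() != "iceberg"


# Example entry: a Delta table registered alongside Iceberg tables.
orders = GenericTable(
    namespace="sales",
    name="orders",
    format="delta",
    base_location="s3://example-bucket/sales/orders",
)
```

An engine integration could branch on the `format` value to pick the right reader, which is one way a single catalog could hold tables of several formats.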

### Policy Store

Apache Polaris' support for a Policy Store allows it to serve as a centralized 
repository for all policies related to data assets, ensuring consistent 
governance and compliance across the organization. This includes policies for 
table maintenance, access control, data security, and overall data governance, 
enabling administrators to easily enforce, track, and audit these policies. By 
consolidating policy management in Polaris, organizations can streamline their 
data management processes while maintaining compliance and security standards. 

More details here: [Policy Management in Apache Polaris](https://docs.google.com/document/d/1kIiVkFFg9tPa5SH70b9WwzbmclrzH3qWHKfCKXw5lbs/edit?tab=t.0#heading=h.nly223xz13km)

**Milestone: 1.0**

### Table Maintenance Framework

The Table Maintenance Framework adds the ability to store the table maintenance policies, properties, statistics, and events needed to perform table maintenance and optimization. It does not include the maintenance operations themselves, which need to run on compute infrastructure. More details here: [Table Maintenance in Polaris](https://docs.google.com/document/d/1Pd_mzZcfvnUvcH98IbwsIYf4eryet1lQDfclKYx-t-M/edit?tab=t.0#heading=h.7ic5c343eju1)
  
**Milestone: 1.1**

### SQL and NoSQL Persistence 

Enable SQL (e.g., Postgres) and NoSQL (e.g., DynamoDB, Cassandra) persistence backends for Polaris. More details here: [Apache Polaris (incubating) - SQL/NoSQL persistence backend support](https://docs.google.com/document/d/1LlNhEy4cBjjE_um694fcsnizqd3rDm5pbewXkLxvu1o/edit?tab=t.0#heading=h.vmkska23of8t)
  
**Milestone: 1.0**

### S3-compatible storage support 

Support S3-compatible storage such as MinIO, Ceph, and Dell ECS. More details are in https://github.com/apache/polaris/pull/389.

**Milestone: 1.0**

### Catalog Browser experience (UI)

A user interface for Apache Polaris that lets users browse catalogs, databases, and tables, and provides basic operations for governance and policy management. The corresponding issue is #572.

**Milestone: 1.5+**

## **Catalog Federation and Integrations**

### Catalog Federation 

Enable federation of reads and writes to any remote catalog, making Apache Polaris a **Catalog of Catalogs**. This primarily covers catalogs that support the Iceberg REST Catalog (IRC) and Hive protocols. Some details here: [Polaris Roadmap and Catalog Federation Diagrams](https://docs.google.com/document/d/1Q6eEytxb0btpOPcL8RtkULskOlYUCo_3FLvFRnHkzBY/edit?tab=t.0)
  
**Milestone: 1.0**

### Catalog Migrator

Users may want to move Iceberg tables from other catalog solutions into Apache Polaris. The Catalog Migrator enables migration of tables registered in catalogs such as Glue, Hive, or other Iceberg REST catalogs into Apache Polaris.

**Milestone: 1.0**

## **Data Security, Data Governance and Compliance**

### Governance Policies for Tables

Polaris will provide the ability to define access policies and other governance policies (such as retention) per table. More details here: [Policy Management in Apache Polaris](https://docs.google.com/document/d/1kIiVkFFg9tPa5SH70b9WwzbmclrzH3qWHKfCKXw5lbs/edit?tab=t.0#heading=h.nly223xz13km)
  
**Milestone: 1.2**

### Column level and Row level Policies

Provides the capability to define and enforce column-level and row-level access policies and other governance policies. More details here: [Policy Management in Apache Polaris](https://docs.google.com/document/d/1kIiVkFFg9tPa5SH70b9WwzbmclrzH3qWHKfCKXw5lbs/edit?tab=t.0#heading=h.nly223xz13km)
  
**Milestone: 1.2**
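As a rough illustration of what row-level and column-level enforcement means, the Python sketch below filters rows with a predicate and masks selected columns. It is not the Polaris policy model and says nothing about where enforcement would actually run: the `apply_policies` function, its parameters, and the sample data are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Set

Row = Dict[str, Any]


def apply_policies(
    rows: Iterable[Row],
    row_filter: Callable[[Row], bool],
    masked_columns: Set[str],
    mask: str = "***",
) -> List[Row]:
    """Keep only rows allowed by row_filter and mask the listed columns."""
    visible = []
    for row in rows:
        if not row_filter(row):
            continue  # row-level policy: drop rows the caller may not see
        # column-level policy: replace protected values with a mask
        visible.append(
            {col: (mask if col in masked_columns else val) for col, val in row.items()}
        )
    return visible


rows = [
    {"region": "EU", "customer": "alice", "email": "alice@example.com"},
    {"region": "US", "customer": "bob", "email": "bob@example.com"},
]

# Hypothetical policy: this principal sees EU rows only, with emails masked.
eu_only = apply_policies(rows, lambda r: r["region"] == "EU", {"email"})
```

In this sketch the caller receives a single EU row whose `email` value is replaced by the mask, while the input rows are left unchanged.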

### Identity federation, SCIM, SSO and OAuth support

Supporting SCIM and SAML is essential for efficient user provisioning, seamless access management, and enhanced security, ensuring that users can securely access and manage data resources while complying with organizational policies. This also enables identity federation and OAuth federation with third-party identity providers. More details here: [Adding Federated User and Role Support in Polaris](https://docs.google.com/document/d/15_3ZiRB6Lhzw0nxij341QUdxEIyFGTrI9_18bFIyJVo/edit?tab=t.0#heading=h.cu1a1acu4lc5)
  
**Milestone: 1.0**

### Audit and Events Interface

Enable audit logs and history for catalog, database, table, property, and policy changes through an events interface. Initial spec details here: [Polaris Event Listeners](https://docs.google.com/document/d/1sJiFKeMlPVlqRUj8Rv4YufMMrfCq_3ZFtDuNv_8eOQ0/edit?tab=t.0#heading=h.8d519gwzsle2)

**Milestone: 1.2**

### Data Lineage

Data Lineage functionality allows users to trace the flow of data across different systems and tables, providing visibility into its origin and usage. This feature enhances data governance, auditability, and troubleshooting by visually representing the data's lifecycle from source to destination. This includes table and column lineage.

**Milestone: 1.5+**

### Data Tagging and Classification

Enable categorizing and labeling data assets based on predefined criteria, such as data type, sensitivity, or usage. This helps organizations efficiently organize, search, and secure their data by assigning meaningful tags and classifications, enabling better governance and compliance management. Through this feature, users can quickly locate relevant data and ensure appropriate access controls are in place.

**Milestone: 1.5+**

### Encryption Support 

Enable support for encrypted Iceberg tables by managing Key Management Service (KMS) integrations. Facilitate the vending of encryption keys, ensuring seamless key retrieval and rotation for standard KMS solutions.

**Milestone: 1.3**

## **Observability, Telemetry and Reliability**

### Data Lake Operational Metrics

Enable operational metrics on catalogs, databases, and tables to support operational manageability of the data lake. This includes:

* **Data-level Metrics**: Number of Files, Number of Partitions, Partition Sizes, Total Table Size, and more
* **Access Metrics**: Number of Read/Write Accesses per Table, Query Load per File, Read/Write Latency
* **Data Health Metrics**: Data Skew, Data Freshness, and more
* **Storage-level Metrics**: Storage Utilization, Number of Small Files, 
Storage Growth, Hot Partitions, and more
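A few of the data-level and storage-level metrics listed above can be derived from a plain listing of data files. The Python sketch below shows one way to do that; it is not a Polaris interface, and the `DataFile` type, the `table_metrics` function, and the 32 MiB small-file threshold are assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class DataFile:
    """Hypothetical record describing one data file of a table."""

    partition: str
    size_bytes: int


def table_metrics(
    files: Iterable[DataFile],
    small_file_bytes: int = 32 * 1024 * 1024,  # assumed small-file threshold
) -> Dict[str, object]:
    """Aggregate file-level facts into table-level metrics."""
    partition_sizes: Dict[str, int] = defaultdict(int)
    file_count = 0
    small_file_count = 0
    for f in files:
        file_count += 1
        partition_sizes[f.partition] += f.size_bytes
        if f.size_bytes < small_file_bytes:
            small_file_count += 1
    return {
        "file_count": file_count,
        "partition_count": len(partition_sizes),
        "partition_sizes": dict(partition_sizes),
        "total_size_bytes": sum(partition_sizes.values()),
        "small_file_count": small_file_count,
    }


MIB = 1024 * 1024
files = [
    DataFile("day=2025-01-01", 512 * MIB),
    DataFile("day=2025-01-01", 4 * MIB),
    DataFile("day=2025-01-02", 128 * MIB),
]
metrics = table_metrics(files)
```

Access metrics such as read/write latency would need query-time instrumentation and cannot be derived from a file listing alone.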

**Milestone: 1.5+**

### Data Health Monitoring and Alerts

Enable capabilities to monitor and alert on the health of the data, including:

* Table Size and Growth Monitoring 
* Un-compacted partitions and files
* Data Skew monitoring and alerts
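One simple way to express a data skew check is to compare the largest partition with the average partition size and alert when the ratio crosses a threshold. The Python sketch below does that; the roadmap does not define a skew metric, so the `skew_ratio` definition, the `skew_alerts` function, and the threshold of 3.0 are assumptions.

```python
from typing import Dict, List


def skew_ratio(partition_sizes: Dict[str, int]) -> float:
    """Largest partition size divided by the mean partition size."""
    if not partition_sizes:
        return 0.0
    sizes = list(partition_sizes.values())
    mean = sum(sizes) / len(sizes)
    if mean == 0:
        return 0.0
    return max(sizes) / mean


def skew_alerts(
    tables: Dict[str, Dict[str, int]],
    threshold: float = 3.0,  # assumed alerting threshold
) -> List[str]:
    """Return the names of tables whose skew ratio exceeds the threshold."""
    return sorted(
        name for name, sizes in tables.items() if skew_ratio(sizes) > threshold
    )


# Hypothetical partition sizes (in MiB) for two tables.
tables = {
    "sales.orders": {"p0": 10, "p1": 10, "p2": 10, "p3": 10, "p4": 160},
    "sales.customers": {"p0": 50, "p1": 50},
}
alerts = skew_alerts(tables)
```

Here `sales.orders` has a mean partition size of 40 and a largest partition of 160, so its ratio exceeds the threshold and it is the only table flagged.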

**Milestone: 1.5+**

## AI/ML

### Volumes/Directory Tables

A table-like entity such as a volume can be used to organize and manage unstructured data. Volumes group related data files logically, similar to directories or containers. More details here: [Unstructured Data Support in Polaris](https://docs.google.com/document/d/1ofljkrtiXRWc-v6hfkg_laKlYltepTPX7zsg44Tb-BY/edit?tab=t.0#heading=h.7ic5c343eju1)

**Milestone: 1.5+**

GitHub link: https://github.com/apache/polaris/discussions/1028
