GitHub user flyrain edited a discussion: Apache Polaris Roadmap Proposal
Over the past months, we've collaborated with a wide range of stakeholders (companies, developers, and users) who are invested in the evolution of Apache Polaris. This roadmap consolidates those insights into a shared vision, ensuring that our efforts address the most impactful and widely supported improvements. We appreciate the valuable feedback and collaboration that have shaped this direction.

> **Note that features can move in and out of milestones based on
> prioritization and available resources.**

The roadmap items can be broadly classified into the following categories:

1. Core Polaris functions
2. Catalog Federation and Integrations
3. Data Security, Data Governance and Compliance
4. Observability and Reliability
5. AI/ML

## **Feature Proposal List**

| Category | Feature | 0.9 | 1.0 | 1.1 | 1.2 | 1.3 | 1.4 | 1.5+ |
| :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- | :---- |
| Core Polaris | Iceberg REST Spec support (including view and multi-table transactions) | GA | GA | GA | GA | GA | GA | GA |
| | Support for Delta Format as Generic Tables | | Preview | X | X | X | X | X |
| | [Policy Store](https://github.com/orgs/apache/projects/469) | | GA | GA | GA | GA | GA | GA |
| | [Table Maintenance Framework](https://github.com/orgs/apache/projects/435) | | | X | X | X | X | X |
| | JDBC persistence layer | | GA | GA | GA | GA | GA | GA |
| | NoSQL persistence layer | | | X | X | X | X | X |
| | Catalog Browser experience (UI) | | | | | | | X |
| Catalog Federation & Integrations | Catalog Federation | | Preview | X | X | X | X | X |
| | Catalog Migrator | | X | X | X | X | X | X |
| | Identity Federation and SSO | | GA | GA | GA | GA | GA | GA |
| Data Security, Governance and Compliance | Role-based Access Control (RBAC) | GA | GA | GA | GA | GA | GA | GA |
| | Credential Vending (including support for S3, ADLS and GCP) | GA | GA | GA | GA | GA | GA | GA |
| | S3 Request Signing | | | X | X | X | X | X |
| | Table Maintenance Policies | | GA | GA | GA | GA | GA | GA |
| | Row level and column level policies | | | | X | X | X | X |
| | Encryption Support | | | | | X | X | X |
| | Audit and events interface | | | X | X | X | X | X |
| | Data Lineage | | | | | X | X | X |
| | Data tagging and classifications | | | | | | | X |
| | Attribute-Based Access Control (ABAC) | | | | | | | X |
| Observability and Telemetry | Data lake operational metrics | | | | | | | X |
| | Data health monitoring and alerts | | | | | | | X |
| AI/ML | Volumes/Directory Tables | | | | | | | X |

Legend:

* **X**: Tentatively Planned
* **GA**: Generally Available
* **Preview**: Experimental Feature

## **Core Polaris Functions**

### Generic Tables and Delta Format Support

Polaris enables support for non-Iceberg table formats through the concept of Generic Tables. These tables behave similarly to regular tables but include an additional format attribute that defines the table format type. This approach makes it possible to support additional formats over time. For example, Polaris currently supports the Iceberg REST catalog for managing and querying large datasets; adding support for Delta would extend governance, compliance, data management, disaster recovery, and migrations to Delta tables within the same catalog. This enhancement involves generating Iceberg metadata to read Delta tables and enabling both Delta read and write operations from engines like Apache Spark.

More details here:

1. [Generic Table Support](https://docs.google.com/document/d/1_R9jBIwoH3CV9G7gSoRJPQVEcOsROp4IEiGoeYeQE8A/edit?tab=t.0#heading=h.1qpl30dg4lui)
2. [Generate Iceberg metadata to read Delta tables](https://docs.google.com/document/d/1H2StuZ26LroibuQni3IJlErlKgrV9fEvYLHHqN7HWfE/edit?tab=t.0#heading=h.b956txtpu769)

**Milestone: 1.1**

### Policy Store

Apache Polaris' support for a Policy Store allows it to serve as a centralized repository for all policies related to data assets, ensuring consistent governance and compliance across the organization. This includes policies for table maintenance, access control, data security, and overall data governance, enabling administrators to easily enforce, track, and audit these policies. By consolidating policy management in Polaris, organizations can streamline their data management processes while maintaining compliance and security standards. More details here: [Policy Management in Apache Polaris](https://docs.google.com/document/d/1kIiVkFFg9tPa5SH70b9WwzbmclrzH3qWHKfCKXw5lbs/edit?tab=t.0#heading=h.nly223xz13km)

**Milestone: 1.0**

### Table Maintenance Framework

The Table Maintenance Framework adds the ability to store the table maintenance policies, properties, statistics, and events needed to perform table maintenance and optimizations. It does not include the actual table maintenance operations, which need to run on compute infrastructure. More details here: [Table Maintenance in Polaris](https://docs.google.com/document/d/1Pd_mzZcfvnUvcH98IbwsIYf4eryet1lQDfclKYx-t-M/edit?tab=t.0#heading=h.7ic5c343eju1)

**Milestone: 1.1**

### SQL and NoSQL Persistence

Enable SQL (e.g., Postgres) and NoSQL (e.g., DynamoDB, Cassandra) persistence storage backends for Polaris. More details here: [Apache Polaris (incubating) - SQL/NoSQL persistence backend support](https://docs.google.com/document/d/1LlNhEy4cBjjE_um694fcsnizqd3rDm5pbewXkLxvu1o/edit?tab=t.0#heading=h.vmkska23of8t)

**Milestone: 1.0**

### S3-compatible storage support

Support S3-compatible storage, such as MinIO, Ceph, and Dell ECS. More details are in https://github.com/apache/polaris/pull/389.

**Milestone: 1.0**

### Catalog Browser experience (UI)

User experience and interface for Apache Polaris. Enables users to browse catalogs, databases, and tables, and provides basic operations for policy management and other governance functions. #572 is the corresponding issue.

**Milestone: 1.5+**

## **Catalog Federation and Integrations**

### Catalog Federation

Enable federation of reads and writes to any remote catalog, making Apache Polaris a **Catalog of Catalogs**. This primarily includes catalogs that support the Iceberg REST Catalog (IRC) and Hive protocols. Some details here: [Polaris Roadmap and Catalog Federation Diagrams](https://docs.google.com/document/d/1Q6eEytxb0btpOPcL8RtkULskOlYUCo_3FLvFRnHkzBY/edit?tab=t.0)

**Milestone: 1.0**

### Catalog Migrator

Users may want to move Iceberg tables from other catalog solutions into Apache Polaris. The Catalog Migrator enables migration of tables registered in catalogs such as Glue, Hive, or other Iceberg REST catalogs into Apache Polaris.

**Milestone: 1.0**

## **Data Security, Data Governance and Compliance**

### Governance Policies for Tables

Polaris will provide the ability to define access policies and other governance policies (such as retention) at the table level. More details here: [Policy Management in Apache Polaris](https://docs.google.com/document/d/1kIiVkFFg9tPa5SH70b9WwzbmclrzH3qWHKfCKXw5lbs/edit?tab=t.0#heading=h.nly223xz13km)

**Milestone: 1.2**

### Column level and Row level Policies

Provides the capability to define and enforce column-level and row-level access policies and other governance policies.

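
As a rough illustration only, the sketch below shows what column-level and row-level policies amount to once enforced: a column policy masks selected fields, and a row policy filters records with a predicate. The function name `apply_policy`, its parameters, and the `"***"` mask value are hypothetical and are not taken from Polaris or from the policy proposal linked in this section.

```python
from typing import Any, Callable, Dict, FrozenSet, Iterable, List

Row = Dict[str, Any]


def apply_policy(
    rows: Iterable[Row],
    masked_columns: FrozenSet[str],
    row_filter: Callable[[Row], bool],
) -> List[Row]:
    """Return only the rows allowed by row_filter, with masked columns hidden.

    Hypothetical helper for illustration; not a Polaris API.
    """
    visible = []
    for row in rows:
        # Row-level policy: drop rows the principal may not see.
        if not row_filter(row):
            continue
        # Column-level policy: replace protected values with a fixed mask.
        visible.append(
            {col: ("***" if col in masked_columns else val) for col, val in row.items()}
        )
    return visible


rows = [
    {"id": 1, "region": "EU", "email": "a@example.com"},
    {"id": 2, "region": "US", "email": "b@example.com"},
]

# Example policy: the principal sees only EU rows and never sees raw emails.
result = apply_policy(
    rows,
    masked_columns=frozenset({"email"}),
    row_filter=lambda r: r["region"] == "EU",
)
print(result)
```

In a catalog, enforcement of this kind would typically be delegated to the query engine; the sketch only makes the two policy types concrete.
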
More details here: [Policy Management in Apache Polaris](https://docs.google.com/document/d/1kIiVkFFg9tPa5SH70b9WwzbmclrzH3qWHKfCKXw5lbs/edit?tab=t.0#heading=h.nly223xz13km)

**Milestone: 1.2**

### Identity federation, SCIM, SSO and OAuth support

Supporting SCIM and SAML is essential for efficient user provisioning, seamless access management, and enhanced security, ensuring that users can securely access and manage data resources while complying with organizational policies. This also enables easy identity federation and OAuth federation with third-party identity providers. More details here: [Adding Federated User and Role Support in Polaris](https://docs.google.com/document/d/15_3ZiRB6Lhzw0nxij341QUdxEIyFGTrI9_18bFIyJVo/edit?tab=t.0#heading=h.cu1a1acu4lc5)

**Milestone: 1.0**

### Audit and Events Interface

Enable audit logs and history for Catalog, Database, Table, Property and Policy changes through an events interface. Initial spec details here: [Polaris Event Listeners](https://docs.google.com/document/d/1sJiFKeMlPVlqRUj8Rv4YufMMrfCq_3ZFtDuNv_8eOQ0/edit?tab=t.0#heading=h.8d519gwzsle2)

**Milestone: 1.2**

### Data Lineage

Data Lineage functionality allows users to trace the flow of data across different systems and tables, providing visibility into its origin and usage. This feature enhances data governance, auditability, and troubleshooting by visually representing the data's lifecycle from source to destination. This includes table and column lineage.

**Milestone: 1.5+**

### Data Tagging and Classification

Enable categorizing and labeling data assets based on predefined criteria, such as data type, sensitivity, or usage. This helps organizations efficiently organize, search, and secure their data by assigning meaningful tags and classifications, enabling better governance and compliance management.
Through this feature, users can quickly locate relevant data and ensure appropriate access controls are in place.

**Milestone: 1.5+**

### Encryption Support

Enable support for encrypted Iceberg tables by managing Key Management Service (KMS) integrations. Facilitate the vending of encryption keys, ensuring seamless key retrieval and rotation for standard KMS solutions.

**Milestone: 1.3**

## **Observability, Telemetry and Reliability**

### Data Lake Operational Metrics

Enable operational metrics on catalogs, databases, and tables to support operational manageability of the data lake. This includes:

* **Data-level metrics**: Number of files, number of partitions, partition sizes, total table size, and more
* **Access metrics**: Number of R/W accesses on a table, query load per file, R/W latency
* **Data health metrics**: Data skew, data freshness, and more
* **Storage-level metrics**: Storage utilization, number of small files, storage growth, hot partitions, and more

**Milestone: 1.5+**

### Data Health Monitoring and Alerts

Enable capabilities to monitor and alert on the health of the data, including:

* Table size and growth monitoring
* Un-compacted partitions and files
* Data skew monitoring and alerts

**Milestone: 1.5+**

## **AI/ML**

### Volumes/Directory Tables

A table-like entity such as a volume can be used for organizing and managing unstructured data. Volumes provide a way to group related data files logically, similar to directories or containers. More details here: [Unstructured Data Support in Polaris](https://docs.google.com/document/d/1ofljkrtiXRWc-v6hfkg_laKlYltepTPX7zsg44Tb-BY/edit?tab=t.0#heading=h.7ic5c343eju1)

**Milestone: 1.5+**

GitHub link: https://github.com/apache/polaris/discussions/1028