nssalian commented on code in PR #3645:
URL: https://github.com/apache/polaris/pull/3645#discussion_r2756854004


##########
site/content/blog/2026/02/04/floe-polaris-integration.md:
##########
@@ -0,0 +1,231 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: "Floe and Apache Polaris: Policy-Driven Table Maintenance for Apache Iceberg"
+date: 2026-02-04
+author: Neelesh Salian
+---
+
+## Introduction
+
+Iceberg tables accumulate technical debt over time. Small files multiply as 
streaming jobs append data in micro-batches. Delete files pile up from CDC 
workloads. Snapshots grow unbounded, bloating metadata. Without regular 
maintenance, query performance degrades, storage costs rise, and planning times 
stretch from milliseconds to seconds.
+
+Apache Polaris provides a vendor-neutral Iceberg catalog with governance and 
access control, but it does not execute maintenance operations. The catalog 
manages metadata and enforces permissions. Compaction, snapshot expiration, 
orphan cleanup, and manifest optimization remain the user's responsibility.
+
+[Floe](https://github.com/nssalian/floe) fills that gap. It connects to 
Polaris, discovers tables, evaluates their health, and orchestrates maintenance 
through policy-driven automation. Instead of writing custom scripts or manually 
running Spark jobs, you define policies that specify what maintenance to 
perform, which tables to target, and under what conditions to trigger 
execution. Floe handles the rest: scheduling, execution via Spark or Trino, and 
tracking outcomes.
+
+## Architecture
+
+Polaris remains the source of truth for metadata and access control. Floe 
reads the catalog, evaluates policies, triggers maintenance on your chosen 
engine, and records outcomes.
+
+![Polaris + Floe Architecture](/img/blog/2026/02/04/high_level_architecture.png)
+
+### Data Flow
+
+1. **Policy discovery**: Floe loads enabled policies and matches them to 
tables.
+2. **Health assessment**: Floe evaluates table health based on scan mode and 
thresholds.
+3. **Planning & gating**: The planner selects operations; trigger conditions 
decide if they run.
+4. **Execution**: The orchestrator dispatches operations to Spark or Trino.
+5. **Persistence**: Results and health history are stored for tracking and 
recommendations.
+
+## Quick Start
+
+```bash
+make example-polaris

Review Comment:
   Good catch. I'll update the Quick Start to clarify.


