dimas-b commented on code in PR #4613: URL: https://github.com/apache/polaris/pull/4613#discussion_r3364896532
##########
site/content/in-dev/unreleased/directories.md:
##########
@@ -0,0 +1,227 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Directories
+type: docs
+weight: 450
+---
+
+## Overview
+
+Directories make objects (including unstructured data like images, videos, documents, and other objects) discoverable alongside
+structured Iceberg tables within a Polaris catalog.
+A directory points to a base location/prefix on an object store and automatically tracks the objects it contains by maintaining
+an Iceberg table with object-level metadata such as URI, size, content type, checksum, ...
+
+This means query engines and tools that already know how to read Iceberg tables can discover and
+access unstructured data with little or no extra work (accessing the object itself).
+
+## Concepts
+
+A directory has two main parts:
+
+1. **Directory configuration** — stored by the Polaris server. It describes _where_ the data lives,
+   how to authenticate, which objects to include, and how often to re-scan. The configuration "lives" in a namespace.
+2. **Directory table** — an Iceberg table serving as the inventory of all objects contained in the directory, one row per object discovered during a scan.
+   The directory table uses the configuration name.
+
+The Polaris server itself does not perform scans. Instead, external services (e.g. directory table scanning service) read the directory configuration through the REST API,
+walk the object store, and write the results into the directory table.
+
+## Configuration
+
+A directory is described by the following fields:
+
+| Field | Type | Required | Description |
+|-------|------|----------|-------------|
+| `name` | `string` | Yes | The name of the directory. It is used in REST endpoint paths and becomes the name of the corresponding Iceberg table. |

Review Comment:
   Does this mean that the "directory" and the "table" share the same name?

   Do they live in the same namespace in Polaris (basically being distinguishable only by object type)?

   (this is not an objection, just trying to understand the situation better 😅)
