rdblue commented on a change in pull request #3188:
URL: https://github.com/apache/iceberg/pull/3188#discussion_r805437089
##########
File path: site/docs/spec.md
##########
@@ -357,6 +357,7 @@ A manifest file must store the partition spec and other
metadata as properties i
| _required_ | _required_ | `partition-spec` | JSON fields representation
of the partition spec used to write the manifest |
| _optional_ | _required_ | `partition-spec-id` | ID of the partition spec
used to write the manifest as a string |
| _optional_ | _required_ | `format-version` | Table format version number
of the manifest as a string |
+| _optional_ | _optional_ | `object-type` | Type of object this metadata
file is for: "table" or "view". Default is "table". |
Review comment:
I would state "if missing, assumed to be `table`" rather than
"default is table". Using "default" doesn't quite seem correct to me.
##########
File path: site/docs/spec.md
##########
@@ -967,6 +968,7 @@ Table metadata is serialized as a JSON object according to
the following table.
|Metadata field|JSON representation|Example|
|--- |--- |--- |
|**`format-version`**|`JSON int`|`1`|
+|**`object-type`**|`JSON string`|`table`|
Review comment:
Nit: There should be double quotes around `table` because the example
should be a JSON string, like the other strings below.
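For illustration, here is a minimal Python sketch of why the example needs quotes: a JSON string is always serialized with double quotes. The helper name `object_type_of` is hypothetical. The fallback to `table` when the field is absent follows the wording suggested in the comment above for the manifest property, and is assumed (not confirmed) to apply to this field too.

```python
import json

# Hypothetical helper: read "object-type" from serialized metadata JSON.
# Assumption: an absent field is treated as "table".
def object_type_of(metadata_json):
    metadata = json.loads(metadata_json)
    return metadata.get("object-type", "table")


# A JSON string value is written with double quotes when serialized.
serialized = json.dumps({"format-version": 1, "object-type": "table"})
print(serialized)  # {"format-version": 1, "object-type": "table"}
```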
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
Review comment:
Is it necessary to say "commonly known as views"?
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
Review comment:
The last sentence, "Views and tables occupy the same namespace", doesn't
seem necessary. That is a catalog choice, right? It doesn't affect this spec if
someone uses it with a catalog that does not mix views and tables, so that they
occupy separate namespaces.
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
Review comment:
Is this intended to be a separate paragraph? If so, I think you need a
newline between this line and the previous one.
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
Review comment:
I think it is good to call out that the metadata model matches tables,
but this goes a bit too far and states requirements for the metastore, like
tracking "database name, owner, create time, last access time, ..."
I think this should be more similar to the table spec and bring in a lot of
similar wording:
> View metadata storage mirrors how Iceberg table metadata is stored and
retrieved. View metadata is maintained in metadata files. All changes to view
state create a new view metadata file and replace the old metadata using an
atomic swap. Like Iceberg tables, this atomic swap is delegated to the
metastore that tracks tables and/or views by name. The view metadata file
tracks the view schema, custom properties, current and past versions, as well
as other metadata.
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
Review comment:
Did you intend to use backticks and fixed-width font for `CREATE OR
REPLACE VIEW`?
Is it necessary to state this in terms of SQL? Above, I suggested "all
changes to view state create a new metadata file and completely replace the old
metadata ..." That seems sufficient to me.
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
Review comment:
I think it is good to call out that the metadata model matches tables,
but this goes a bit too far and states requirements for the metastore, like
tracking "database name, owner, create time, last access time, ..."
I think this should be more similar to the table spec and bring in a lot of
similar wording:
> View metadata storage mirrors how Iceberg table metadata is stored and
retrieved. View metadata is maintained in metadata files. All changes to view
state create a new view metadata file and completely replace the old metadata
using an atomic swap. Like Iceberg tables, this atomic swap is delegated to the
metastore that tracks tables and/or views by name. The view metadata file
tracks the view schema, custom properties, current and past versions, as well
as other metadata.
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
Review comment:
I think this is a good thing to call out: "Each metadata file is
self-sufficient".
I'm not sure what you mean by "and can be used to roll back the view to a
previous version". It sounds like this is a suggestion that rolling back the
pointer to an old metadata file is a possibility, which is not a good idea.
Metadata files should _always_ roll forward. Otherwise, structures like a
current version log are not maintained.
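If the intent is to make an older version current again, one roll-forward reading is to write a new metadata file that re-selects the old version and appends to the log, rather than repointing to an old file. A minimal Python sketch follows. The field names (`current-version-id`, `versions`, `version-log`, `version-id`, `timestamp-ms`) and the function name `reselect_version` are illustrative assumptions, not confirmed by the text quoted here.

```python
import copy


def reselect_version(metadata, version_id, timestamp_ms):
    """Return new metadata that makes an older version current again.

    The input is left untouched. The caller would write the result to a
    new metadata file and commit it, so the log keeps growing.
    All field names are illustrative assumptions.
    """
    known_ids = {v["version-id"] for v in metadata["versions"]}
    if version_id not in known_ids:
        raise ValueError(f"unknown version-id: {version_id}")

    updated = copy.deepcopy(metadata)
    updated["current-version-id"] = version_id
    # Record the change instead of erasing history.
    updated["version-log"].append(
        {"timestamp-ms": timestamp_ms, "version-id": version_id}
    )
    return updated
```

With this shape, the old metadata file is never reused as the current pointer, so the version log in the newest file always reflects every change.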
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
Review comment:
Should this point to the [Optimistic concurrency
section](https://iceberg.apache.org/spec/#optimistic-concurrency) of the other
spec? It seems a bit too vague to say that it is managed like the table
metadata location.
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
Review comment:
What is the purpose of this section? Can we remove it? It doesn't seem to
add much to list what the view metadata may be used for.
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
Review comment:
It seems valuable enough to me that I would recommend copying some
relevant text over:
> An atomic swap of one view metadata file for another provides the basis
for making changes. Readers use the version of the view that was current when
they loaded the view metadata and are not affected by changes until they
refresh and pick up a new metadata location.
>
> Writers create view metadata files optimistically, assuming that the
current metadata location will not be changed before the writer’s commit. Once
a writer has created an update, it commits by swapping the view's metadata file
pointer from the base location to the new location.
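The commit protocol quoted above can be sketched as a compare-and-swap on the metadata pointer. This is a minimal in-memory Python illustration only: `InMemoryCatalog`, `swap`, and `commit` are hypothetical names, and a real metastore would supply its own atomic primitive.

```python
import threading


class InMemoryCatalog:
    """Hypothetical stand-in for a metastore mapping view names to metadata locations."""

    def __init__(self):
        self._lock = threading.Lock()
        self._locations = {}

    def current_location(self, name):
        with self._lock:
            return self._locations.get(name)

    def swap(self, name, base, new):
        """Atomically point `name` at `new` only if it still points at `base`."""
        with self._lock:
            if self._locations.get(name) != base:
                return False  # another writer committed first
            self._locations[name] = new
            return True


def commit(catalog, name, base, new):
    """Optimistic commit: the writer assumed `base` would still be current."""
    if not catalog.swap(name, base, new):
        raise RuntimeError(f"commit failed: {name} no longer points at {base}")
```

A writer whose commit fails would reload the current metadata, reapply its change, and try again. Readers holding the old location are unaffected until they refresh.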
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
Review comment:
Nit: JSON should be consistently capitalized
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
Review comment:
I think it is good to note that the format-version is independent of the
table spec version, but probably in a note rather than in the table. We want to
keep the description focused on the requirements.
I think it is better to go with a description similar to the table one: "An
integer version number for the view format. Currently, this must be 1.
Implementations must throw an exception if the view's version is higher than
the supported version."
* There is no need to tie this to JSON. The version is for the spec.
* This changes infrequently, so it is okay to state that it must be 1 for
this version of the spec
* It is good to state that implementations must fail if they don't
understand the version
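For illustration, a minimal sketch of the check described above, in Python. The constant `SUPPORTED_VIEW_FORMAT_VERSION`, the exception class, and the function name are hypothetical and not part of any Iceberg API; the only things taken from the spec text are the `format-version` field and the rule that a higher version must be rejected.

```python
# Assumption: this reader implements only version 1 of the view spec.
SUPPORTED_VIEW_FORMAT_VERSION = 1


class UnsupportedViewVersionError(Exception):
    """Raised when view metadata is newer than this reader understands (hypothetical)."""


def check_format_version(metadata: dict) -> int:
    """Return the view format version, failing if it is missing, malformed, or too new."""
    version = metadata.get("format-version")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(version, bool) or not isinstance(version, int):
        raise ValueError(f"format-version must be an integer, got {version!r}")
    if version > SUPPORTED_VIEW_FORMAT_VERSION:
        raise UnsupportedViewVersionError(
            f"Cannot read view format version {version}; "
            f"highest supported version is {SUPPORTED_VIEW_FORMAT_VERSION}"
        )
    return version
```

Failing early here keeps an older reader from silently misinterpreting metadata written under a newer spec.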
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
+| Required | object-type | Type of object this metadata file is for:
"table" or "view". It must be set to "view" for all objects covered in this
spec. |
+| Required | location | Location of the view metadata files |
Review comment:
I would be more general with this. You probably don't want to require
that files are kept directly in this folder, do you?
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
+| Required | object-type | Type of object this metadata file is for:
"table" or "view". It must be set to "view" for all objects covered in this
spec. |
+| Required | location | Location of the view metadata files |
+| Required | current-version-id | Current version of the view. Set to ‘1’ when
the view is first created. |
Review comment:
Is it required that the view has an initial version? It can't be an
empty view?
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
+| Required | object-type | Type of object this metadata file is for:
"table" or "view". It must be set to "view" for all objects covered in this
spec. |
+| Required | location | Location of the view metadata files |
+| Required | current-version-id | Current version of the view. Set to ‘1’ when
the view is first created. |
+| Optional | properties | A string to string map of view properties. Contains
pre-set properties such as ‘comment’ describing the view, does not contain
arbitrary metadata. |
Review comment:
I don't think this description is very clear. Using "pre-set" makes it
sound like the values are determined as well. Here as well, I'd use something
similar to the table version:
> A string to string map of view properties. This is used for metadata such
as "comment" and for settings that affect view maintenance. This is not
intended to be used for arbitrary metadata.
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
+| Required | object-type | Type of object this metadata file is for:
"table" or "view". It must be set to "view" for all objects covered in this
spec. |
+| Required | location | Location of the view metadata files |
+| Required | current-version-id | Current version of the view. Set to ‘1’ when
the view is first created. |
+| Optional | properties | A string to string map of view properties. Contains
pre-set properties such as ‘comment’ describing the view, does not contain
arbitrary metadata. |
+| Required | versions | An array of structs describing the last few versions
of the view. Controlled by the table property: “version.history.num_entries”.
See more below. |
Review comment:
"the last few versions" is not very specific. How about "known versions"?
Also, the convention that Iceberg uses for properties is to separate
hierarchy with `.` and words of a concept with `-`, not `_`. Can you change
this to `version.history.num-entries`?
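To make the naming convention concrete, here is a rough Python sketch of a check for it. The helper `follows_property_convention` and the regular expression are hypothetical and only illustrate the rule stated above (hierarchy levels separated by `.`, words within a level separated by `-`); restricting names to lowercase letters and digits is an extra assumption, and nothing here is enforced by Iceberg itself.

```python
import re

# One level of the hierarchy: lowercase words joined by "-" (assumed character set).
_LEVEL = r"[a-z0-9]+(?:-[a-z0-9]+)*"
# A full property name: one or more levels joined by ".".
_PROPERTY_NAME = re.compile(rf"{_LEVEL}(?:\.{_LEVEL})*")


def follows_property_convention(name: str) -> bool:
    """Return True if `name` uses '.' between levels and '-' between words."""
    return _PROPERTY_NAME.fullmatch(name) is not None
```

Under this sketch, `version.history.num-entries` passes and `version.history.num_entries` does not, because `_` is not an accepted word separator.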
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
+| Required | object-type | Type of object this metadata file is for:
"table" or "view". It must be set to "view" for all objects covered in this
spec. |
+| Required | location | Location of the view metadata files |
+| Required | current-version-id | Current version of the view. Set to ‘1’ when
the view is first created. |
+| Optional | properties | A string to string map of view properties. Contains
pre-set properties such as ‘comment’ describing the view, does not contain
arbitrary metadata. |
+| Required | versions | An array of structs describing the last few versions
of the view. Controlled by the table property: “version.history.num_entries”.
See more below. |
+| Required | version-log | An array of structs describing the log of created
versions. See more below. |
Review comment:
Could you make the reference more specific than "below"? You should be
able to link to a section and provide its name.
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
+| Required | object-type | Type of object this metadata file is for:
"table" or "view". It must be set to "view" for all objects covered in this
spec. |
+| Required | location | Location of the view metadata files |
+| Required | current-version-id | Current version of the view. Set to ‘1’ when
the view is first created. |
+| Optional | properties | A string to string map of view properties. Contains
pre-set properties such as ‘comment’ describing the view, does not contain
arbitrary metadata. |
+| Required | versions | An array of structs describing the last few versions
of the view. Controlled by the table property: “version.history.num_entries”.
See more below. |
+| Required | version-log | An array of structs describing the log of created
versions. See more below. |
+| Optional | schemas | A list of schemas, the same as the ‘schemas’ field from
Iceberg table spec. |
+| Optional | current-schema-id | ID of the current schema of the view |
+
+
+‘Versions’ is an array of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | version-id | Monotonically increasing id indicating the version
of the view. Starts with “1”. |
Review comment:
Why does this place a requirement on version assignment?
I think you also mean to start with `1` rather than `"1"` because this is an
integer field.
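The difference between `1` and `"1"` is visible to any JSON reader. A small Python sketch, assuming a version struct is parsed with the standard `json` module; the helper name `read_version_id` is hypothetical, and only the `version-id` field name comes from the spec text:

```python
import json


def read_version_id(version_json: str) -> int:
    """Parse one version struct and return its integer version-id."""
    version = json.loads(version_json)
    version_id = version["version-id"]
    # A quoted value such as "1" parses as str, not int, so it is rejected here.
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(version_id, bool) or not isinstance(version_id, int):
        raise TypeError(f"version-id must be a JSON integer, got {version_id!r}")
    return version_id
```

With this sketch, `{"version-id": 1}` is accepted, while `{"version-id": "1"}` raises `TypeError`, which is why the example value in the spec should be written without quotes.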
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
+| Required | object-type | Type of object this metadata file is for:
"table" or "view". It must be set to "view" for all objects covered in this
spec. |
+| Required | location | Location of the view metadata files |
+| Required | current-version-id | Current version of the view. Set to ‘1’ when
the view is first created. |
+| Optional | properties | A string to string map of view properties. Contains
pre-set properties such as ‘comment’ describing the view, does not contain
arbitrary metadata. |
+| Required | versions | An array of structs describing the last few versions
of the view. Controlled by the table property: “version.history.num_entries”.
See more below. |
+| Required | version-log | An array of structs describing the log of created
versions. See more below. |
+| Optional | schemas | A list of schemas, the same as the ‘schemas’ field from
Iceberg table spec. |
+| Optional | current-schema-id | ID of the current schema of the view |
+
+
+‘Versions’ is an array of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | version-id | Monotonically increasing id indicating the version
of the view. Starts with “1”. |
+| Required | timestamp-ms | Timestamp expressed in ms since epoch at which the
version of the view was created. |
+| Required | summary | A string to string map of view properties to track
version metadata. This field can be used by engines to store any necessary
properties. Two currently required properties are described below. |
+| Required | representations | A list of "representations" as described below.
|
+
+
+Note that each version is stored in a separate AVRO file. This is to ensure
that the metadata file stays readable in the case the view definition is huge.
As a future extension, an engine-agnostic intermediate representation or a
serialized abstract syntax tree of the SQL definition may also be stored in
each version, exacerbating the problem.
+
+“summary” is a string-string map with the following required keys. Engines may
store additional key-value pairs in this map.
+
+| Required/Optional | Key | Value |
+|-------------------|-----|-------|
+| Required | operation |A string value indicating the view operation that
caused this metadata to be created. Allowed values are “CREATE” and “REPLACE” |
Review comment:
Can you change the operations to lower case? That matches the
conventions in the table format.
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
+| Required | object-type | Type of object this metadata file is for:
"table" or "view". It must be set to "view" for all objects covered in this
spec. |
+| Required | location | Location of the view metadata files |
+| Required | current-version-id | Current version of the view. Set to ‘1’ when
the view is first created. |
+| Optional | properties | A string to string map of view properties. Contains
pre-set properties such as ‘comment’ describing the view, does not contain
arbitrary metadata. |
+| Required | versions | An array of structs describing the last few versions
of the view. Controlled by the table property: “version.history.num_entries”.
See more below. |
+| Required | version-log | An array of structs describing the log of created
versions. See more below. |
+| Optional | schemas | A list of schemas, the same as the ‘schemas’ field from
Iceberg table spec. |
+| Optional | current-schema-id | ID of the current schema of the view |
+
+
+‘Versions’ is an array of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | version-id | Monotonically increasing id indicating the version
of the view. Starts with “1”. |
+| Required | timestamp-ms | Timestamp expressed in ms since epoch at which the
version of the view was created. |
+| Required | summary | A string to string map of view properties to track
version metadata. This field can be used by engines to store any necessary
properties. Two currently required properties are described below. |
+| Required | representations | A list of "representations" as described below.
|
+
+
+Note that each version is stored in a separate AVRO file. This is to ensure
that the metadata file stays readable in the case the view definition is huge.
As a future extension, an engine-agnostic intermediate representation or a
serialized abstract syntax tree of the SQL definition may also be stored in
each version, exacerbating the problem.
+
+“summary” is a string-string map with the following required keys. Engines may
store additional key-value pairs in this map.
Review comment:
Only `operation` is required below. I think it is good that this notes
whether each key is required or optional and defines common properties here.
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
+| Required | object-type | Type of object this metadata file is for:
"table" or "view". It must be set to "view" for all objects covered in this
spec. |
+| Required | location | Location of the view metadata files |
+| Required | current-version-id | Current version of the view. Set to ‘1’ when
the view is first created. |
+| Optional | properties | A string to string map of view properties. Contains
pre-set properties such as ‘comment’ describing the view, does not contain
arbitrary metadata. |
+| Required | versions | An array of structs describing the last few versions
of the view. Controlled by the table property: “version.history.num_entries”.
See more below. |
+| Required | version-log | An array of structs describing the log of created
versions. See more below. |
+| Optional | schemas | A list of schemas, the same as the ‘schemas’ field from
Iceberg table spec. |
+| Optional | current-schema-id | ID of the current schema of the view |
+
+
+‘Versions’ is an array of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | version-id | Monotonically increasing id indicating the version
of the view. Starts with “1”. |
+| Required | timestamp-ms | Timestamp expressed in ms since epoch at which the
version of the view was created. |
+| Required | summary | A string to string map of view properties to track
version metadata. This field can be used by engines to store any necessary
properties. Two currently required properties are described below. |
+| Required | representations | A list of "representations" as described below.
|
+
+
+Note that each version is stored in a separate AVRO file. This is to ensure
that the metadata file stays readable in the case the view definition is huge.
As a future extension, an engine-agnostic intermediate representation or a
serialized abstract syntax tree of the SQL definition may also be stored in
each version, exacerbating the problem.
Review comment:
I don't think this is accurate, and if it is then it probably doesn't
belong here. It is perfectly reasonable to store certain representations in the
metadata file and reasonable to store future representations in separate files.
I think for now you want to just specify what the current set of
representations can do.
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
+| Required | object-type | Type of object this metadata file is for:
"table" or "view". It must be set to "view" for all objects covered in this
spec. |
+| Required | location | Location of the view metadata files |
+| Required | current-version-id | Current version of the view. Set to ‘1’ when
the view is first created. |
+| Optional | properties | A string to string map of view properties. Contains
pre-set properties such as ‘comment’ describing the view, does not contain
arbitrary metadata. |
+| Required | versions | An array of structs describing the last few versions
of the view. Controlled by the table property: “version.history.num_entries”.
See more below. |
+| Required | version-log | An array of structs describing the log of created
versions. See more below. |
+| Optional | schemas | A list of schemas, the same as the ‘schemas’ field from
Iceberg table spec. |
+| Optional | current-schema-id | ID of the current schema of the view |
+
+
+‘Versions’ is an array of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | version-id | Monotonically increasing id indicating the version
of the view. Starts with “1”. |
+| Required | timestamp-ms | Timestamp expressed in ms since epoch at which the
version of the view was created. |
+| Required | summary | A string to string map of view properties to track
version metadata. This field can be used by engines to store any necessary
properties. Two currently required properties are described below. |
+| Required | representations | A list of "representations" as described below.
|
+
+
+Note that each version is stored in a separate AVRO file. This is to ensure
that the metadata file stays readable in the case the view definition is huge.
As a future extension, an engine-agnostic intermediate representation or a
serialized abstract syntax tree of the SQL definition may also be stored in
each version, exacerbating the problem.
+
+“summary” is a string-string map with the following required keys. Engines may
store additional key-value pairs in this map.
+
+| Required/Optional | Key | Value |
+|-------------------|-----|-------|
+| Required | operation |A string value indicating the view operation that
caused this metadata to be created. Allowed values are “CREATE” and “REPLACE” |
+| Optional | engine-version | A string value indicating the version of the
engine that performed the operation (create / replace) |
+
+“representations” is a list of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | type | A string indicating the type of representation. The only
valid choice is "sql". The rest of the fields are interpreted by the type of
representation. |
+
+Here are the fields for "sql" representation type:
Review comment:
It isn't clear from the two tables. Are these all in the same
"representation" struct? If so, then it's not really a struct. It's more of an
object because the fields can vary. I think it would be clearer if you had a
section for each representation and made it clear that the `representations`
list is actually a list of unions.
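
As a rough illustration of the "list of unions" idea, here is a minimal Python
sketch of a reader that dispatches on the required `type` field. It is only a
sketch under stated assumptions: `SqlRepresentation`, `parse_representation`,
`parse_representations`, and the `sql` field name are hypothetical and are not
taken from the spec or from any Iceberg API.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass(frozen=True)
class SqlRepresentation:
    """Fields for the "sql" representation type.

    The `sql` field name is an assumption made for this sketch.
    """
    sql: str


# Union of the known representation types; only "sql" is valid today.
Representation = Union[SqlRepresentation]


def parse_representation(obj: Dict[str, Any]) -> Representation:
    """Use the required `type` field to decide how to read the other fields."""
    rep_type = obj.get("type")
    if rep_type == "sql":
        return SqlRepresentation(sql=obj["sql"])
    raise ValueError(f"Unknown representation type: {rep_type!r}")


def parse_representations(items: List[Dict[str, Any]]) -> List[Representation]:
    """Parse the `representations` list one tagged object at a time."""
    return [parse_representation(item) for item in items]
```

With this shape, each representation type gets its own section and its own
field table, and adding a type later means adding one more branch rather than
widening a single struct.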
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
+| Required | object-type | Type of object this metadata file is for:
"table" or "view". It must be set to "view" for all objects covered in this
spec. |
+| Required | location | Location of the view metadata files |
+| Required | current-version-id | Current version of the view. Set to ‘1’ when
the view is first created. |
+| Optional | properties | A string to string map of view properties. Contains
pre-set properties such as ‘comment’ describing the view, does not contain
arbitrary metadata. |
+| Required | versions | An array of structs describing the last few versions
of the view. Controlled by the table property: “version.history.num_entries”.
See more below. |
+| Required | version-log | An array of structs describing the log of created
versions. See more below. |
+| Optional | schemas | A list of schemas, the same as the ‘schemas’ field from
Iceberg table spec. |
+| Optional | current-schema-id | ID of the current schema of the view |
+
+
+‘Versions’ is an array of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | version-id | Monotonically increasing id indicating the version
of the view. Starts with “1”. |
+| Required | timestamp-ms | Timestamp expressed in ms since epoch at which the
version of the view was created. |
+| Required | summary | A string to string map of view properties to track
version metadata. This field can be used by engines to store any necessary
properties. Two currently required properties are described below. |
+| Required | representations | A list of "representations" as described below.
|
+
+
+Note that each version is stored in a separate AVRO file. This is to ensure
that the metadata file stays readable in the case the view definition is huge.
As a future extension, an engine-agnostic intermediate representation or a
serialized abstract syntax tree of the SQL definition may also be stored in
each version, exacerbating the problem.
+
+“summary” is a string-string map with the following required keys. Engines may
store additional key-value pairs in this map.
+
+| Required/Optional | Key | Value |
+|-------------------|-----|-------|
+| Required | operation |A string value indicating the view operation that
caused this metadata to be created. Allowed values are “CREATE” and “REPLACE” |
+| Optional | engine-version | A string value indicating the version of the
engine that performed the operation (create / replace) |
+
+“representations” is a list of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | type | A string indicating the type of representation. The only
valid choice is "sql". The rest of the fields are interpreted by the type of
representation. |
+
+Here are the fields for "sql" representation type:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | type | It must be set to "sql" |
Review comment:
I think this should still explain what `type` is, not just state the
required value. How about "The format used to store this representation of the
view"?
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
+| Required | object-type | Type of object this metadata file is for:
"table" or "view". It must be set to "view" for all objects covered in this
spec. |
+| Required | location | Location of the view metadata files |
+| Required | current-version-id | Current version of the view. Set to ‘1’ when
the view is first created. |
+| Optional | properties | A string to string map of view properties. Contains
pre-set properties such as ‘comment’ describing the view, does not contain
arbitrary metadata. |
+| Required | versions | An array of structs describing the last few versions
of the view. Controlled by the table property: “version.history.num_entries”.
See more below. |
+| Required | version-log | An array of structs describing the log of created
versions. See more below. |
+| Optional | schemas | A list of schemas, the same as the ‘schemas’ field from
Iceberg table spec. |
+| Optional | current-schema-id | ID of the current schema of the view |
+
+
+‘Versions’ is an array of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | version-id | Monotonically increasing id indicating the version
of the view. Starts with “1”. |
+| Required | timestamp-ms | Timestamp expressed in ms since epoch at which the
version of the view was created. |
+| Required | summary | A string to string map of view properties to track
version metadata. This field can be used by engines to store any necessary
properties. Two currently required properties are described below. |
+| Required | representations | A list of "representations" as described below.
|
+
+
+Note that each version is stored in a separate AVRO file. This is to ensure
that the metadata file stays readable in the case the view definition is huge.
As a future extension, an engine-agnostic intermediate representation or a
serialized abstract syntax tree of the SQL definition may also be stored in
each version, exacerbating the problem.
+
+“summary” is a string-string map with the following required keys. Engines may
store additional key-value pairs in this map.
+
+| Required/Optional | Key | Value |
+|-------------------|-----|-------|
+| Required | operation |A string value indicating the view operation that
caused this metadata to be created. Allowed values are “CREATE” and “REPLACE” |
+| Optional | engine-version | A string value indicating the version of the
engine that performed the operation (create / replace) |
+
+“representations” is a list of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | type | A string indicating the type of representation. The only
valid choice is "sql". The rest of the fields are interpreted by the type of
representation. |
+
+Here are the fields for "sql" representation type:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | type | It must be set to "sql" |
+| Required | sql | A string representing SQL definition of the view as input |
+| Required | dialect | A string specifying the dialect of the ‘sql’ field
above. Used by engines to perform necessary translations to the SQL dialect
supported by the engine. |
Review comment:
This description is too strict. I think it should state that this _can_
be used by engines. If it states only that it _is_ used to perform "necessary
translations", that raises a lot of questions about what translations are
considered "necessary" by the spec. That's not a direction we want to head, so
it should be clear that this is informational.
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
+| Required | object-type | Type of object this metadata file is for:
"table" or "view". It must be set to "view" for all objects covered in this
spec. |
+| Required | location | Location of the view metadata files |
+| Required | current-version-id | Current version of the view. Set to ‘1’ when
the view is first created. |
+| Optional | properties | A string to string map of view properties. Contains
pre-set properties such as ‘comment’ describing the view, does not contain
arbitrary metadata. |
+| Required | versions | An array of structs describing the last few versions
of the view. Controlled by the table property: “version.history.num_entries”.
See more below. |
+| Required | version-log | An array of structs describing the log of created
versions. See more below. |
+| Optional | schemas | A list of schemas, the same as the ‘schemas’ field from
Iceberg table spec. |
+| Optional | current-schema-id | ID of the current schema of the view |
+
+
+‘Versions’ is an array of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | version-id | Monotonically increasing id indicating the version
of the view. Starts with “1”. |
+| Required | timestamp-ms | Timestamp expressed in ms since epoch at which the
version of the view was created. |
+| Required | summary | A string to string map of view properties to track
version metadata. This field can be used by engines to store any necessary
properties. Two currently required properties are described below. |
+| Required | representations | A list of "representations" as described below.
|
+
+
+Note that each version is stored in a separate AVRO file. This is to ensure
that the metadata file stays readable in the case the view definition is huge.
As a future extension, an engine-agnostic intermediate representation or a
serialized abstract syntax tree of the SQL definition may also be stored in
each version, exacerbating the problem.
+
+“summary” is a string-string map with the following required keys. Engines may
store additional key-value pairs in this map.
+
+| Required/Optional | Key | Value |
+|-------------------|-----|-------|
+| Required | operation |A string value indicating the view operation that
caused this metadata to be created. Allowed values are “CREATE” and “REPLACE” |
+| Optional | engine-version | A string value indicating the version of the
engine that performed the operation (create / replace) |
+
+“representations” is a list of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | type | A string indicating the type of representation. The only
valid choice is "sql". The rest of the fields are interpreted by the type of
representation. |
+
+Here are the fields for "sql" representation type:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | type | It must be set to "sql" |
+| Required | sql | A string representing SQL definition of the view as input |
+| Required | dialect | A string specifying the dialect of the ‘sql’ field
above. Used by engines to perform necessary translations to the SQL dialect
supported by the engine. |
+| Optional | session-catalog | A string that specifies the catalog of the user
session when the view was created / replaced. Used to resolve the tables in the
view definition. |
Review comment:
I think this should be more direct. It doesn't matter what the user's
session catalog was. What matters is what catalog should be used when table or
view references do not contain an explicit catalog.
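A minimal sketch of that rule, for illustration only. The helper name
`resolve_identifier`, the `known_catalogs` set, and the way a leading name part
is matched against catalogs are all assumptions made for this example; none of
them come from the spec text or from the Iceberg API.

```python
from typing import List, Optional, Set, Tuple


def resolve_identifier(
    parts: List[str],
    known_catalogs: Set[str],
    default_catalog: Optional[str],
) -> Tuple[Optional[str], List[str]]:
    """Return (catalog, remaining name parts) for a table or view reference.

    Hypothetical helper: when the first part names a known catalog, it is
    treated as an explicit catalog. Otherwise the reference has no explicit
    catalog and the default catalog stored with the view is applied.
    """
    if len(parts) > 1 and parts[0] in known_catalogs:
        return parts[0], parts[1:]
    return default_catalog, parts
```

With this reading, the field only matters for references such as `db.t` that
carry no catalog; a reference such as `prod.db.t` is unaffected by it.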
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
+| Required | object-type | Type of object this metadata file is for:
"table" or "view". It must be set to "view" for all objects covered in this
spec. |
+| Required | location | Location of the view metadata files |
+| Required | current-version-id | Current version of the view. Set to ‘1’ when
the view is first created. |
+| Optional | properties | A string to string map of view properties. Contains
pre-set properties such as ‘comment’ describing the view, does not contain
arbitrary metadata. |
+| Required | versions | An array of structs describing the last few versions
of the view. Controlled by the table property: “version.history.num_entries”.
See more below. |
+| Required | version-log | An array of structs describing the log of created
versions. See more below. |
+| Optional | schemas | A list of schemas, the same as the ‘schemas’ field from
Iceberg table spec. |
+| Optional | current-schema-id | ID of the current schema of the view |
+
+
+‘Versions’ is an array of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | version-id | Monotonically increasing id indicating the version
of the view. Starts with “1”. |
+| Required | timestamp-ms | Timestamp expressed in ms since epoch at which the
version of the view was created. |
+| Required | summary | A string to string map of view properties to track
version metadata. This field can be used by engines to store any necessary
properties. Two currently required properties are described below. |
+| Required | representations | A list of "representations" as described below.
|
+
+
+Note that each version is stored in a separate AVRO file. This is to ensure
that the metadata file stays readable in the case the view definition is huge.
As a future extension, an engine-agnostic intermediate representation or a
serialized abstract syntax tree of the SQL definition may also be stored in
each version, exacerbating the problem.
+
+“summary” is a string-string map with the following required keys. Engines may
store additional key-value pairs in this map.
+
+| Required/Optional | Key | Value |
+|-------------------|-----|-------|
+| Required | operation |A string value indicating the view operation that
caused this metadata to be created. Allowed values are “CREATE” and “REPLACE” |
+| Optional | engine-version | A string value indicating the version of the
engine that performed the operation (create / replace) |
+
+“representations” is a list of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | type | A string indicating the type of representation. The only
valid choice is "sql". The rest of the fields are interpreted by the type of
representation. |
+
+Here are the fields for "sql" representation type:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | type | It must be set to "sql" |
+| Required | sql | A string representing SQL definition of the view as input |
+| Required | dialect | A string specifying the dialect of the ‘sql’ field
above. Used by engines to perform necessary translations to the SQL dialect
supported by the engine. |
+| Optional | session-catalog | A string that specifies the catalog of the user
session when the view was created / replaced. Used to resolve the tables in the
view definition. |
+| Optional | session-namespace | An array of strings indicating namespace at
the time view was created / replaced. Used similar to ‘session-catalog’ above. |
Review comment:
Similar to session catalog, I think this should not state anything about
when it was created and should instead state the requirements for using this.
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
+| Required | object-type | Type of object this metadata file is for:
"table" or "view". It must be set to "view" for all objects covered in this
spec. |
+| Required | location | Location of the view metadata files |
+| Required | current-version-id | Current version of the view. Set to ‘1’ when
the view is first created. |
+| Optional | properties | A string to string map of view properties. Contains
pre-set properties such as ‘comment’ describing the view, does not contain
arbitrary metadata. |
+| Required | versions | An array of structs describing the last few versions
of the view. Controlled by the table property: “version.history.num_entries”.
See more below. |
+| Required | version-log | An array of structs describing the log of created
versions. See more below. |
+| Optional | schemas | A list of schemas, the same as the ‘schemas’ field from
Iceberg table spec. |
+| Optional | current-schema-id | ID of the current schema of the view |
+
+
+‘Versions’ is an array of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | version-id | Monotonically increasing id indicating the version
of the view. Starts with “1”. |
+| Required | timestamp-ms | Timestamp expressed in ms since epoch at which the
version of the view was created. |
+| Required | summary | A string to string map of view properties to track
version metadata. This field can be used by engines to store any necessary
properties. Two currently required properties are described below. |
+| Required | representations | A list of "representations" as described below.
|
+
+
+Note that each version is stored in a separate AVRO file. This is to ensure
that the metadata file stays readable in the case the view definition is huge.
As a future extension, an engine-agnostic intermediate representation or a
serialized abstract syntax tree of the SQL definition may also be stored in
each version, exacerbating the problem.
+
+“summary” is a string-string map with the following required keys. Engines may
store additional key-value pairs in this map.
+
+| Required/Optional | Key | Value |
+|-------------------|-----|-------|
+| Required | operation |A string value indicating the view operation that
caused this metadata to be created. Allowed values are “CREATE” and “REPLACE” |
+| Optional | engine-version | A string value indicating the version of the
engine that performed the operation (create / replace) |
+
+“representations” is a list of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | type | A string indicating the type of representation. The only
valid choice is "sql". The rest of the fields are interpreted by the type of
representation. |
+
+Here are the fields for "sql" representation type:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | type | It must be set to "sql" |
+| Required | sql | A string representing SQL definition of the view as input |
+| Required | dialect | A string specifying the dialect of the ‘sql’ field
above. Used by engines to perform necessary translations to the SQL dialect
supported by the engine. |
+| Optional | session-catalog | A string that specifies the catalog of the user
session when the view was created / replaced. Used to resolve the tables in the
view definition. |
+| Optional | session-namespace | An array of strings indicating namespace at
the time view was created / replaced. Used similar to ‘session-catalog’ above. |
+| Optional | field-aliases | A list of strings of field aliases E.g. a list of
alias_name info specified in the following create view statement. `CREATE VIEW
v (alias_name COMMENT 'docs', alias_name2, ...) AS SELECT ...` |
+| Optional | field-docs | A list of strings of field comments E.g. a list of
‘comment’ info specified in the following create view statement. `CREATE VIEW v
(alias_name COMMENT 'docs', alias_name2, ...) AS SELECT ...` |
Review comment:
I would put the SQL example in a note and not in both descriptions. Are
there requirements about how long this list of aliases is? What happens if this
differs from the current schema?
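
To make the length question concrete, here is a minimal sketch of one possible rule. It assumes (the draft does not say this) that `field-aliases` and `field-docs`, when present, must have exactly one entry per column of the view's current schema. The function name and argument layout are hypothetical.

```python
from typing import List, Optional, Sequence


def check_alias_lengths(
    schema_field_names: Sequence[str],
    field_aliases: Optional[Sequence[str]] = None,
    field_docs: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the lists are consistent.

    Assumption (not in the draft spec): each optional list, if present,
    has exactly one entry per field of the current schema.
    """
    problems = []
    expected = len(schema_field_names)
    for name, values in (("field-aliases", field_aliases), ("field-docs", field_docs)):
        if values is None:
            continue  # both lists are optional in the draft
        if len(values) != expected:
            problems.append(
                f"{name} has {len(values)} entries but the schema has {expected} fields"
            )
    return problems
```

If the spec chooses a different rule (for example, allowing a shorter list that covers only leading columns), the comparison above would change, which is why the requirement should be stated explicitly.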
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
+| Required | object-type | Type of object this metadata file is for:
"table" or "view". It must be set to "view" for all objects covered in this
spec. |
+| Required | location | Location of the view metadata files |
+| Required | current-version-id | Current version of the view. Set to ‘1’ when
the view is first created. |
+| Optional | properties | A string to string map of view properties. Contains
pre-set properties such as ‘comment’ describing the view, does not contain
arbitrary metadata. |
+| Required | versions | An array of structs describing the last few versions
of the view. Controlled by the table property: “version.history.num_entries”.
See more below. |
+| Required | version-log | An array of structs describing the log of created
versions. See more below. |
+| Optional | schemas | A list of schemas, the same as the ‘schemas’ field from
Iceberg table spec. |
+| Optional | current-schema-id | ID of the current schema of the view |
Review comment:
Should this be stored with each version rather than here? I think this
is required to match the current version's schema, right?
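
As a rough illustration of the consistency concern, the sketch below assumes a hypothetical per-version `schema-id` field (not part of the draft as written) and checks whether the top-level `current-schema-id` agrees with it. The function name is also an assumption.

```python
def current_schema_mismatch(metadata: dict) -> bool:
    """Return True when top-level current-schema-id disagrees with the
    schema-id recorded on the current version.

    'schema-id' on each version struct is hypothetical; the draft only
    defines the top-level 'current-schema-id'.
    """
    current_id = metadata["current-version-id"]
    current = next(
        (v for v in metadata["versions"] if v["version-id"] == current_id), None
    )
    if current is None:
        raise ValueError(f"current-version-id {current_id} not found in versions")
    return metadata.get("current-schema-id") != current.get("schema-id")
```

Storing the schema ID only with each version would remove the possibility of this mismatch altogether.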
##########
File path: site/docs/view-spec.md
##########
@@ -0,0 +1,256 @@
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements. See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Iceberg View Spec
+
+## Background and Motivation
+
+Most compute engines (e.g. Trino and Apache Spark) support logical views,
commonly known as ‘views’. A view is a logical table that can be referenced by
future queries. Views do not contain any data. Instead, the query stored by the
view is executed every time the view is referenced by another query. Views and
tables occupy the same namespace.
+Each compute engine stores the metadata of the view in its proprietary format
in the metastore of choice. Thus, views created from one engine can not be read
or altered easily from another engine even when engines share the metastore as
well as the storage system. This document standardizes the view metadata for
ease of sharing the views across engines.
+
+## Goals
+
+* A common metadata format for view metadata, similar to how Iceberg supports
a common table format for tables.
+* The view metadata format specification
+ * Includes storage format as well as APIs to write/read the metadata.
+ * Supports versioning of views to track how a view evolved over time.
+
+## Overview
+
+The view metadata storage and retrieval mirrors how Iceberg table metadata is
stored and retrieved. The view metadata is stored in a JSON file on object
storage for ease of tracking the evolution of the view. Metastore continues to
hold the view object with some properties such as database name, owner, create
time, last access time and an indication that the object is a view.
+
+Each ‘CREATE OR REPLACE VIEW’ statement creates a new view version metadata
file for that view.
+Each metadata file is self-sufficient. It contains the history of the last few
operations performed on the view and can be used to roll back the view to a
previous version.
+
+### Metadata Location
+
+The view metadata location is managed exactly like table metadata location.
+
+### Operations
+
+* Create a view
+* Drop the view
+* Load a view to read the metadata
+* Replace the view
+* Change the view definition
+* Add/delete/edit column comments
+
+## Specification
+
+### Terms
+
+* **Schema** -- Names and types of fields in a view.
+* **Version** -- The state of a view at some point in time.
+
+### View Metadata
+
+The view version metadata file has the following fields:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | format-version | Json format version number for the view metadata
spec. The view metadata spec and the corresponding format-version is
independent of table spec. Starts with 1 and is incremented when there is a
breaking change to view metadata. |
+| Required | object-type | Type of object this metadata file is for:
"table" or "view". It must be set to "view" for all objects covered in this
spec. |
+| Required | location | Location of the view metadata files |
+| Required | current-version-id | Current version of the view. Set to ‘1’ when
the view is first created. |
+| Optional | properties | A string to string map of view properties. Contains
pre-set properties such as ‘comment’ describing the view, does not contain
arbitrary metadata. |
+| Required | versions | An array of structs describing the last few versions
of the view. Controlled by the table property: “version.history.num_entries”.
See more below. |
+| Required | version-log | An array of structs describing the log of created
versions. See more below. |
+| Optional | schemas | A list of schemas, the same as the ‘schemas’ field from
Iceberg table spec. |
+| Optional | current-schema-id | ID of the current schema of the view |
+
+
+‘Versions’ is an array of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | version-id | Monotonically increasing id indicating the version
of the view. Starts with “1”. |
+| Required | timestamp-ms | Timestamp expressed in ms since epoch at which the
version of the view was created. |
+| Required | summary | A string to string map of view properties to track
version metadata. This field can be used by engines to store any necessary
properties. Two currently required properties are described below. |
+| Required | representations | A list of "representations" as described below.
|
+
+
+Note that each version is stored in a separate AVRO file. This is to ensure
that the metadata file stays readable in the case the view definition is huge.
As a future extension, an engine-agnostic intermediate representation or a
serialized abstract syntax tree of the SQL definition may also be stored in
each version, exacerbating the problem.
+
+“summary” is a string-string map with the following required keys. Engines may
store additional key-value pairs in this map.
+
+| Required/Optional | Key | Value |
+|-------------------|-----|-------|
+| Required | operation |A string value indicating the view operation that
caused this metadata to be created. Allowed values are “CREATE” and “REPLACE” |
+| Optional | engine-version | A string value indicating the version of the
engine that performed the operation (create / replace) |
+
+“representations” is a list of structs with fields as shown below:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | type | A string indicating the type of representation. The only
valid choice is "sql". The rest of the fields are interpreted by the type of
representation. |
+
+Here are the fields for "sql" representation type:
+
+| Required/Optional | Field Name | Description |
+|-------------------|------------|-------------|
+| Required | type | It must be set to "sql" |
+| Required | sql | A string representing SQL definition of the view as input |
+| Required | dialect | A string specifying the dialect of the ‘sql’ field
above. Used by engines to perform necessary translations to the SQL dialect
supported by the engine. |
+| Optional | session-catalog | A string that specifies the catalog of the user
session when the view was created / replaced. Used to resolve the tables in the
view definition. |
+| Optional | session-namespace | An array of strings indicating namespace at
the time view was created / replaced. Used similar to ‘session-catalog’ above. |
+| Optional | field-aliases | A list of strings of field aliases E.g. a list of
alias_name info specified in the following create view statement. `CREATE VIEW
v (alias_name COMMENT 'docs', alias_name2, ...) AS SELECT ...` |
+| Optional | field-docs | A list of strings of field comments E.g. a list of
‘comment’ info specified in the following create view statement. `CREATE VIEW v
(alias_name COMMENT 'docs', alias_name2, ...) AS SELECT ...` |
+
+“version-log” is an array of structs describing the log of the versions
created. The struct has the following fields:
Review comment:
This describes when each version was considered "current". Creation is
different and is stored in each version's metadata. This allows you to
reconstruct what someone would have seen at some point in time. If the view has
been updated and rolled back, this will show it.
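
A minimal sketch of that point-in-time reconstruction, assuming each log entry carries `timestamp-ms` (when the version became current) and `version-id`; the field list is cut off in the hunk above, so these names and the helper name are assumptions.

```python
from bisect import bisect_right
from typing import Mapping, Optional, Sequence


def version_as_of(
    version_log: Sequence[Mapping[str, int]], as_of_ms: int
) -> Optional[int]:
    """Return the version-id that was current at as_of_ms, or None if the
    view did not exist yet.

    Assumes each entry has 'timestamp-ms' and 'version-id' keys.
    """
    entries = sorted(version_log, key=lambda e: e["timestamp-ms"])
    timestamps = [e["timestamp-ms"] for e in entries]
    # Index of the first entry strictly after as_of_ms; the one before it
    # is the entry that was current at that time.
    idx = bisect_right(timestamps, as_of_ms)
    if idx == 0:
        return None
    return entries[idx - 1]["version-id"]


# Version 1 created, replaced by version 2, then rolled back to version 1.
example_log = [
    {"timestamp-ms": 1000, "version-id": 1},
    {"timestamp-ms": 2000, "version-id": 2},
    {"timestamp-ms": 3000, "version-id": 1},
]
```

With `example_log`, a reader at time 2500 would have seen version 2 and a reader at time 3500 would have seen version 1 again, which a log of creation times alone could not express.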
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]