[GitHub] [iceberg] rdblue commented on a diff in pull request #4706: Python: Support iceberg base catalog in python library (#3245)

GitBox Sat, 14 May 2022 11:14:45 -0700


rdblue commented on code in PR #4706:
URL: https://github.com/apache/iceberg/pull/4706#discussion_r873059425



##########
python/src/iceberg/catalog/base.py:
##########
@@ -0,0 +1,223 @@
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing,
+#  software distributed under the License is distributed on an
+#  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+#  KIND, either express or implied.  See the License for the
+#  specific language governing permissions and limitations
+#  under the License.
+
+from abc import ABC, abstractmethod
+from typing import Dict, List, Optional, Tuple
+
+from iceberg.schema import Schema
+from iceberg.table.base import PartitionSpec, Table
+
+
+class Catalog(ABC):
+    """Base Catalog for table operations like - create, drop, load, list and 
others.
+
+    Attributes:
+        name(str): Name of the catalog
+        properties(Dict[str, str]): Catalog properties
+    """
+
+    def __init__(self, name: str, properties: Dict[str, str]):
+        self._name = name
+        self._properties = properties
+
+    @property
+    def name(self) -> str:
+        return self._name
+
+    @property
+    def properties(self) -> Dict[str, str]:
+        return self._properties
+
+    @abstractmethod
+    def create_table(
+        self,
+        *,
+        namespace: Tuple[str, ...],
+        name: str,

Review Comment:
   @dhruv-pratap, I agree that this should support strings for table names, 
like `catalog.load_table("examples.nyc_taxi_yellow")`.
   
   > do you mean using just a single fully classified table name 'str'? Or, a 
'str' namespace and 'str' name?
   
   I don't think it makes sense to force the caller to parse if we are going to 
split on `.`. If Iceberg can handle splitting the namespace then it can also 
split the table name out. That also aligns with how people will naturally use 
this, like Spark's `writeTo("db.table")` or 
`spark.read("examples.nyc_taxi_yellow")`. That means we support a string table 
identifier.
   
   We also know that we need to support `Tuple[str]`, for cases like SQL 
engines where `.` can be included in an identifier by escaping and quoting.
   
   Supporting both `str` and `Tuple[str]` means that we should not have a 
separate `namespace` argument. Otherwise, there are ambiguous (or not obvious) 
cases, like `catalog.load_table(namespace=("a", "b"), name="c.d")`.
   
   Bringing all that together, the catalog methods should accept `identifier: 
str | Tuple[str]`. If the identifier is a string, it is split into a tuple on 
`.`. If it is a tuple, it is used as-is. That's simple and understandable, 
while supporting all of the cases that we need.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] rdblue commented on a diff in pull request #4706: Python: Support iceberg base catalog in python library (#3245)

Reply via email to