ashvina commented on code in PR #605:
URL: https://github.com/apache/incubator-xtable/pull/605#discussion_r1898160059


##########
rfc/rfc-1/rfc-1.md:
##########
@@ -0,0 +1,139 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# RFC-[1]: XCatalogSync - Synchronize tables across catalogs
+
+## Proposers
+
+- @vinishjail97
+
+## Approvers
+
+- Anyone from XTable community can approve/add feedback.
+
+## Status
+
+GH Feature Request: https://github.com/apache/incubator-xtable/issues/590
+
+> Please keep the status updated in `rfc/README.md`.
+
+## Abstract
+
+Users of Apache XTable (Incubating) today can translate metadata across table formats (iceberg, hudi, and delta) and use the tables in different platforms depending on their choice.
+Today there's still some friction involved in terms of usability because users need to explicitly [register](https://xtable.apache.org/docs/catalogs-index) the tables in the catalog of their choice (glue, HMS, unity, bigLake etc.)

Review Comment:
   Hi @vinishjail97, thanks for sharing the details. I think catalog sync is a 
useful feature. One of the key value adds of catalogs is governance, 
particularly access control. All the catalogs mentioned here provide the 
ability to grant different privileges to roles. The proposed catalog sync in 
XTable replicates the table across catalogs. What are your thoughts about 
porting the governance features?



##########
rfc/rfc-1/rfc-1.md:
##########
@@ -0,0 +1,139 @@
+Today there's still some friction involved in terms of usability because users need to explicitly [register](https://xtable.apache.org/docs/catalogs-index) the tables in the catalog of their choice (glue, HMS, unity, bigLake etc.)
+and then use the catalog in the platform of their choice to do DDL, DML queries.
+
+## Background
+XTable is built on the principle of omnidirectional interoperability, and I'm proposing an interface which allows syncing metadata of table formats to multiple catalogs in a continuous and incremental manner. With this new functionality we will be able to
+1. Reduce friction for XTable users - XTable sync will register the tables in the catalogs of their choice after metadata generation. If users are using a single format, they can still use XTable to sync the metadata across multiple catalogs.
+2. Avoid catalog lock-in - There's no reason why data/metadata in storage should be registered in a single catalog, users can register the table across multiple catalogs depending on the use-case, ecosystem and features provided by the catalog.
+
+## Implementation
+
+Introducing the following interfaces. [[PR]]( https://github.com/apache/incubator-xtable/pull/603)
+1. `CatalogSyncClient`: This interface contains methods that are responsible for creating table, refreshing table metadata, dropping table etc. in target catalog. Consider this interface as a translation layer between InternalTable and the catalog's table object.

Review Comment:
   I don't think table DDL operations are related to InternalTable, but you 
bring up an important point. The table format and the catalog are two 
different layers in the analytics stack. Currently, XTable only supports 
conversion of table-format-level metadata, which is captured by the current 
InternalTable. The proposed feature, however, extends to the catalog layer, 
where catalog-level metadata translation takes place. So, in effect, this 
feature syncs an InternalCatalog object. What are your thoughts?
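
   To make the layering concrete, here is a minimal Java sketch of how I picture it. Every name below (`InternalTableSketch`, `InternalCatalogTableSketch`, the `owner` property) is hypothetical and is not taken from the XTable code base or from the PR; the only point is that a catalog-level object would wrap the table-format-level one and add metadata that only the catalog knows.

   ```java
   import java.util.Collections;
   import java.util.LinkedHashMap;
   import java.util.Map;

   // Hypothetical stand-in for table-format-level metadata (what InternalTable
   // captures today). Only two fields are kept to stay short.
   final class InternalTableSketch {
       final String name;
       final String basePath;

       InternalTableSketch(String name, String basePath) {
           this.name = name;
           this.basePath = basePath;
       }
   }

   // Hypothetical catalog-level object: the table-format metadata plus what
   // only the catalog knows (where the table is registered, catalog properties).
   final class InternalCatalogTableSketch {
       final String catalogId;
       final String database;
       final InternalTableSketch table;
       final Map<String, String> catalogProperties;

       InternalCatalogTableSketch(String catalogId, String database,
                                  InternalTableSketch table,
                                  Map<String, String> catalogProperties) {
           this.catalogId = catalogId;
           this.database = database;
           this.table = table;
           // Defensive copy so later changes to the caller's map do not leak in.
           this.catalogProperties =
               Collections.unmodifiableMap(new LinkedHashMap<>(catalogProperties));
       }

       // Three-level name: catalog, database, table.
       String qualifiedName() {
           return catalogId + "." + database + "." + table.name;
       }
   }

   class CatalogLayerSketch {
       public static void main(String[] args) {
           InternalTableSketch table =
               new InternalTableSketch("orders", "s3://bucket/orders");
           Map<String, String> props = new LinkedHashMap<>();
           props.put("owner", "analytics"); // assumed catalog-only property
           InternalCatalogTableSketch entry =
               new InternalCatalogTableSketch("hms-1", "sales", table, props);
           System.out.println(entry.qualifiedName()); // prints hms-1.sales.orders
       }
   }
   ```

   In this reading, the DDL operations would act on the outer object, while the existing format conversion keeps working on the inner one.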
   



##########
rfc/rfc-1/rfc-1.md:
##########
@@ -0,0 +1,139 @@
+## Implementation
+
+Introducing the following interfaces. [[PR]]( https://github.com/apache/incubator-xtable/pull/603)
+1. `CatalogSyncClient`: This interface contains methods that are responsible for creating table, refreshing table metadata, dropping table etc. in target catalog. Consider this interface as a translation layer between InternalTable and the catalog's table object.
+2. `CatalogSync`: This interface synchronizes the internal XTable object (InternalTable) to multiple target catalogs using the methods available in `CatalogSyncClient` interface.
+3. `CatalogTableIdentifier`: Represents a catalog table identifier in a multi-level catalog system. `HierarchicalTableIdentifier` is an internal representation of a fully qualified table identifier within a catalog following the three level hierarchy convention (it's used by all the major catalogs glue, hms, unity etc.). In the future, we can support other conventions by implementing this interface.
+
+For XTable users, defining their source/target catalog configurations and synchronizing tables will be handled by the `RunCatalogSync` class. This utility class parses the user’s YAML configuration, synchronizes table format metadata when necessary, and then uses the previously defined interfaces to synchronize the table in the catalog.
+[[PR]]( https://github.com/apache/incubator-xtable/pull/591)
+
+User's YAML configuration.
+1. `sourceCatalog`: Configuration of the source catalog from which XTable will read. It must contain all the necessary connection and access details for describing and listing tables.
+    1. `catalogId`:  A user-defined unique identifier for the catalog, allows user to sync table to multiple catalogs of the same name/type eg: HMS catalog with url1, HMS catalog with url2.
+    2. `catalogType`: The type of the source catalog. This might be a specific type understood by XTable, such as Hive, Glue etc.
+    3. `catalogSyncClientImpl`(optional): A fully qualified class name that implements the interface for `CatalogSyncClient`, it can be used if the implementation for catalogType doesn't exist in XTable.
+    4. `catalogConversionSourceImpl`(optional): A fully qualified class name that implements the interface for `CatalogConversionSource`, it can be used if the implementation for catalogType doesn't exist in XTable.

Review Comment:
   It is not clear how the roles of `CatalogConversionSource` and 
`CatalogSyncClient` differ. Could you please clarify? In the case of table 
sync, only TableSource exists; there is no TableClient.
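
   My current guess, based on the descriptions in the RFC, is that one reads from the source catalog and the other writes to the target catalogs. A minimal Java sketch of that reading follows. The interface names with the `Sketch` suffix, their method signatures, and the in-memory catalog are all assumptions made for illustration; they are not the interfaces from the PR.

   ```java
   import java.util.ArrayList;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   // Hypothetical read side: describes a table that exists in the source catalog.
   interface CatalogConversionSourceSketch {
       String getTableLocation(String tableName);
   }

   // Hypothetical write side: registers or refreshes a table in one target catalog.
   interface CatalogSyncClientSketch {
       void createOrRefreshTable(String tableName, String location);
   }

   // In-memory catalog used only for illustration; it can play either role.
   final class InMemoryCatalogSketch
           implements CatalogConversionSourceSketch, CatalogSyncClientSketch {
       final Map<String, String> tables = new LinkedHashMap<>();

       @Override
       public String getTableLocation(String tableName) {
           return tables.get(tableName);
       }

       @Override
       public void createOrRefreshTable(String tableName, String location) {
           tables.put(tableName, location);
       }
   }

   class CatalogRolesSketch {
       // Reads the table once from the source, then writes it to every target.
       static void sync(CatalogConversionSourceSketch source,
                        List<CatalogSyncClientSketch> targets,
                        String tableName) {
           String location = source.getTableLocation(tableName);
           if (location == null) {
               throw new IllegalArgumentException(
                   "Table not found in source catalog: " + tableName);
           }
           for (CatalogSyncClientSketch target : targets) {
               target.createOrRefreshTable(tableName, location);
           }
       }

       public static void main(String[] args) {
           InMemoryCatalogSketch source = new InMemoryCatalogSketch();
           source.tables.put("orders", "s3://bucket/orders");
           InMemoryCatalogSketch glue = new InMemoryCatalogSketch();
           InMemoryCatalogSketch hms = new InMemoryCatalogSketch();

           List<CatalogSyncClientSketch> targets = new ArrayList<>();
           targets.add(glue);
           targets.add(hms);
           sync(source, targets, "orders");

           System.out.println(glue.tables);
           System.out.println(hms.tables);
       }
   }
   ```

   If that is the intended split, stating it explicitly in the RFC would help, because `catalogSyncClientImpl` is currently listed under `sourceCatalog`.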


