morsapaes commented on a change in pull request #361:
URL: https://github.com/apache/flink-web/pull/361#discussion_r458595186
##########
File path: _posts/2020-07-21-catalogs.md
##########
@@ -35,7 +35,12 @@ Catalogs don’t have to be limited to the metadata of
datasets. You can usually
* **Queries** - Those can be useful when you don’t want to persist a data set,
but want to provide a recipe for creating it from other sources instead.
## Catalogs support in Flink SQL
-Starting from version 1.9, Flink has a set of Catalog APIs that allows to
integrate Flink with various catalog implementations. With the help of those
APIs, you can query tables in Flink that were created in your external catalogs
(e.g. Hive Metastore). Additionally, depending on the catalog implementation,
you can create new objects such as tables or views from Flink, reuse them
across different jobs, and possibly even use them in other tools compatible
with that catalog. As of Flink 1.11, there are two catalog implementations
supported by the community:
+Starting from version 1.9, Flink has a set of Catalog APIs that allows to
integrate Flink with various catalog implementations. With the help of those
APIs, you can query tables in Flink that were created in your external catalogs
(e.g. Hive Metastore). Additionally, depending on the catalog implementation,
you can create new objects such as tables or views from Flink, reuse them
across different jobs, and possibly even use them in other tools compatible
with that catalog. In other words you can see catalogs with two-fold purpose:
+
+ * Catalogs are sort of out-of-the box integration with an ecosystem such as
RDBMs or Hive, where you can query the external, towards Flink, tables, views,
or functions without additional connector configuration. The connector
properties are automatically derived from the Catalog itself.
+ * A persistent store for Flink specific metadata. In this mode we
additionally store connector properties alongside the logical metadata such as
a schema or a name. That approach let's you store a full definition of e.g. a
Kafka backed table with records serialized with Avro in Hive that can be later
on used by Flink. However, as it incorporates Flink specific properties it can
not be used by other tools that leverage Hive metastore.
Review comment:
```suggestion
Starting from version 1.9, Flink has a set of Catalog APIs that allow you to
integrate Flink with various catalog implementations. With the help of those
APIs, you can query tables in Flink that were created in your external catalogs
(e.g. Hive Metastore). Additionally, depending on the catalog implementation,
you can create new objects such as tables or views from Flink, reuse them
across different jobs, and possibly even use them in other tools compatible
with that catalog. In other words, you can see catalogs as having a two-fold
purpose:

 * Provide an out-of-the-box integration with ecosystems such as RDBMSs or
Hive that allows you to query external objects like tables, views, or functions
with no additional connector configuration. The connector properties are
automatically derived from the catalog itself.
 * Act as a persistent store for Flink-specific metadata. In this mode, we
additionally store connector properties alongside the logical metadata (e.g.
schema, object name). That approach enables you to, for example, store the full
definition of a Kafka-backed table with records serialized in Avro in Hive, and
later reuse it from Flink. However, as it incorporates Flink-specific
properties, it cannot be used by other tools that leverage the Hive Metastore.
```
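
To make the second mode concrete, here is a minimal, hypothetical sketch using the Java Table API (Flink 1.11): it registers a `HiveCatalog` and persists the definition of a Kafka-backed, Avro-serialized table in the Hive Metastore. The catalog name, database, Hive conf directory, topic, bootstrap servers, and schema are placeholder values, and the snippet assumes the Hive, Kafka, and Avro connector/format dependencies are on the classpath.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class CatalogExample {

    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        // Register a HiveCatalog backed by an existing Hive Metastore.
        // "myhive", "default" and the conf dir are placeholder values.
        HiveCatalog catalog = new HiveCatalog("myhive", "default", "/opt/hive-conf");
        tableEnv.registerCatalog("myhive", catalog);
        tableEnv.useCatalog("myhive");

        // Persist a Flink-specific table definition (Kafka connector, Avro format)
        // in the Hive Metastore, so that other Flink jobs can reuse it without
        // repeating the connector configuration.
        tableEnv.executeSql(
                "CREATE TABLE orders (\n" +
                "  order_id BIGINT,\n" +
                "  amount   DOUBLE\n" +
                ") WITH (\n" +
                "  'connector' = 'kafka',\n" +
                "  'topic' = 'orders',\n" +
                "  'properties.bootstrap.servers' = 'localhost:9092',\n" +
                "  'format' = 'avro'\n" +
                ")");
    }
}
```

Any other Flink job that registers the same catalog can then query `orders` directly, but because the stored properties are Flink-specific, Hive itself cannot read the table.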
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]