[GitHub] [flink] sjwiesman commented on a change in pull request #10078: [FLINK-14486][table-api, docs] Update documentation regarding Temporary Objects

GitBox Mon, 04 Nov 2019 11:44:55 -0800

sjwiesman commented on a change in pull request #10078: 
[FLINK-14486][table-api, docs] Update documentation regarding Temporary Objects
URL: https://github.com/apache/flink/pull/10078#discussion_r342235885


 ##########
 File path: docs/dev/table/common.md
 ##########
 @@ -292,25 +292,39 @@ b_b_t_env = BatchTableEnvironment.create(environment_settings=b_b_settings)
 
 **Note:** If there is only one planner jar in `/lib` directory, you can use `useAnyPlanner` (`use_any_planner` for python) to create specific `EnvironmentSettings`.
 
-
 {% top %}
 
-Register Tables in the Catalog
+Create Tables in the Catalog
 -------------------------------
 
-A `TableEnvironment` maintains a catalog of tables which are registered by name. There are two types of tables, *input tables* and *output tables*. Input tables can be referenced in Table API and SQL queries and provide input data. Output tables can be used to emit the result of a Table API or SQL query to an external system.
+A `TableEnvironment` maintains a map of catalogs of tables which are created with an identifier. Each
+identifier consists of 3 parts: catalog name, database name and object name. If a catalog or database is not
+specified, the current default value will be used (see examples in the [Table identifier expanding](#table-identifier-expanding) section).
+
+Tables can be either virtual (`VIEWS`) or regular (`TABLES`). `VIEWS` can be created e.g. from an
+existing `Table` object, usually the result of a Table API or SQL query. `TABLES` describe
+external data, such as a file, database, or messaging system.
+
+### Temporary vs permanent tables
 
-An input table can be registered from various sources:
+Permanent tables are tables whose meta information is stored in a [catalog]({{ site.baseurl }}/dev/table/catalogs.html). Usually
+this means it is persisted in an external metastore such as Hive. This implies that there must exist a serializable representation
+of such a table. They will be available across multiple sessions/jobs as long as the connection to the metastore is maintained.
 
-* an existing `Table` object, usually the result of a Table API or SQL query.
-* a `TableSource`, which accesses external data, such as a file, database, or messaging system.
-* a `DataStream` or `DataSet` from a DataStream (only for stream job) or DataSet (only for batch job translated from old planner) program. Registering a `DataStream` or `DataSet` is discussed in the [Integration with DataStream and DataSet API](#integration-with-datastream-and-dataset-api) section.
+On the other hand, temporary tables are always stored in memory and exist only for a single session/job.
+Temporary tables are not bound to any catalog or database. They can be created in a namespace of a catalog and database
+that does not exist. They are also not dropped if the corresponding database is removed.
+Moreover, temporary tables can shadow permanent tables. It is possible to register a temporary table
+with exactly the same identifier as an existing permanent table. Such a permanent table becomes
+inaccessible as long as the temporary table exists. All queries with that identifier will be executed
+against the temporary table.
 
 Review comment:
   I realize the sentence about permanent tables needing to be serializable is 
likely referring to registered DataStream / DataSet tables. The issue is it 
feels like explaining why without saying what. I think it makes more sense to 
leave that out of this section. Instead, in the section on going from 
DataStream / Set -> Table just say the resulting table is always temporary. 
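
For anyone following this thread, a small toy model of the semantics the new text describes may help when checking the wording. It is a minimal sketch in Python (matching the `use_any_planner` snippet in the context) and is not Flink's API: `ToyTableEnvironment` and all of its methods are hypothetical, and the default names `default_catalog`/`default_database` are assumptions. It models three claims from the hunk: partial identifiers are expanded with the current catalog/database, temporary tables may live in a namespace that does not exist and survive dropping the database, and a temporary table shadows a permanent one with the same identifier.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# A fully qualified identifier: (catalog, database, object).
Identifier = Tuple[str, str, str]


@dataclass
class ToyTableEnvironment:
    """Hypothetical in-memory model of the documented semantics (not Flink's API)."""
    current_catalog: str = "default_catalog"    # assumed default name
    current_database: str = "default_database"  # assumed default name
    # Permanent tables: catalog -> database -> object -> definition (stand-in for a metastore).
    catalogs: Dict[str, Dict[str, Dict[str, str]]] = field(default_factory=dict)
    # Temporary tables: keyed by full identifier, independent of any catalog or database.
    temporary: Dict[Identifier, str] = field(default_factory=dict)

    def expand(self, path: str) -> Identifier:
        """Fill in a missing catalog and/or database with the current defaults."""
        parts = path.split(".")
        if len(parts) == 1:
            return (self.current_catalog, self.current_database, parts[0])
        if len(parts) == 2:
            return (self.current_catalog, parts[0], parts[1])
        if len(parts) == 3:
            return (parts[0], parts[1], parts[2])
        raise ValueError(f"invalid identifier: {path!r}")

    def create_database(self, catalog: str, database: str) -> None:
        self.catalogs.setdefault(catalog, {}).setdefault(database, {})

    def drop_database(self, catalog: str, database: str) -> None:
        # Removes permanent tables only; temporary tables are untouched.
        self.catalogs.get(catalog, {}).pop(database, None)

    def create_table(self, path: str, definition: str) -> None:
        cat, db, name = self.expand(path)
        if db not in self.catalogs.get(cat, {}):
            raise KeyError(f"database {cat}.{db} does not exist")
        self.catalogs[cat][db][name] = definition

    def create_temporary_table(self, path: str, definition: str) -> None:
        # No existence check: the catalog/database namespace may not exist.
        self.temporary[self.expand(path)] = definition

    def drop_temporary_table(self, path: str) -> None:
        self.temporary.pop(self.expand(path), None)

    def resolve(self, path: str) -> Optional[str]:
        ident = self.expand(path)
        # A temporary table shadows a permanent table with the same identifier.
        if ident in self.temporary:
            return self.temporary[ident]
        cat, db, name = ident
        return self.catalogs.get(cat, {}).get(db, {}).get(name)


env = ToyTableEnvironment()
env.create_database("default_catalog", "default_database")
env.create_table("orders", "permanent orders")
# Expands to the same identifier as "orders", so it shadows the permanent table.
env.create_temporary_table("default_database.orders", "temporary orders")
print(env.resolve("orders"))  # temporary orders
env.drop_temporary_table("orders")
print(env.resolve("orders"))  # permanent orders
```

Keeping temporary tables in a separate map keyed by the fully expanded identifier, rather than inside the catalog tree, is what makes both the shadowing and the "not dropped with the database" behaviour follow from the single lookup order in `resolve`.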

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
