sjwiesman commented on a change in pull request #10078: [FLINK-14486][table-api, docs] Update documentation regarding Temporary Objects
URL: https://github.com/apache/flink/pull/10078#discussion_r342235885
##########
File path: docs/dev/table/common.md
##########
@@ -292,25 +292,39 @@ b_b_t_env = BatchTableEnvironment.create(environment_settings=b_b_settings)
 **Note:** If there is only one planner jar in `/lib` directory, you can use `useAnyPlanner` (`use_any_planner` for python) to create specific `EnvironmentSettings`.
-
 {% top %}
-Register Tables in the Catalog
+Create Tables in the Catalog
 -------------------------------
-A `TableEnvironment` maintains a catalog of tables which are registered by name. There are two types of tables, *input tables* and *output tables*. Input tables can be referenced in Table API and SQL queries and provide input data. Output tables can be used to emit the result of a Table API or SQL query to an external system.
+A `TableEnvironment` maintains a map of catalogs of tables which are created with an identifier. Each
+identifier consists of 3 parts: catalog name, database name and object name. If a catalog or database was not
+specified current default value will be used (see examples in the [Table identifier expanding](#table-identifier-expanding) section).
+
+Tables can be either virtual (`VIEWS`) or regular(`TABLES`). `VIEWS` can be created e.g. from an an
+existing `Table` object, usually the result of a Table API or SQL query. `TABLES` describe an
+external data, such as a file, database, or messaging system.
+
+### Temporary vs permanent tables.
-An input table can be registered from various sources:
+Permanent tables are tables which meta information is stored in a [catalog]({{ site.baseurl }}/dev/table/catalogs.html). Usually
+it means it is persisted in an external metastore such as e.g. Hive. This implies that there must exist a serializable representation
+of such table. They will be available across multiple sessions/jobs as long as the connection to the metastore is maintained.
-* an existing `Table` object, usually the result of a Table API or SQL query.
-* a `TableSource`, which accesses external data, such as a file, database, or messaging system.
-* a `DataStream` or `DataSet` from a DataStream (only for stream job) or DataSet (only for batch job translated from old planner) program. Registering a `DataStream` or `DataSet` is discussed in the [Integration with DataStream and DataSet API](#integration-with-datastream-and-dataset-api) section.
+On the other hand temporary tables are always stored in memory and exist only for a single session/job.
+Temporary tables are not bound to any catalog or database. They can be created in a namespace of a catalog and database
+that does not exist. They are also not dropped if a corresponding database is removed.
+Moreover temporary tables can shadow permanent tables. It is possible to register a temporary table
+with exactly the same identifier as an existing permanent table. Such a permanent table becomes
+inaccessible as long as the temporary table exists. All queries with that identifier will be executed
+against the temporary table.

Review comment:
I realize the sentence about permanent tables needing a serializable representation is likely referring to registered DataStream / DataSet tables. The issue is that it explains why without saying what. I think it makes more sense to leave that out of this section. Instead, in the section on converting a DataStream / DataSet to a Table, just say the resulting table is always temporary.
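
For anyone following the thread, the lookup rules described in the hunk (three-part identifiers, defaults for missing parts, temporary tables shadowing permanent ones and surviving a database drop) can be sketched as a toy model. This is not Flink's API: `ToyTableEnvironment` and all of its methods are hypothetical names made up for illustration, and identifiers are split naively on `.` with no support for escaped names.

```python
class ToyTableEnvironment:
    """Toy model of table identifier resolution. Not Flink's actual API."""

    def __init__(self, current_catalog="default_catalog",
                 current_database="default_database"):
        self.current_catalog = current_catalog
        self.current_database = current_database
        self.permanent = {}  # stands in for tables persisted in a catalog
        self.temporary = {}  # in-memory tables that live only for this session

    def expand(self, identifier):
        """Fill in a missing catalog and/or database with the current defaults."""
        parts = identifier.split(".")
        if len(parts) == 1:
            return (self.current_catalog, self.current_database, parts[0])
        if len(parts) == 2:
            return (self.current_catalog, parts[0], parts[1])
        if len(parts) == 3:
            return tuple(parts)
        raise ValueError("expected at most 3 identifier parts: " + identifier)

    def create_table(self, identifier, table):
        self.permanent[self.expand(identifier)] = table

    def create_temporary_table(self, identifier, table):
        # No check that the catalog or database exists: temporary tables
        # are not bound to either.
        self.temporary[self.expand(identifier)] = table

    def drop_temporary_table(self, identifier):
        del self.temporary[self.expand(identifier)]

    def drop_database(self, catalog, database):
        # Only permanent tables are removed; temporary tables are untouched.
        self.permanent = {
            key: table for key, table in self.permanent.items()
            if key[:2] != (catalog, database)
        }

    def lookup(self, identifier):
        key = self.expand(identifier)
        if key in self.temporary:  # a temporary table shadows a permanent one
            return self.temporary[key]
        return self.permanent[key]
```

In this model, registering a temporary table under the same expanded identifier as a permanent one makes every `lookup` return the temporary table until it is dropped, after which the permanent table is visible again.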