WeiZhong94 commented on a change in pull request #13273:
URL: https://github.com/apache/flink/pull/13273#discussion_r479147575


########## File path: docs/dev/python/user-guide/table/10_minutes_to_table_api.md ##########
@@ -0,0 +1,712 @@
---
title: "10 Minutes to Table API"
nav-parent_id: python_tableapi
nav-pos: 25
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

This document is a short introduction to the PyFlink Table API, intended to help new users quickly understand its basic usage.
For advanced usage, please refer to the other documents in this User Guide.

* This will be replaced by the TOC
{:toc}

Common Structure of a Python Table API Program
----------------------------------------------

All Table API and SQL programs, for both batch and streaming, follow the same pattern. The following code example shows this common structure.

{% highlight python %}
from pyflink.table import EnvironmentSettings, StreamTableEnvironment

# 1. create a TableEnvironment
table_env = StreamTableEnvironment.create(
    environment_settings=EnvironmentSettings.new_instance().in_streaming_mode().use_blink_planner().build())

# 2. create a source Table
table_env.execute_sql("""
CREATE TABLE datagen (
    id INT,
    data STRING
) WITH (
    'connector' = 'datagen',
    'fields.id.kind' = 'sequence',
    'fields.id.start' = '1',
    'fields.id.end' = '10'
)
""")

# 3. create a sink Table
table_env.execute_sql("""
CREATE TABLE print (
    id INT,
    data STRING
) WITH (
    'connector' = 'print'
)
""")

# 4. query the source table and perform calculations
# create a Table from a Table API query:
tapi_result = table_env.from_path("datagen").select("id + 1, data")
# or create a Table from a SQL query:
sql_result = table_env.sql_query("SELECT * FROM datagen").select("id + 1, data")

# 5. emit the query result to the sink table
# emit a Table API result Table to a sink table:
tapi_result.execute_insert("print").get_job_client().get_job_execution_result().result()
sql_result.execute_insert("print").get_job_client().get_job_execution_result().result()
# or emit results via a SQL query:
table_env.execute_sql("INSERT INTO print SELECT * FROM datagen").get_job_client().get_job_execution_result().result()
{% endhighlight %}
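Since step 4 produces ordinary `Table` objects, further relational operations can be chained before the result is emitted in step 5. The following is a minimal sketch that reuses the `datagen` and `print` tables defined above; the filter predicate is an illustrative assumption, not part of the example itself:

{% highlight python %}
# a variation of steps 4 and 5: chain a filter onto the Table API query
# before emitting it to the "print" sink defined above
# (the predicate "id > 5" is illustrative)
filtered_result = table_env.from_path("datagen") \
    .where("id > 5") \
    .select("id + 1, data")

# block until the job finishes, as in the example above
filtered_result.execute_insert("print").get_job_client().get_job_execution_result().result()
{% endhighlight %}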
{% top %}

Create a TableEnvironment
-------------------------

The `TableEnvironment` is a central concept of the Table API and SQL integration.
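A few of these responsibilities can be illustrated in code. This is a minimal sketch assuming the `table_env` and `datagen` table created above; the `add_one` function, the bundle-size value, and the commented-out dependency path are illustrative assumptions:

{% highlight python %}
from pyflink.table import DataTypes
from pyflink.table.udf import udf

# register a user-defined scalar function (an illustrative example function)
table_env.register_function(
    "add_one", udf(lambda i: i + 1, [DataTypes.BIGINT()], DataTypes.BIGINT()))

# set a further configuration option, e.g. to tune Python UDF execution
table_env.get_config().get_configuration().set_string(
    "python.fn-execution.bundle.size", "1000")

# add a Python dependency so the UDF can also run on a remote cluster
# (hypothetical path):
# table_env.add_python_file("/path/to/my_dependency.py")

# the registered function can then be used in Table API expressions, e.g.:
# table_env.from_path("datagen").select("add_one(id), data")
{% endhighlight %}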
The following code example shows how to create a TableEnvironment:

{% highlight python %}
from pyflink.table import EnvironmentSettings, StreamTableEnvironment, BatchTableEnvironment

# create a blink streaming TableEnvironment
table_env = StreamTableEnvironment.create(
    environment_settings=EnvironmentSettings.new_instance().in_streaming_mode().use_blink_planner().build())

# create a blink batch TableEnvironment
table_env = BatchTableEnvironment.create(
    environment_settings=EnvironmentSettings.new_instance().in_batch_mode().use_blink_planner().build())

# create a flink streaming TableEnvironment
table_env = StreamTableEnvironment.create(
    environment_settings=EnvironmentSettings.new_instance().in_streaming_mode().use_old_planner().build())

# create a flink batch TableEnvironment
table_env = BatchTableEnvironment.create(
    environment_settings=EnvironmentSettings.new_instance().in_batch_mode().use_old_planner().build())
{% endhighlight %}

The `TableEnvironment` is responsible for:

* Creating `Table`s
* Registering `Table`s in the catalog
* Executing SQL queries
* Registering user-defined (scalar, table, or aggregate) functions
* Offering further configuration options
* Adding Python dependencies to support running Python UDFs on a remote cluster
* Executing jobs

Currently, two planners are available: the flink planner and the blink planner.

You should explicitly set which planner to use in your program.
We recommend using the blink planner whenever possible.
The blink planner is more powerful in functionality and performance, and the flink planner is reserved for compatibility.

Review comment:
   I'll remove this line.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
