[GitHub] [flink] MartijnVisser commented on a change in pull request #18812: [FLINK-25129][docs] Improvements to the table-planner-loader related docs

GitBox Thu, 17 Feb 2022 08:08:29 -0800


MartijnVisser commented on a change in pull request #18812:
URL: https://github.com/apache/flink/pull/18812#discussion_r809079539




##########
File path: docs/content/docs/dev/configuration/advanced.md
##########
@@ -24,33 +24,28 @@ under the License.
 
 # Advanced Configuration Topics
 
-## Dependencies: Flink Core and User Application
-
-There are two broad categories of dependencies and libraries in Flink, which 
are explained below.
-
-### Flink Core Dependencies
+## Anatomy of the Flink distribution
 
 Flink itself consists of a set of classes and dependencies that form the core 
of Flink's runtime
 and must be present when a Flink application is started. The classes and 
dependencies needed to run
 the system handle areas such as coordination, networking, checkpointing, 
failover, APIs,
 operators (such as windowing), resource management, etc.
 
-These core classes and dependencies are packaged in the `flink-dist` jar, are 
part of Flink's `lib`
-folder, and part of the basic Flink container images. You can think of these 
dependencies as similar
-to Java's core library, which contains classes like `String` and `List`.
+These core classes and dependencies are packaged in the `flink-dist.jar` 
available in the `/lib`

Review comment:
       It seems that one or more words are missing here after the JAR is 
mentioned, but I'm not sure which ones.

##########
File path: docs/content/docs/dev/configuration/advanced.md
##########
@@ -24,33 +24,28 @@ under the License.
 
 # Advanced Configuration Topics
 
-## Dependencies: Flink Core and User Application
-
-There are two broad categories of dependencies and libraries in Flink, which 
are explained below.
-
-### Flink Core Dependencies
+## Anatomy of the Flink distribution
 
 Flink itself consists of a set of classes and dependencies that form the core 
of Flink's runtime
 and must be present when a Flink application is started. The classes and 
dependencies needed to run
 the system handle areas such as coordination, networking, checkpointing, 
failover, APIs,
 operators (such as windowing), resource management, etc.
 
-These core classes and dependencies are packaged in the `flink-dist` jar, are 
part of Flink's `lib`
-folder, and part of the basic Flink container images. You can think of these 
dependencies as similar
-to Java's core library, which contains classes like `String` and `List`.
+These core classes and dependencies are packaged in the `flink-dist.jar` 
available in the `/lib`
+folder in the downloaded distribution, and part of the basic Flink container 
images. 
+You can think of these dependencies as similar to Java's core library, which 
contains classes like `String` and `List`.
 
 In order to keep the core dependencies as small as possible and avoid 
dependency clashes, the
 Flink Core Dependencies do not contain any connectors or libraries (i.e. CEP, 
SQL, ML) in order to
 avoid having an excessive default number of classes and dependencies in the 
classpath.
 
-### User Application Dependencies
+The `/lib` directory of the Flink distribution additionally contains various 
JARs including commonly used modules, 
+such as all the required [modules to execute Table 
jobs](#anatomy-of-table-dependencies) and a set of connector and formats.

Review comment:
       I think that if we structure the documentation like this, we need to 
have the overview for both the Table API jobs as well as DataStream API jobs. 

##########
File path: docs/content/docs/dev/configuration/connector.md
##########
@@ -24,39 +24,42 @@ under the License.
 
 # Connectors and Formats
 
-Flink can read from and write to various external systems via connectors and 
define the format in 
-which to store the data.
+Flink can read from and write to various external systems via connectors and 
use the format of your choice
+in order to read/write data from/into records.
 
-The way that information is serialized is represented in the external system 
and that system needs
-to know how to read this data in a format that can be read by Flink.  This is 
done through format 
-dependencies.
+An overview of available connectors and formats is available for both
+[DataStream]({{< ref "docs/connectors/datastream/overview.md" >}}) and
+[Table API/SQL]({{< ref "docs/connectors/table/overview.md" >}}).
 
-Most applications need specific connectors to run. Flink provides a set of 
formats that can be used 
-with connectors (with the dependencies for both being fairly unified). These 
are not part of Flink's 
-core dependencies and must be added as dependencies to the application.
+In order to use connectors and formats, you need to make sure Flink has access 
to the artifacts implementing them. 
+For each connector supported by the Flink community, we publish on [Maven 
Central](https://search.maven.org) two artifacts:
 
-## Adding Dependencies 
+* `flink-connector-<NAME>` which is a thin JAR including only the connector 
code, but excluding eventual 3rd party dependencies

Review comment:
       I'm not 100% sure that this statement is correct. I'll have to check 
with @fapaul on that.
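
   For illustration only, the two artifact flavors described in the hunk above 
   could be declared like this in a Gradle build. The Kafka connector and the 
   `1.15.0` version are assumptions picked as an example, not something this PR 
   prescribes.

   ```gradle
   dependencies {
       // Thin JAR (assumed example: Kafka): connector code only. Its third-party
       // dependencies are resolved transitively by the build tool.
       implementation "org.apache.flink:flink-connector-kafka:1.15.0"

       // Uber JAR alternative: bundles the connector's third-party dependencies.
       // Shown commented out because a build would normally pick only one flavor.
       // implementation "org.apache.flink:flink-sql-connector-kafka:1.15.0"
   }
   ```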

##########
File path: docs/content/docs/dev/configuration/connector.md
##########
@@ -24,39 +24,42 @@ under the License.
 
 # Connectors and Formats
 
-Flink can read from and write to various external systems via connectors and 
define the format in 
-which to store the data.
+Flink can read from and write to various external systems via connectors and 
use the format of your choice
+in order to read/write data from/into records.
 
-The way that information is serialized is represented in the external system 
and that system needs
-to know how to read this data in a format that can be read by Flink.  This is 
done through format 
-dependencies.
+An overview of available connectors and formats is available for both
+[DataStream]({{< ref "docs/connectors/datastream/overview.md" >}}) and
+[Table API/SQL]({{< ref "docs/connectors/table/overview.md" >}}).
 
-Most applications need specific connectors to run. Flink provides a set of 
formats that can be used 
-with connectors (with the dependencies for both being fairly unified). These 
are not part of Flink's 
-core dependencies and must be added as dependencies to the application.
+In order to use connectors and formats, you need to make sure Flink has access 
to the artifacts implementing them. 
+For each connector supported by the Flink community, we publish on [Maven 
Central](https://search.maven.org) two artifacts:
 
-## Adding Dependencies 
+* `flink-connector-<NAME>` which is a thin JAR including only the connector 
code, but excluding eventual 3rd party dependencies
+* `flink-sql-connector-<NAME>` which is an uber JAR ready to use with all the 
connector 3rd party dependencies.
 
-For more information on how to add dependencies, refer to the build tools 
sections on [Maven]({{< ref "docs/dev/configuration/maven" >}})
-and [Gradle]({{< ref "docs/dev/configuration/gradle" >}}). 
+The same applies for formats as well. Also note that some connectors, because 
they don't require 3rd party dependencies,
+may not have a corresponding `flink-sql-connector-<NAME>` artifact.
 
-## Packaging Dependencies
+{{< hint info >}}
+The uber JARs are supported mostly for being used in conjunction with [SQL 
client]({{< ref "docs/dev/table/sqlClient" >}}),
+but you can also use them in any DataStream/Table job.

Review comment:
       ```suggestion
   but you can also use them in any DataStream/Table application.
   ```
   
   Asking @fapaul to confirm that they can indeed be used in DataStream 
applications. 
   
   

##########
File path: docs/content/docs/dev/configuration/overview.md
##########
@@ -177,19 +177,46 @@ bash -c "$(curl 
https://flink.apache.org/q/gradle-quickstart.sh)" -- {{< version
 
 ## Which dependencies do you need?
 
-Depending on what you want to achieve, you are going to choose a combination 
of our available APIs, 
-which will require different dependencies. 
+To start working on a Flink job, you usually need the following dependencies:
+
+* Flink APIs, in order to develop your job
+* [Connectors and formats]({{< ref "docs/dev/configuration/connector" >}}), in 
order to integrate your job with external systems
+* [Testing utilities]({{< ref "docs/dev/configuration/testing" >}}), in order 
to test your job
+
+And in addition to these, you might want to add 3rd party dependencies that 
you need to develop custom functions.
+
+### Flink APIs
+
+Flink offers two major APIs: [Datastream API]({{< ref 
"docs/dev/datastream/overview" >}}) and [Table API & SQL]({{< ref 
"docs/dev/table/overview" >}}). 
+They can be used separately, or they can be mixed, depending on your use cases:
+
+| APIs you want to use                                                         
     | Dependency you need to add                          |
+|-----------------------------------------------------------------------------------|-----------------------------------------------------|
+| [DataStream]({{< ref "docs/dev/datastream/overview" >}})                     
     | `flink-streaming-java`                              |  
+| [DataStream with Scala]({{< ref "docs/dev/datastream/scala_api_extensions" 
>}})   | `flink-streaming-scala{{< scala_version >}}`        |   
+| [Table API]({{< ref "docs/dev/table/common" >}})                             
     | `flink-table-api-java`                              |   
+| [Table API with Scala]({{< ref "docs/dev/table/common" >}})                  
     | `flink-table-api-scala{{< scala_version >}}`        |
+| [Table API + DataStream]({{< ref "docs/dev/table/data_stream_api" >}})       
     | `flink-table-api-java-bridge`                       |
+| [Table API + DataStream with Scala]({{< ref "docs/dev/table/data_stream_api" 
>}}) | `flink-table-api-scala-bridge{{< scala_version >}}` |
+
+Just include them in your build tool script/descriptor, and you can start 
developing your job!
+
+## Running and packaging
+
+If you want to run your job by simply executing the main class, you will need 
`flink-runtime` in your classpath.
+In case of Table API programs, you will also need `flink-table-runtime` and 
`flink-table-planner-loader`.
 
-Here is a table of artifact/dependency names:
+As a rule of thumb, we **suggest** packaging the application code and all its 
required dependencies into one fat/uber JAR.
+This includes packaging connectors, formats and every 3rd party dependencies 
of your job.
+This rule **does not apply** to Java APIs, DataStream Scala APIs and eventual 
aforementioned runtime modules, 
+which are already provided by Flink itself and **must not** be included in a 
job uber JAR.
+This job JAR can be submitted to an already running Flink cluster, or added to 
a Flink application
+container image easily without modifying the distribution.
 
-| APIs you want to use              | Dependency you need to add    |
-|-----------------------------------|-------------------------------|
-| DataStream                        | flink-streaming-java          |  
-| DataStream with Scala             | flink-streaming-scala{{< scala_version 
>}}         |   
-| Table API                         | flink-table-api-java          |   
-| Table API with Scala              | flink-table-api-scala{{< scala_version 
>}}         |
-| Table API + DataStream            | flink-table-api-java-bridge   |
-| Table API + DataStream with Scala | flink-table-api-scala-bridge{{< 
scala_version >}}  |
+## What's next?
 
-Check out the sections on [Datastream API]({{< ref 
"docs/dev/datastream/overview" >}}) and 
-[Table API & SQL]({{< ref "docs/dev/table/overview" >}}) to learn more.
+* To start developing your job, check out [DataStream API]({{< ref 
"docs/dev/datastream/overview" >}}) and [Table API & SQL]({{< ref 
"docs/dev/table/overview" >}}).

Review comment:
       Really like this section 👍 

##########
File path: docs/content/docs/dev/configuration/connector.md
##########
@@ -24,39 +24,42 @@ under the License.
 
 # Connectors and Formats
 
-Flink can read from and write to various external systems via connectors and 
define the format in 
-which to store the data.
+Flink can read from and write to various external systems via connectors and 
use the format of your choice
+in order to read/write data from/into records.
 
-The way that information is serialized is represented in the external system 
and that system needs
-to know how to read this data in a format that can be read by Flink.  This is 
done through format 
-dependencies.
+An overview of available connectors and formats is available for both
+[DataStream]({{< ref "docs/connectors/datastream/overview.md" >}}) and
+[Table API/SQL]({{< ref "docs/connectors/table/overview.md" >}}).
 
-Most applications need specific connectors to run. Flink provides a set of 
formats that can be used 
-with connectors (with the dependencies for both being fairly unified). These 
are not part of Flink's 
-core dependencies and must be added as dependencies to the application.
+In order to use connectors and formats, you need to make sure Flink has access 
to the artifacts implementing them. 
+For each connector supported by the Flink community, we publish on [Maven 
Central](https://search.maven.org) two artifacts:
 
-## Adding Dependencies 
+* `flink-connector-<NAME>` which is a thin JAR including only the connector 
code, but excluding eventual 3rd party dependencies
+* `flink-sql-connector-<NAME>` which is an uber JAR ready to use with all the 
connector 3rd party dependencies.
 
-For more information on how to add dependencies, refer to the build tools 
sections on [Maven]({{< ref "docs/dev/configuration/maven" >}})
-and [Gradle]({{< ref "docs/dev/configuration/gradle" >}}). 
+The same applies for formats as well. Also note that some connectors, because 
they don't require 3rd party dependencies,
+may not have a corresponding `flink-sql-connector-<NAME>` artifact.
 
-## Packaging Dependencies
+{{< hint info >}}
+The uber JARs are supported mostly for being used in conjunction with [SQL 
client]({{< ref "docs/dev/table/sqlClient" >}}),
+but you can also use them in any DataStream/Table job.
+{{< /hint >}}
 
-We recommend packaging the application code and all its required dependencies 
into one fat/uber JAR. 
-This job JAR can be submitted to an already running Flink cluster, or added to 
a Flink application 
-container image.
+In order to use a connector/format module, you can either:

Review comment:
       This also feels like a natural place for another section, like "Using 
artifacts"

##########
File path: docs/content/docs/dev/configuration/advanced.md
##########
@@ -102,20 +97,28 @@ For more details, check out how to [connect to external 
systems]({{< ref "docs/c
 
 Starting from Flink 1.15, the distribution contains two planners:
 
--`flink-table-planner{{< scala_version >}}-{{< version >}}.jar`, in `/opt`, 
contains the query planner
--`flink-table-planner-loader-{{< version >}}.jar`, loaded by default in 
`/lib`, contains the query planner 
+- `flink-table-planner{{< scala_version >}}-{{< version >}}.jar`, in `/opt`, 
contains the query planner
+- `flink-table-planner-loader-{{< version >}}.jar`, loaded by default in 
`/lib`, contains the query planner 
   hidden behind an isolated classpath (you won't be able to address any 
`io.apache.flink.table.planner` directly)
 
-The planners contain the same code, but they are packaged differently. In one 
case, you must use the 
+The two planner JARs contain the same code, but they are packaged differently. 
In one case, you must use the 
 same Scala version of the JAR. In the other, you do not need to make 
considerations about Scala, since
 it is hidden inside the JAR.

Review comment:
       ```suggestion
   The two planner JARs contain the same code, but they are packaged 
differently. In the first case, you must use the 
   same Scala version of the JAR. In the second case, you do not need to make 
considerations about Scala, since
   it is hidden inside the JAR.
   ```

##########
File path: docs/content/docs/dev/configuration/advanced.md
##########
@@ -84,7 +79,7 @@ The Flink distribution contains by default the required JARs 
to execute Flink SQ
 in particular:
 
 - `flink-table-api-java-uber-{{< version >}}.jar` &#8594; contains all the 
Java APIs 
-- `flink-table-runtime-{{< version >}}.jar` &#8594; contains the runtime
+- `flink-table-runtime-{{< version >}}.jar` &#8594; contains the table runtime
 - `flink-table-planner-loader-{{< version >}}.jar` &#8594; contains the query 
planner
 
 **Note:** Previously, these JARs were all packaged into `flink-table.jar`. 
Since Flink 1.15, this has 

Review comment:
       I would replace the **Note:** with the `{{< hint warning >}}` syntax

##########
File path: docs/content/docs/dev/configuration/connector.md
##########
@@ -24,39 +24,42 @@ under the License.
 
 # Connectors and Formats
 
-Flink can read from and write to various external systems via connectors and 
define the format in 
-which to store the data.
+Flink can read from and write to various external systems via connectors and 
use the format of your choice
+in order to read/write data from/into records.

Review comment:
       ```suggestion
   in order to read/write data from/to records.
   ```

##########
File path: docs/content/docs/dev/configuration/connector.md
##########
@@ -24,39 +24,42 @@ under the License.
 
 # Connectors and Formats
 
-Flink can read from and write to various external systems via connectors and 
define the format in 
-which to store the data.
+Flink can read from and write to various external systems via connectors and 
use the format of your choice
+in order to read/write data from/into records.
 
-The way that information is serialized is represented in the external system 
and that system needs

Review comment:
       I agree with @matriv that it's better to leave this in because not 
everyone knows or understands this. I do think the sentence could be better, 
something like `Formats define how information is encoded for storage. Systems 
need to know how to read or write this data in a format that can be understood 
by Flink. This is done through format dependencies.`
   
    Currently these formats are supported:

##########
File path: docs/content/docs/dev/configuration/overview.md
##########
@@ -177,19 +177,46 @@ bash -c "$(curl 
https://flink.apache.org/q/gradle-quickstart.sh)" -- {{< version
 
 ## Which dependencies do you need?
 
-Depending on what you want to achieve, you are going to choose a combination 
of our available APIs, 
-which will require different dependencies. 
+To start working on a Flink job, you usually need the following dependencies:
+
+* Flink APIs, in order to develop your job
+* [Connectors and formats]({{< ref "docs/dev/configuration/connector" >}}), in 
order to integrate your job with external systems
+* [Testing utilities]({{< ref "docs/dev/configuration/testing" >}}), in order 
to test your job
+
+And in addition to these, you might want to add 3rd party dependencies that 
you need to develop custom functions.
+
+### Flink APIs
+
+Flink offers two major APIs: [Datastream API]({{< ref 
"docs/dev/datastream/overview" >}}) and [Table API & SQL]({{< ref 
"docs/dev/table/overview" >}}). 
+They can be used separately, or they can be mixed, depending on your use cases:
+
+| APIs you want to use                                                         
     | Dependency you need to add                          |
+|-----------------------------------------------------------------------------------|-----------------------------------------------------|
+| [DataStream]({{< ref "docs/dev/datastream/overview" >}})                     
     | `flink-streaming-java`                              |  
+| [DataStream with Scala]({{< ref "docs/dev/datastream/scala_api_extensions" 
>}})   | `flink-streaming-scala{{< scala_version >}}`        |   
+| [Table API]({{< ref "docs/dev/table/common" >}})                             
     | `flink-table-api-java`                              |   
+| [Table API with Scala]({{< ref "docs/dev/table/common" >}})                  
     | `flink-table-api-scala{{< scala_version >}}`        |
+| [Table API + DataStream]({{< ref "docs/dev/table/data_stream_api" >}})       
     | `flink-table-api-java-bridge`                       |
+| [Table API + DataStream with Scala]({{< ref "docs/dev/table/data_stream_api" 
>}}) | `flink-table-api-scala-bridge{{< scala_version >}}` |
+
+Just include them in your build tool script/descriptor, and you can start 
developing your job!
+
+## Running and packaging
+
+If you want to run your job by simply executing the main class, you will need 
`flink-runtime` in your classpath.
+In case of Table API programs, you will also need `flink-table-runtime` and 
`flink-table-planner-loader`.
 
-Here is a table of artifact/dependency names:
+As a rule of thumb, we **suggest** packaging the application code and all its 
required dependencies into one fat/uber JAR.

Review comment:
       Is this really the case? It also depends on the deployment mode 
(application, session, or per-job mode), right?
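
   To make the rule under discussion concrete, here is a rough Gradle sketch of 
   what the quoted paragraph describes, assuming the Shadow plugin is used to 
   build the uber JAR. The plugin version, the Flink version, and the Kafka 
   connector are illustrative assumptions. Whether this layout fits every 
   deployment mode is exactly the open question above.

   ```gradle
   plugins {
       id 'java'
       // Assumption: the Shadow plugin builds the fat/uber JAR.
       id 'com.github.johnrengelman.shadow' version '7.1.2'
   }

   ext {
       flinkVersion = '1.15.0' // assumption: should match the cluster's Flink version
   }

   repositories {
       mavenCentral()
   }

   dependencies {
       // Flink APIs are provided by the distribution: compile against them,
       // but keep them out of the job JAR.
       compileOnly "org.apache.flink:flink-streaming-java:${flinkVersion}"
       compileOnly "org.apache.flink:flink-table-api-java-bridge:${flinkVersion}"

       // Connectors and other third-party dependencies are bundled into the job JAR.
       implementation "org.apache.flink:flink-connector-kafka:${flinkVersion}"
   }
   ```

   With this layout, `gradle shadowJar` should produce a job JAR that contains 
   the connector but not the `compileOnly` Flink APIs, since Shadow packages the 
   runtime classpath by default.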

##########
File path: docs/content/docs/dev/configuration/testing.md
##########
@@ -26,65 +26,27 @@ under the License.
 
 Flink provides utilities for testing your job that you can add as dependencies.
 
-## DataStream API Test Dependencies
+## DataStream API Testing
 
-You need to add the following dependencies if you want to develop tests for a 
job built with the 
+You need to add the following dependencies if you want to develop tests for a 
job built with the
 DataStream API:
 
-{{< tabs "datastream test" >}}
+{{< artifact_tabs flink-test-utils withTestScope >}}
 
-{{< tab "Maven" >}}
-Open the `pom.xml` file in your project directory and add these dependencies 
in between the dependencies tab.
-{{< artifact flink-test-utils withTestScope >}}
-{{< artifact flink-runtime withTestScope >}}
-{{< /tab >}}
-
-{{< tab "Gradle" >}}
-Open the `build.gradle` file in your project directory and add the following 
in the dependencies block.
-```gradle
-...
-dependencies {
-    ...  
-    testImplementation "org.apache.flink:flink-test-utils:${flinkVersion}"
-    testImplementation "org.apache.flink:flink-runtime:${flinkVersion}"
-    ...
-}
-...
-```
-**Note:** This assumes that you have created your project using our Gradle 
build script or quickstart script.
-{{< /tab >}}
-
-{{< /tabs >}}
+Among the various test utilities, this module provides `MiniCluster`, a 
lightweight configurable Flink cluster runnable in a JUnit test that can 
directly execute jobs.

Review comment:
       I don't see this sentence when rendering the docs locally. 

##########
File path: docs/content/docs/dev/configuration/connector.md
##########
@@ -24,39 +24,42 @@ under the License.
 
 # Connectors and Formats
 
-Flink can read from and write to various external systems via connectors and 
define the format in 
-which to store the data.
+Flink can read from and write to various external systems via connectors and 
use the format of your choice
+in order to read/write data from/into records.
 
-The way that information is serialized is represented in the external system 
and that system needs
-to know how to read this data in a format that can be read by Flink.  This is 
done through format 
-dependencies.
+An overview of available connectors and formats is available for both
+[DataStream]({{< ref "docs/connectors/datastream/overview.md" >}}) and
+[Table API/SQL]({{< ref "docs/connectors/table/overview.md" >}}).
 
-Most applications need specific connectors to run. Flink provides a set of 
formats that can be used 
-with connectors (with the dependencies for both being fairly unified). These 
are not part of Flink's 
-core dependencies and must be added as dependencies to the application.
+In order to use connectors and formats, you need to make sure Flink has access 
to the artifacts implementing them. 

Review comment:
       This page no longer has any sections. Perhaps we could introduce one 
here, something like "access artifacts"?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
