MartijnVisser commented on a change in pull request #18812:
URL: https://github.com/apache/flink/pull/18812#discussion_r809079539
##########
File path: docs/content/docs/dev/configuration/advanced.md
##########
@@ -24,33 +24,28 @@ under the License.
# Advanced Configuration Topics
-## Dependencies: Flink Core and User Application
-
-There are two broad categories of dependencies and libraries in Flink, which
are explained below.
-
-### Flink Core Dependencies
+## Anatomy of the Flink distribution
Flink itself consists of a set of classes and dependencies that form the core
of Flink's runtime
and must be present when a Flink application is started. The classes and
dependencies needed to run
the system handle areas such as coordination, networking, checkpointing,
failover, APIs,
operators (such as windowing), resource management, etc.
-These core classes and dependencies are packaged in the `flink-dist` jar, are
part of Flink's `lib`
-folder, and part of the basic Flink container images. You can think of these
dependencies as similar
-to Java's core library, which contains classes like `String` and `List`.
+These core classes and dependencies are packaged in the `flink-dist.jar`
available in the `/lib`
Review comment:
It seems that one or more words are missing here after the mention of the
JAR, though I'm not sure which exactly.
##########
File path: docs/content/docs/dev/configuration/advanced.md
##########
@@ -24,33 +24,28 @@ under the License.
# Advanced Configuration Topics
-## Dependencies: Flink Core and User Application
-
-There are two broad categories of dependencies and libraries in Flink, which
are explained below.
-
-### Flink Core Dependencies
+## Anatomy of the Flink distribution
Flink itself consists of a set of classes and dependencies that form the core
of Flink's runtime
and must be present when a Flink application is started. The classes and
dependencies needed to run
the system handle areas such as coordination, networking, checkpointing,
failover, APIs,
operators (such as windowing), resource management, etc.
-These core classes and dependencies are packaged in the `flink-dist` jar, are
part of Flink's `lib`
-folder, and part of the basic Flink container images. You can think of these
dependencies as similar
-to Java's core library, which contains classes like `String` and `List`.
+These core classes and dependencies are packaged in the `flink-dist.jar`
available in the `/lib`
+folder in the downloaded distribution, and part of the basic Flink container
images.
+You can think of these dependencies as similar to Java's core library, which
contains classes like `String` and `List`.
In order to keep the core dependencies as small as possible and avoid
dependency clashes, the
Flink Core Dependencies do not contain any connectors or libraries (i.e. CEP,
SQL, ML) in order to
avoid having an excessive default number of classes and dependencies in the
classpath.
-### User Application Dependencies
+The `/lib` directory of the Flink distribution additionally contains various
JARs including commonly used modules,
+such as all the required [modules to execute Table
jobs](#anatomy-of-table-dependencies) and a set of connector and formats.
Review comment:
I think that if we structure the documentation like this, we need to
have an overview for both Table API jobs and DataStream API jobs.
##########
File path: docs/content/docs/dev/configuration/connector.md
##########
@@ -24,39 +24,42 @@ under the License.
# Connectors and Formats
-Flink can read from and write to various external systems via connectors and
define the format in
-which to store the data.
+Flink can read from and write to various external systems via connectors and
use the format of your choice
+in order to read/write data from/into records.
-The way that information is serialized is represented in the external system
and that system needs
-to know how to read this data in a format that can be read by Flink. This is
done through format
-dependencies.
+An overview of available connectors and formats is available for both
+[DataStream]({{< ref "docs/connectors/datastream/overview.md" >}}) and
+[Table API/SQL]({{< ref "docs/connectors/table/overview.md" >}}).
-Most applications need specific connectors to run. Flink provides a set of
formats that can be used
-with connectors (with the dependencies for both being fairly unified). These
are not part of Flink's
-core dependencies and must be added as dependencies to the application.
+In order to use connectors and formats, you need to make sure Flink has access
to the artifacts implementing them.
+For each connector supported by the Flink community, we publish on [Maven
Central](https://search.maven.org) two artifacts:
-## Adding Dependencies
+* `flink-connector-<NAME>` which is a thin JAR including only the connector
code, but excluding eventual 3rd party dependencies
Review comment:
I'm not 100% sure that this statement is correct; I'll have
to check with @fapaul on that.
##########
File path: docs/content/docs/dev/configuration/connector.md
##########
@@ -24,39 +24,42 @@ under the License.
# Connectors and Formats
-Flink can read from and write to various external systems via connectors and
define the format in
-which to store the data.
+Flink can read from and write to various external systems via connectors and
use the format of your choice
+in order to read/write data from/into records.
-The way that information is serialized is represented in the external system
and that system needs
-to know how to read this data in a format that can be read by Flink. This is
done through format
-dependencies.
+An overview of available connectors and formats is available for both
+[DataStream]({{< ref "docs/connectors/datastream/overview.md" >}}) and
+[Table API/SQL]({{< ref "docs/connectors/table/overview.md" >}}).
-Most applications need specific connectors to run. Flink provides a set of
formats that can be used
-with connectors (with the dependencies for both being fairly unified). These
are not part of Flink's
-core dependencies and must be added as dependencies to the application.
+In order to use connectors and formats, you need to make sure Flink has access
to the artifacts implementing them.
+For each connector supported by the Flink community, we publish on [Maven
Central](https://search.maven.org) two artifacts:
-## Adding Dependencies
+* `flink-connector-<NAME>` which is a thin JAR including only the connector
code, but excluding eventual 3rd party dependencies
+* `flink-sql-connector-<NAME>` which is an uber JAR ready to use with all the
connector 3rd party dependencies.
-For more information on how to add dependencies, refer to the build tools
sections on [Maven]({{< ref "docs/dev/configuration/maven" >}})
-and [Gradle]({{< ref "docs/dev/configuration/gradle" >}}).
+The same applies for formats as well. Also note that some connectors, because
they don't require 3rd party dependencies,
+may not have a corresponding `flink-sql-connector-<NAME>` artifact.
-## Packaging Dependencies
+{{< hint info >}}
+The uber JARs are supported mostly for being used in conjunction with [SQL
client]({{< ref "docs/dev/table/sqlClient" >}}),
+but you can also use them in any DataStream/Table job.
Review comment:
```suggestion
but you can also use them in any DataStream/Table application.
```
Asking @fapaul to confirm that they can indeed be used in DataStream
applications.
##########
File path: docs/content/docs/dev/configuration/overview.md
##########
@@ -177,19 +177,46 @@ bash -c "$(curl
https://flink.apache.org/q/gradle-quickstart.sh)" -- {{< version
## Which dependencies do you need?
-Depending on what you want to achieve, you are going to choose a combination
of our available APIs,
-which will require different dependencies.
+To start working on a Flink job, you usually need the following dependencies:
+
+* Flink APIs, in order to develop your job
+* [Connectors and formats]({{< ref "docs/dev/configuration/connector" >}}), in
order to integrate your job with external systems
+* [Testing utilities]({{< ref "docs/dev/configuration/testing" >}}), in order
to test your job
+
+And in addition to these, you might want to add 3rd party dependencies that
you need to develop custom functions.
+
+### Flink APIs
+
+Flink offers two major APIs: [Datastream API]({{< ref
"docs/dev/datastream/overview" >}}) and [Table API & SQL]({{< ref
"docs/dev/table/overview" >}}).
+They can be used separately, or they can be mixed, depending on your use cases:
+
+| APIs you want to use
| Dependency you need to add |
+|-----------------------------------------------------------------------------------|-----------------------------------------------------|
+| [DataStream]({{< ref "docs/dev/datastream/overview" >}})
| `flink-streaming-java` |
+| [DataStream with Scala]({{< ref "docs/dev/datastream/scala_api_extensions"
>}}) | `flink-streaming-scala{{< scala_version >}}` |
+| [Table API]({{< ref "docs/dev/table/common" >}})
| `flink-table-api-java` |
+| [Table API with Scala]({{< ref "docs/dev/table/common" >}})
| `flink-table-api-scala{{< scala_version >}}` |
+| [Table API + DataStream]({{< ref "docs/dev/table/data_stream_api" >}})
| `flink-table-api-java-bridge` |
+| [Table API + DataStream with Scala]({{< ref "docs/dev/table/data_stream_api"
>}}) | `flink-table-api-scala-bridge{{< scala_version >}}` |
+
+Just include them in your build tool script/descriptor, and you can start
developing your job!
+
+## Running and packaging
+
+If you want to run your job by simply executing the main class, you will need
`flink-runtime` in your classpath.
+In case of Table API programs, you will also need `flink-table-runtime` and
`flink-table-planner-loader`.
-Here is a table of artifact/dependency names:
+As a rule of thumb, we **suggest** packaging the application code and all its
required dependencies into one fat/uber JAR.
+This includes packaging connectors, formats and every 3rd party dependencies
of your job.
+This rule **does not apply** to Java APIs, DataStream Scala APIs and eventual
aforementioned runtime modules,
+which are already provided by Flink itself and **must not** be included in a
job uber JAR.
+This job JAR can be submitted to an already running Flink cluster, or added to
a Flink application
+container image easily without modifying the distribution.
-| APIs you want to use | Dependency you need to add |
-|-----------------------------------|-------------------------------|
-| DataStream | flink-streaming-java |
-| DataStream with Scala | flink-streaming-scala{{< scala_version
>}} |
-| Table API | flink-table-api-java |
-| Table API with Scala | flink-table-api-scala{{< scala_version
>}} |
-| Table API + DataStream | flink-table-api-java-bridge |
-| Table API + DataStream with Scala | flink-table-api-scala-bridge{{<
scala_version >}} |
+## What's next?
-Check out the sections on [Datastream API]({{< ref
"docs/dev/datastream/overview" >}}) and
-[Table API & SQL]({{< ref "docs/dev/table/overview" >}}) to learn more.
+* To start developing your job, check out [DataStream API]({{< ref
"docs/dev/datastream/overview" >}}) and [Table API & SQL]({{< ref
"docs/dev/table/overview" >}}).
Review comment:
Really like this section 👍
##########
File path: docs/content/docs/dev/configuration/connector.md
##########
@@ -24,39 +24,42 @@ under the License.
# Connectors and Formats
-Flink can read from and write to various external systems via connectors and
define the format in
-which to store the data.
+Flink can read from and write to various external systems via connectors and
use the format of your choice
+in order to read/write data from/into records.
-The way that information is serialized is represented in the external system
and that system needs
-to know how to read this data in a format that can be read by Flink. This is
done through format
-dependencies.
+An overview of available connectors and formats is available for both
+[DataStream]({{< ref "docs/connectors/datastream/overview.md" >}}) and
+[Table API/SQL]({{< ref "docs/connectors/table/overview.md" >}}).
-Most applications need specific connectors to run. Flink provides a set of
formats that can be used
-with connectors (with the dependencies for both being fairly unified). These
are not part of Flink's
-core dependencies and must be added as dependencies to the application.
+In order to use connectors and formats, you need to make sure Flink has access
to the artifacts implementing them.
+For each connector supported by the Flink community, we publish on [Maven
Central](https://search.maven.org) two artifacts:
-## Adding Dependencies
+* `flink-connector-<NAME>` which is a thin JAR including only the connector
code, but excluding eventual 3rd party dependencies
+* `flink-sql-connector-<NAME>` which is an uber JAR ready to use with all the
connector 3rd party dependencies.
-For more information on how to add dependencies, refer to the build tools
sections on [Maven]({{< ref "docs/dev/configuration/maven" >}})
-and [Gradle]({{< ref "docs/dev/configuration/gradle" >}}).
+The same applies for formats as well. Also note that some connectors, because
they don't require 3rd party dependencies,
+may not have a corresponding `flink-sql-connector-<NAME>` artifact.
-## Packaging Dependencies
+{{< hint info >}}
+The uber JARs are supported mostly for being used in conjunction with [SQL
client]({{< ref "docs/dev/table/sqlClient" >}}),
+but you can also use them in any DataStream/Table job.
+{{< /hint >}}
-We recommend packaging the application code and all its required dependencies
into one fat/uber JAR.
-This job JAR can be submitted to an already running Flink cluster, or added to
a Flink application
-container image.
+In order to use a connector/format module, you can either:
Review comment:
This also feels like a natural place for another section, like "Using
artifacts"
##########
File path: docs/content/docs/dev/configuration/advanced.md
##########
@@ -102,20 +97,28 @@ For more details, check out how to [connect to external
systems]({{< ref "docs/c
Starting from Flink 1.15, the distribution contains two planners:
--`flink-table-planner{{< scala_version >}}-{{< version >}}.jar`, in `/opt`,
contains the query planner
--`flink-table-planner-loader-{{< version >}}.jar`, loaded by default in
`/lib`, contains the query planner
+- `flink-table-planner{{< scala_version >}}-{{< version >}}.jar`, in `/opt`,
contains the query planner
+- `flink-table-planner-loader-{{< version >}}.jar`, loaded by default in
`/lib`, contains the query planner
hidden behind an isolated classpath (you won't be able to address any
`io.apache.flink.table.planner` directly)
-The planners contain the same code, but they are packaged differently. In one
case, you must use the
+The two planner JARs contain the same code, but they are packaged differently.
In one case, you must use the
same Scala version of the JAR. In the other, you do not need to make
considerations about Scala, since
it is hidden inside the JAR.
Review comment:
```suggestion
The two planner JARs contain the same code, but they are packaged
differently. In the first case, you must use the
same Scala version of the JAR. In the second case, you do not need to make
considerations about Scala, since
it is hidden inside the JAR.
```
##########
File path: docs/content/docs/dev/configuration/advanced.md
##########
@@ -84,7 +79,7 @@ The Flink distribution contains by default the required JARs
to execute Flink SQ
in particular:
- `flink-table-api-java-uber-{{< version >}}.jar` → contains all the
Java APIs
-- `flink-table-runtime-{{< version >}}.jar` → contains the runtime
+- `flink-table-runtime-{{< version >}}.jar` → contains the table runtime
- `flink-table-planner-loader-{{< version >}}.jar` → contains the query
planner
**Note:** Previously, these JARs were all packaged into `flink-table.jar`.
Since Flink 1.15, this has
Review comment:
I would replace the **Note:** with the `{{< hint warning >}}` syntax
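A rough sketch of what I mean, reusing the `hint` shortcode that connector.md already uses in this PR. The note text is left elided exactly as in the hunk above, so this is illustrative rather than a ready-to-apply suggestion:

```markdown
{{< hint warning >}}
Previously, these JARs were all packaged into `flink-table.jar`. Since Flink 1.15, this has ...
{{< /hint >}}
```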
##########
File path: docs/content/docs/dev/configuration/connector.md
##########
@@ -24,39 +24,42 @@ under the License.
# Connectors and Formats
-Flink can read from and write to various external systems via connectors and
define the format in
-which to store the data.
+Flink can read from and write to various external systems via connectors and
use the format of your choice
+in order to read/write data from/into records.
Review comment:
```suggestion
in order to read/write data from/to records.
```
##########
File path: docs/content/docs/dev/configuration/connector.md
##########
@@ -24,39 +24,42 @@ under the License.
# Connectors and Formats
-Flink can read from and write to various external systems via connectors and
define the format in
-which to store the data.
+Flink can read from and write to various external systems via connectors and
use the format of your choice
+in order to read/write data from/into records.
-The way that information is serialized is represented in the external system
and that system needs
Review comment:
I agree with @matriv that it's better to leave this in because not
everyone knows or understands this. I do think the sentence could be better,
something like `Formats define how information is encoded for storage. Systems
need to know how to read or write this data in a format that can be understood
by Flink. This is done through format dependencies.`
Currently these formats are supported:
##########
File path: docs/content/docs/dev/configuration/overview.md
##########
@@ -177,19 +177,46 @@ bash -c "$(curl
https://flink.apache.org/q/gradle-quickstart.sh)" -- {{< version
## Which dependencies do you need?
-Depending on what you want to achieve, you are going to choose a combination
of our available APIs,
-which will require different dependencies.
+To start working on a Flink job, you usually need the following dependencies:
+
+* Flink APIs, in order to develop your job
+* [Connectors and formats]({{< ref "docs/dev/configuration/connector" >}}), in
order to integrate your job with external systems
+* [Testing utilities]({{< ref "docs/dev/configuration/testing" >}}), in order
to test your job
+
+And in addition to these, you might want to add 3rd party dependencies that
you need to develop custom functions.
+
+### Flink APIs
+
+Flink offers two major APIs: [Datastream API]({{< ref
"docs/dev/datastream/overview" >}}) and [Table API & SQL]({{< ref
"docs/dev/table/overview" >}}).
+They can be used separately, or they can be mixed, depending on your use cases:
+
+| APIs you want to use
| Dependency you need to add |
+|-----------------------------------------------------------------------------------|-----------------------------------------------------|
+| [DataStream]({{< ref "docs/dev/datastream/overview" >}})
| `flink-streaming-java` |
+| [DataStream with Scala]({{< ref "docs/dev/datastream/scala_api_extensions"
>}}) | `flink-streaming-scala{{< scala_version >}}` |
+| [Table API]({{< ref "docs/dev/table/common" >}})
| `flink-table-api-java` |
+| [Table API with Scala]({{< ref "docs/dev/table/common" >}})
| `flink-table-api-scala{{< scala_version >}}` |
+| [Table API + DataStream]({{< ref "docs/dev/table/data_stream_api" >}})
| `flink-table-api-java-bridge` |
+| [Table API + DataStream with Scala]({{< ref "docs/dev/table/data_stream_api"
>}}) | `flink-table-api-scala-bridge{{< scala_version >}}` |
+
+Just include them in your build tool script/descriptor, and you can start
developing your job!
+
+## Running and packaging
+
+If you want to run your job by simply executing the main class, you will need
`flink-runtime` in your classpath.
+In case of Table API programs, you will also need `flink-table-runtime` and
`flink-table-planner-loader`.
-Here is a table of artifact/dependency names:
+As a rule of thumb, we **suggest** packaging the application code and all its
required dependencies into one fat/uber JAR.
Review comment:
Is this really the case? It also depends on the deployment mode
(application, session, or per-job mode), right?
##########
File path: docs/content/docs/dev/configuration/testing.md
##########
@@ -26,65 +26,27 @@ under the License.
Flink provides utilities for testing your job that you can add as dependencies.
-## DataStream API Test Dependencies
+## DataStream API Testing
-You need to add the following dependencies if you want to develop tests for a
job built with the
+You need to add the following dependencies if you want to develop tests for a
job built with the
DataStream API:
-{{< tabs "datastream test" >}}
+{{< artifact_tabs flink-test-utils withTestScope >}}
-{{< tab "Maven" >}}
-Open the `pom.xml` file in your project directory and add these dependencies
in between the dependencies tab.
-{{< artifact flink-test-utils withTestScope >}}
-{{< artifact flink-runtime withTestScope >}}
-{{< /tab >}}
-
-{{< tab "Gradle" >}}
-Open the `build.gradle` file in your project directory and add the following
in the dependencies block.
-```gradle
-...
-dependencies {
- ...
- testImplementation "org.apache.flink:flink-test-utils:${flinkVersion}"
- testImplementation "org.apache.flink:flink-runtime:${flinkVersion}"
- ...
-}
-...
-```
-**Note:** This assumes that you have created your project using our Gradle
build script or quickstart script.
-{{< /tab >}}
-
-{{< /tabs >}}
+Among the various test utilities, this module provides `MiniCluster`, a
lightweight configurable Flink cluster runnable in a JUnit test that can
directly execute jobs.
Review comment:
I don't see this sentence when rendering the docs locally.
##########
File path: docs/content/docs/dev/configuration/connector.md
##########
@@ -24,39 +24,42 @@ under the License.
# Connectors and Formats
-Flink can read from and write to various external systems via connectors and
define the format in
-which to store the data.
+Flink can read from and write to various external systems via connectors and
use the format of your choice
+in order to read/write data from/into records.
-The way that information is serialized is represented in the external system
and that system needs
-to know how to read this data in a format that can be read by Flink. This is
done through format
-dependencies.
+An overview of available connectors and formats is available for both
+[DataStream]({{< ref "docs/connectors/datastream/overview.md" >}}) and
+[Table API/SQL]({{< ref "docs/connectors/table/overview.md" >}}).
-Most applications need specific connectors to run. Flink provides a set of
formats that can be used
-with connectors (with the dependencies for both being fairly unified). These
are not part of Flink's
-core dependencies and must be added as dependencies to the application.
+In order to use connectors and formats, you need to make sure Flink has access
to the artifacts implementing them.
Review comment:
This page no longer has any sections. Perhaps we could introduce one
here, something like "access artifacts"?
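As a sketch only (the heading text is a placeholder I made up, and the `##` level is assumed from the `## Adding Dependencies` heading that this hunk removes), it could look like:

```markdown
## Accessing artifacts

In order to use connectors and formats, you need to make sure Flink has access to the artifacts implementing them.
```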