AHeise commented on a change in pull request #16932:
URL: https://github.com/apache/flink/pull/16932#discussion_r693955740



##########
File path: docs/content/docs/connectors/datastream/hybridsource.md
##########
@@ -0,0 +1,101 @@
+---
+title: Hybrid Source
+weight: 8
+type: docs
+aliases:
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Hybrid Source
+
+`HybridSource` is a source that contains a list of concrete sources.
+It solves the problem of sequentially reading input from heterogeneous sources to produce a single input stream.
+
+For example, a bootstrap use case may need to read several days worth of bounded input from S3 before continuing with the latest unbounded input from Kafka.
+`HybridSource` switches from `FileSource` to `KafkaSource` when the bounded file input finishes.
+
+Prior to `HybridSource`, it was necessary to create a topology with multiple sources and define a switching mechanism in user land, which leads to operational complexity and inefficiency.
+
+With `HybridSource` the multiple sources appear as a single source in the Flink job graph and from `DataStream` API perspective.
+
+For more background see [FLIP-150](https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source)
+
+To use the connector, add the ```flink-connector-base``` dependency to your project:
+
+{{< artifact flink-connector-base >}}
+
+(Typically comes as transitive dependency with concrete sources.)
+
+## Start position for next source
+
+To arrange multiple sources in a `HybridSource` each source typically needs to be assigned a
+start and end position (end position for bounded input for all but the final source).
+Details depend on the specific source and the external storage systems.

Review comment:
       ```suggestion
   To arrange multiple sources in a `HybridSource`, all sources except the last one need to be bounded. Therefore, the sources typically need to be assigned a start and end position. The last source may be bounded, in which case the `HybridSource` is bounded; otherwise it is unbounded.
   Details depend on the specific source and the external storage systems.
   ```
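
To make the boundedness rule in this suggestion concrete, here is a minimal sketch in plain Java. It is a toy model, not Flink API: `HybridBoundednessSketch`, `Kind`, and `resolve` are hypothetical names introduced only for illustration.

```java
import java.util.List;

/** Toy model of the boundedness rule; these types are illustrative, not Flink API. */
final class HybridBoundednessSketch {

    /** Hypothetical stand-in for a source's boundedness. */
    enum Kind { BOUNDED, UNBOUNDED }

    /**
     * Returns the boundedness of a chain of sources that are read one after another.
     * Every source except the last must be bounded, otherwise the chain could
     * never switch to the next source.
     */
    static Kind resolve(List<Kind> chain) {
        if (chain.isEmpty()) {
            throw new IllegalArgumentException("at least one source is required");
        }
        for (int i = 0; i < chain.size() - 1; i++) {
            if (chain.get(i) == Kind.UNBOUNDED) {
                throw new IllegalArgumentException(
                        "source " + i + " is unbounded but is not the last source");
            }
        }
        // The last source alone decides whether the whole chain ever ends.
        return chain.get(chain.size() - 1);
    }

    public static void main(String[] args) {
        // Files (bounded) followed by Kafka (unbounded): the chain is unbounded.
        System.out.println(resolve(List.of(Kind.BOUNDED, Kind.UNBOUNDED)));
        // Two bounded sources: the chain is bounded.
        System.out.println(resolve(List.of(Kind.BOUNDED, Kind.BOUNDED)));
    }
}
```

An unbounded source anywhere but the last position is rejected, which mirrors why the docs should state the constraint up front.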

##########
File path: docs/content/docs/connectors/datastream/hybridsource.md
##########
@@ -0,0 +1,101 @@
+---
+title: Hybrid Source
+weight: 8
+type: docs
+aliases:
+---
+
+# Hybrid Source
+
+`HybridSource` is a source that contains a list of concrete sources.

Review comment:
       I'd not explicitly mention FLIP-27. Eventually every source will be a FLIP-27 source, if we do a good enough job. But maybe we could link to https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/datastream/sources/ instead?

##########
File path: docs/content/docs/connectors/datastream/hybridsource.md
##########
@@ -0,0 +1,101 @@
+---
+title: Hybrid Source
+weight: 8
+type: docs
+aliases:
+---
+
+# Hybrid Source
+
+`HybridSource` is a source that contains a list of concrete sources.
+It solves the problem of sequentially reading input from heterogeneous sources to produce a single input stream.
+
+For example, a bootstrap use case may need to read several days worth of bounded input from S3 before continuing with the latest unbounded input from Kafka.
+`HybridSource` switches from `FileSource` to `KafkaSource` when the bounded file input finishes.
+
+Prior to `HybridSource`, it was necessary to create a topology with multiple sources and define a switching mechanism in user land, which leads to operational complexity and inefficiency.
+
+With `HybridSource` the multiple sources appear as a single source in the Flink job graph and from `DataStream` API perspective.
+
+For more background see [FLIP-150](https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source)
+
+To use the connector, add the ```flink-connector-base``` dependency to your project:
+
+{{< artifact flink-connector-base >}}
+
+(Typically comes as transitive dependency with concrete sources.)
+
+## Start position for next source
+
+To arrange multiple sources in a `HybridSource` each source typically needs to be assigned a
+start and end position (end position for bounded input for all but the final source).
+Details depend on the specific source and the external storage systems.
+
+Here we cover the most basic and then a more complex scenario, following the File/Kafka example.
+
+#### Fixed start position at graph construction time
+
+Example: Read till pre-determined switch time from files and then continue reading from Kafka.
+Each source covers an upfront known range and therefore the contained sources can be created upfront as if they were used directly:
+
+```java
+long switchTimestamp = t2; // derive from file input paths

Review comment:
       ```suggestion
   long switchTimestamp = ...; // derive from file input paths
   ```
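
For readers of this thread, the idea of a switch position known upfront can be sketched in plain Java. This is a toy model, not Flink API: `FixedSwitchSketch` and `readInOrder` are hypothetical names, and the convention that the bounded input covers timestamps up to and including the switch timestamp while the follow-up input covers everything after it is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a fixed switch position; these names are illustrative, not Flink API. */
final class FixedSwitchSketch {

    /**
     * Concatenates two inputs around a switch timestamp that is known upfront.
     * Assumed convention: the bounded input covers timestamps <= switchTimestamp,
     * the follow-up input covers timestamps > switchTimestamp.
     */
    static List<Long> readInOrder(List<Long> bounded, List<Long> followUp, long switchTimestamp) {
        List<Long> out = new ArrayList<>();
        // First drain the bounded input up to the switch position.
        for (long ts : bounded) {
            if (ts <= switchTimestamp) {
                out.add(ts);
            }
        }
        // Then continue with the follow-up input, skipping the already covered range.
        for (long ts : followUp) {
            if (ts > switchTimestamp) {
                out.add(ts);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> files = List.of(1L, 2L, 3L);
        List<Long> kafka = List.of(2L, 3L, 4L, 5L);
        System.out.println(readInOrder(files, kafka, 3L)); // prints [1, 2, 3, 4, 5]
    }
}
```

Because both ranges are fixed before the job starts, neither input needs to know about the other, which is what allows the contained sources to be created upfront.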

##########
File path: docs/content/docs/connectors/datastream/hybridsource.md
##########
@@ -0,0 +1,101 @@
+---
+title: Hybrid Source
+weight: 8
+type: docs
+aliases:
+---
+
+# Hybrid Source
+
+`HybridSource` is a source that contains a list of concrete sources.
+It solves the problem of sequentially reading input from heterogeneous sources to produce a single input stream.
+
+For example, a bootstrap use case may need to read several days worth of bounded input from S3 before continuing with the latest unbounded input from Kafka.
+`HybridSource` switches from `FileSource` to `KafkaSource` when the bounded file input finishes.

Review comment:
       ```suggestion
   For example, a bootstrap use case may need to read several days worth of bounded input from S3 before continuing with the latest unbounded input from Kafka.
   `HybridSource` switches from `FileSource` to `KafkaSource` when the bounded file input finishes without interrupting the application.
   ```



