WencongLiu commented on code in PR #23362:
URL: https://github.com/apache/flink/pull/23362#discussion_r1357718013

##########
docs/content/docs/dev/datastream/dataset_migration.md:
##########
@@ -0,0 +1,699 @@
+---
+title: "How To Migrate From DataSet to DataStream"
+weight: 302
+type: docs
+bookToc: false
+aliases:
+  - /dev/dataset_migration.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# How to Migrate from DataSet to DataStream
+
+The DataSet API has been formally deprecated and will no longer receive active maintenance and support. It will be removed in the
+Flink 2.0 version. Flink users are recommended to migrate from the DataSet API to the DataStream API, Table API and SQL for their
+data processing requirements.
+
+For the most of DataSet APIs, the users can utilize the DataStream API to get the same calculation result in the batch jobs. However,
+different DataSet API can be implemented by DataStream API with various difference on semantic and behavior. All DataSet APIs can be
+categorized into four types:
+
+Category 1: These DataSet APIs can be implemented by DataStream APIs with same semantic and same calculation behavior.
+
+Category 2: These DataSet APIs can be implemented by DataStream APIs with different semantic but same calculation behavior. This will
+make the job code more complex.
+
+Category 3: These DataSet APIs can be implemented by DataStream APIs with different semantic and different calculation behavior. This
+will involve additional computation and shuffle costs.
+
+Category 4: These DataSet APIs are not supported by DataStream APIs.
+
+The subsequent sections will first introduce how to set the execution environment and provide detailed explanations on how to implement
+each category of DataSet APIs using the DataStream APIs, highlighting the specific considerations and challenges associated with each
+category.
+
+
+## Setting the execution environment
+
+To execute a DataSet pipeline by DataStream API, we should first start by replacing ExecutionEnvironment with StreamExecutionEnvironment.
+{{< tabs executionenv >}}
+{{< tab "DataSet">}}
+```java
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

Review Comment:
   The alternatives for `LocalEnvironment`, `CollectionEnvironment`, and `RemoteEnvironment` have been added.