JingsongLi commented on a change in pull request #13010:
URL: https://github.com/apache/flink/pull/13010#discussion_r468310256
##########
File path: docs/dev/table/connectors/datagen.md
##########
@@ -29,25 +29,24 @@ under the License.
* This will be replaced by the TOC
{:toc}
-The DataGen connector allows for reading by data generation rules.
+The DataGen connector allows for creating tables based on in-memory data
generation.
+This is useful when developing queries locally without access to external
systems such as Kafka.
+Tables can include [Computed Column syntax]({% link dev/table/sql/create.md
%}#create-table) which allows for flexible record generation.
-The DataGen connector can work with [Computed Column syntax]({% link
dev/table/sql/create.md %}#create-table).
-This allows you to generate records flexibly.
+The DataGen connector is built-in, no additional dependencies are required.
-The DataGen connector is built-in.
+Usage
+-----
-<span class="label label-danger">Attention</span> Complex types are not
supported: Array, Map, Row. Please construct these types by computed column.
+By default, a DataGen table will create an unbounded number of rows with a
random value for each column.
+For variable sized types, char/varchar/string/array/map/multiset, the length
can be specified.
+Additionally, a total number of rows can be specified, resulting in a bounded
table.
-How to create a DataGen table
-----------------
-
-The boundedness of table: when the generation of field data in the table is
completed, the reading
-is finished. So the boundedness of the table depends on the boundedness of
fields.
-
-For each field, there are two ways to generate data:
+There also exists a sequence generator, where users specify a sequence of
start and end values.
+Complex types cannot be generated as a sequence.
+If any column in a table is a sequence type, the table will be bounded and end
when the first sequence completes.
-- Random generator is the default generator, you can specify random max and
min values. For char/varchar/string, the length can be specified. It is a
unbounded generator.
-- Sequence generator, you can specify sequence start and end values. It is a
bounded generator, when the sequence number reaches the end value, the reading
ends.
+Time types are always set to the local machine's current system time.
Review comment:
Maybe we can add a table that lists all types,
showing the generation strategies each one supports and the required parameters?
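
As a rough illustration of what such a per-type overview would need to cover, here is a minimal sketch of a DDL that mixes both strategies described in the new text. The table and column names are hypothetical, and the option keys (`fields.#.kind`, `fields.#.start`, `fields.#.end`, `fields.#.min`, `fields.#.max`, `fields.#.length`, `rows-per-second`) are assumed to match what the connector exposes; please adjust if the PR differs.

```sql
-- Hypothetical example table; names are for illustration only.
CREATE TABLE orders_gen (
  order_id   BIGINT,               -- sequence generator (bounded)
  quantity   INT,                  -- random generator with min/max
  buyer      STRING,               -- random generator with fixed length
  order_time AS localtimestamp     -- computed column, local system time
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '5',

  -- Assumed option keys: a sequence column makes the whole table bounded.
  'fields.order_id.kind'  = 'sequence',
  'fields.order_id.start' = '1',
  'fields.order_id.end'   = '1000',

  -- Random is the default kind; only the value range is narrowed here.
  'fields.quantity.min' = '1',
  'fields.quantity.max' = '10',

  -- Length applies to variable sized types such as STRING.
  'fields.buyer.length' = '10'
);
```

Under these assumptions the table would stop producing rows once `order_id` reaches 1000, which is the behavior the sequence paragraph describes.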