Thanks, Jane, for providing examples that make the discussion clearer.
Thanks, Lincoln and Xuyang, for your feedback. I wholeheartedly agree that
it is better to throw an error than to silently ignore the option.
Extending datagen to generate variable-length values is an excellent
idea; I will create another JIRA ticket to follow up.
Taking the examples provided:
1. For fixed-length data types (CHAR, BINARY), both DDLs below, which set a
custom length, should throw an exception such as 'User-defined length of the
fixed-length field f0 is not supported.'
CREATE TABLE foo (
f0 CHAR(5)
) WITH ('connector' = 'datagen', 'fields.f0.length' = '10');
CREATE TABLE bar (
f0 CHAR(5)
) WITH ('connector' = 'datagen', 'fields.f0.length' = '1');
2. For variable-length data types (VARCHAR, VARBINARY), the following DDL can
be executed legally. If an illegal user-defined length is configured, an
exception such as 'User-defined length of the VARCHAR field %s should be
shorter than the schema definition.' will be thrown.
CREATE TABLE meow (
f0 VARCHAR(20)
) WITH ('connector' = 'datagen', 'fields.f0.length' = '10');
3. For the special variable-length data types STRING and BYTES, the maximum
length is very large (2^31 - 1). If users do not specify a smaller field
length, fields occupying a huge amount of memory (estimated at more than
2 GB) would be generated by default, which can easily lead to
"java.lang.OutOfMemoryError: Java heap space". I therefore recommend keeping
the default length of these two types at 100, as before, while allowing it
to be configured to any value less than 2^31 - 1.
CREATE TABLE purr (
f0 STRING
) WITH ('connector' = 'datagen', 'fields.f0.length' = '10');
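To make the three rules concrete, here is a minimal Java sketch of how a per-field length could be resolved. It is not the code in the pull request: the class `DataGenLengthRules`, the `Kind` enum and the method `resolveLength` are hypothetical. STRING/BYTES are modeled as VARCHAR/VARBINARY with a declared length of Integer.MAX_VALUE (2^31 - 1), and two behaviors not stated above are assumptions: a VARCHAR(n) without a configured length generates n characters, and a configured length equal to n is accepted.

```java
import java.util.OptionalInt;

/**
 * Hypothetical sketch of the proposed datagen length rules.
 * Not Flink's actual implementation; names and messages are illustrative.
 */
final class DataGenLengthRules {

    /** Proposed default for STRING/BYTES when 'fields.#.length' is not set. */
    static final int DEFAULT_UNBOUNDED_LENGTH = 100;

    /** Simplified stand-in for the relevant logical type roots (hypothetical). */
    enum Kind { CHAR, BINARY, VARCHAR, VARBINARY }

    /**
     * Resolves the length datagen should generate for one field.
     *
     * @param field          field name, used in error messages
     * @param kind           declared type kind
     * @param declaredLength length from the schema; STRING/BYTES are modeled
     *                       as VARCHAR/VARBINARY with Integer.MAX_VALUE
     * @param userLength     value of 'fields.<field>.length', if configured
     */
    static int resolveLength(String field, Kind kind, int declaredLength, OptionalInt userLength) {
        if (kind == Kind.CHAR || kind == Kind.BINARY) {
            // Rule 1: fixed-length types reject any user-defined length.
            if (userLength.isPresent()) {
                throw new IllegalArgumentException(String.format(
                        "User-defined length of the fixed-length field %s is not supported.", field));
            }
            return declaredLength;
        }

        boolean unbounded = declaredLength == Integer.MAX_VALUE;
        if (!userLength.isPresent()) {
            // Rule 3: STRING/BYTES fall back to 100 instead of ~2^31 - 1.
            // Assumption: bounded VARCHAR(n)/VARBINARY(n) default to n.
            return unbounded ? DEFAULT_UNBOUNDED_LENGTH : declaredLength;
        }

        int length = userLength.getAsInt();
        // Rule 2: the configured length must not exceed the declared one.
        // Assumption: a length equal to the declared length is accepted.
        if (length > declaredLength) {
            throw new IllegalArgumentException(String.format(
                    "User-defined length of the %s field %s should be shorter than the schema definition.",
                    kind, field));
        }
        return length;
    }

    public static void main(String[] args) {
        // meow: VARCHAR(20) with 'fields.f0.length' = '10' -> prints 10
        System.out.println(resolveLength("f0", Kind.VARCHAR, 20, OptionalInt.of(10)));
        // STRING without 'fields.f0.length' -> prints 100
        System.out.println(resolveLength("f0", Kind.VARCHAR, Integer.MAX_VALUE, OptionalInt.empty()));
        try {
            // foo: CHAR(5) with 'fields.f0.length' = '10' -> rejected
            resolveLength("f0", Kind.CHAR, 5, OptionalInt.of(10));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping all three rules in one resolution step means the connector can fail fast at table-creation time rather than silently ignoring the option, which matches the direction agreed on above.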
The updates have been synchronized to the pull request [1].
WDYT?
[1] https://github.com/apache/flink/pull/23678
Best,
Yubin