kbendick commented on a change in pull request #1515:
URL: https://github.com/apache/iceberg/pull/1515#discussion_r495527775
##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java
##########
@@ -153,6 +158,11 @@ public Builder hadoopConf(Configuration newHadoopConf) {
return this;
}
+ public Builder flinkConf(org.apache.flink.configuration.Configuration
config) {
+ this.flinkConf = config != null ? config : new
org.apache.flink.configuration.Configuration();
+ return this;
+ }
Review comment:
cc @JingsongLi @openinx
##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java
##########
@@ -153,6 +158,11 @@ public Builder hadoopConf(Configuration newHadoopConf) {
return this;
}
+ public Builder flinkConf(org.apache.flink.configuration.Configuration
config) {
+ this.flinkConf = config != null ? config : new
org.apache.flink.configuration.Configuration();
+ return this;
+ }
Review comment:
By allowing user's to pass in `null` flink configuration objects by
silently instantiating a new one, are we making it harder for developers to
tell if their configuration they've set up elsewhere is being respected? For
example, if at the beginning of my program I used `ParamaterTool` somehow to
get my configuration - e.g. reading from a file via
`ParameterTool.fromPropertiesFile(file)`, or possibly from CLI args via
`ParameterTool.fromArgs(args)`, or one of the several ways to parse
configuration properties and parameters view `ParameterToool`, would this
instantiation of a new config pick up all of these changes?
If not, I think we should disallow null here - it really doesn't make sense
to me to call this method with null. In my opinion, calling a builder with null
for the configuration option is typically a bug. And it seems like a potential
cause of difficult to track bugs for people who are less aware of the
complexities of Flink's configuration object and the various ways it can be
derived (via `set` in the SQL cli, via `ParameterTool`, via `flink-conf.yaml`,
via table catalog properties, etc).
However, it's possible I'm mistaken and that by this point in generation of
the job graph that new Configuration will somehow have hooked into everything
that's needed to get the data however the user intended. And I will admit it's
possible that I am mixing up `Configuration` with several related, but
separate, entities. If so, please let me know and I do apologize.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]