wangxianghu commented on a change in pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#discussion_r645924017
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/schema/SchemaProvider.java
##########
@@ -34,18 +32,9 @@
@PublicAPIClass(maturity = ApiMaturityLevel.STABLE)
public abstract class SchemaProvider implements Serializable {
- protected TypedProperties config;
+ protected Schema sourceSchema;
- protected JavaSparkContext jssc;
-
- public SchemaProvider(TypedProperties props) {
Review comment:
> just now read other comments. I understand the intent to make it
> agnostic to engines, but not gonna be easy to make it backwards compatible.
>
> One more thought: we might need to make the base abstract class generic
> with two types (a config class and engine context may be). But this def needs
> more thought.

@nsivabalan thanks for the review.
I considered adding two fields (a config and an engine context) to the base
abstract schema provider, but while implementing it I found that they would
have no effect in the abstract class and would not be used anywhere. That is
because different implementations need different confs and use them in
different ways.
Besides, different engines may supply different configuration types. For
example, the Flink engine provides `org.apache.flink.configuration.Configuration`,
so we would have to convert the Flink configuration to a common conf type
(maybe `TypedProperties`) just to satisfy the parent class, and that conversion
would be pointless because the common conf would never be used.
So I left the base abstract `SchemaProvider` the way you see it.
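
To illustrate the point, here is a rough sketch (not the actual PR code). `Schema` is a stand-in for the Avro schema class, `Properties` stands in for `TypedProperties`, a plain `HashMap` stands in for the Flink `Configuration`, and the `schema.name` key and both subclass names are hypothetical. Each provider keeps the conf type its engine hands it, and the base class needs no constructor arguments:

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Properties;

// Illustrative only: every type here is a stand-in, not a Hudi or Flink class.
class SchemaProviderSketch {

  /** Stand-in for an Avro schema. */
  static final class Schema implements Serializable {
    final String name;
    Schema(String name) { this.name = name; }
  }

  /** Base class holds no conf and no engine context. */
  abstract static class SchemaProvider implements Serializable {
    abstract Schema getSourceSchema();
  }

  /** Provider that keeps a Properties-based conf (stand-in for TypedProperties). */
  static final class PropsSchemaProvider extends SchemaProvider {
    private final Properties props;
    PropsSchemaProvider(Properties props) { this.props = props; }
    @Override
    Schema getSourceSchema() { return new Schema(props.getProperty("schema.name")); }
  }

  /** Provider that keeps a different conf type; no conversion to a common type. */
  static final class MapConfSchemaProvider extends SchemaProvider {
    private final HashMap<String, String> conf;
    MapConfSchemaProvider(HashMap<String, String> conf) { this.conf = conf; }
    @Override
    Schema getSourceSchema() { return new Schema(conf.get("schema.name")); }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("schema.name", "from-props");
    HashMap<String, String> conf = new HashMap<>();
    conf.put("schema.name", "from-map");

    SchemaProvider[] providers = {
      new PropsSchemaProvider(props), new MapConfSchemaProvider(conf)
    };
    for (SchemaProvider provider : providers) {
      System.out.println(provider.getSourceSchema().name);
    }
  }
}
```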
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/schema/SchemaProvider.java
##########
@@ -34,18 +32,9 @@
@PublicAPIClass(maturity = ApiMaturityLevel.STABLE)
public abstract class SchemaProvider implements Serializable {
- protected TypedProperties config;
+ protected Schema sourceSchema;
- protected JavaSparkContext jssc;
-
- public SchemaProvider(TypedProperties props) {
Review comment:
> Can we think about making this backwards compatible. If a user defined
> their own schemaProvider, super(...) calls may start to fail if we remove these
> 2 constructors. If this was a private interface, we could evolve w/o much
> consideration, but since this is a public api, we got to be careful.

@nsivabalan, yes, we should be careful with an abstract class.
In the current design, a user-defined schema provider does not need to
call `super(...)`, since the abstract class no longer holds any confs.
Maybe we could also make the base `SchemaProvider` an interface, because
its child classes have little in common.
Which one is better?
1. Make `SchemaProvider` an interface.
2. Leave it as it is now: `SchemaProvider` has no common conf or context.
3. Add a common conf and context to `SchemaProvider` to make the schema
providers more organized.
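
For comparison, a minimal sketch of options 1 and 3 side by side. All names are hypothetical, `Schema` is a stand-in for the Avro schema class, and the generic variant is only one possible reading of the "two types" idea from the earlier comment, not a worked-out design:

```java
import java.io.Serializable;
import java.util.Properties;

// Illustrative only: none of these types exist in Hudi under these names.
class SchemaProviderOptions {

  /** Stand-in for an Avro schema. */
  static final class Schema implements Serializable {
    final String name;
    Schema(String name) { this.name = name; }
  }

  /** Option 1: a plain interface, so implementations never call super(...). */
  interface SchemaProviderAsInterface extends Serializable {
    Schema getSourceSchema();
  }

  /** Option 3: abstract base that carries a conf (C) and an engine context (E). */
  abstract static class SchemaProviderWithConf<C extends Serializable, E>
      implements Serializable {
    protected final C config;
    // Engine contexts are often not serializable, hence transient in this sketch.
    protected transient E engineContext;

    protected SchemaProviderWithConf(C config, E engineContext) {
      this.config = config;
      this.engineContext = engineContext;
    }

    abstract Schema getSourceSchema();
  }

  /** Example subclass for option 3; a String stands in for an engine context. */
  static final class PropsProvider extends SchemaProviderWithConf<Properties, String> {
    PropsProvider(Properties config, String engineContext) {
      super(config, engineContext);
    }

    @Override
    Schema getSourceSchema() {
      return new Schema(config.getProperty("schema.name"));
    }
  }
}
```

With option 1, a user-defined provider only implements `getSourceSchema()`. With option 3, every subclass has to pick concrete types for `C` and `E` and pass them to `super(...)`, even when it never reads them.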