trushev commented on code in PR #7895:
URL: https://github.com/apache/hudi/pull/7895#discussion_r1122654358
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -589,11 +589,11 @@ object HoodieSparkSqlWriter {
     if (isCompatibleProjectionOf(tableSchema, newSchema)) {
       // Picking table schema as a writer schema we need to validate that we'd be able to
       // rewrite incoming batch's data (written in new schema) into it
-      (tableSchema, isSchemaCompatible(newSchema, tableSchema, true))
Review Comment:
I think we do. The check in `#deduceWriterSchema` is performed at the RDD-creation stage, while `#validateSchema` is part of the write stage. The first check lets us identify a schema incompatibility in advance, before the write stage starts.
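To make the fail-fast idea concrete, here is a minimal, self-contained sketch (plain Avro, not the Hudi code; the class and method names are illustrative) of validating reader/writer compatibility up front, so an incompatible incoming schema aborts before any write work is scheduled:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaCompatibilityType;
import org.apache.avro.SchemaCompatibility.SchemaPairCompatibility;

public final class FailFastSchemaCheck {

  /** Throws early if records written with newSchema could not be read back with tableSchema. */
  static void checkCompatibleOrFail(Schema tableSchema, Schema newSchema) {
    SchemaPairCompatibility result =
        SchemaCompatibility.checkReaderWriterCompatibility(tableSchema, newSchema);
    if (result.getType() != SchemaCompatibilityType.COMPATIBLE) {
      // Failing here, at planning time, avoids scheduling a write that is doomed anyway.
      throw new IllegalArgumentException(
          "Incoming schema is incompatible with the table schema: " + result.getDescription());
    }
  }

  public static void main(String[] args) {
    Schema table = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"r\",\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");
    Schema incoming = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"r\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}");
    checkCompatibleOrFail(table, incoming); // throws: string cannot be promoted to long
  }
}
```

In Hudi the analogous early check is the `isSchemaCompatible(newSchema, tableSchema, true)` call inside `#deduceWriterSchema`, as shown in the diff above.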
##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java:
##########
@@ -76,7 +87,18 @@ public static boolean isSchemaCompatible(Schema prevSchema, Schema newSchema, bo
* @return true if prev schema is a projection of new schema.
*/
   public static boolean canProject(Schema prevSchema, Schema newSchema) {
+    return canProject(prevSchema, newSchema, Collections.emptySet());
+  }
+
+  /**
+   * Checks that each field in the prevSchema can be populated in the newSchema, except for the specified columns.
+   *
+   * @param prevSchema prev schema.
+   * @param newSchema  new schema.
+   * @param exceptCols column names excluded from the check.
+   * @return true if prev schema is a projection of new schema.
+   */
+  public static boolean canProject(Schema prevSchema, Schema newSchema, Collection<String> exceptCols) {
Review Comment:
Done.
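For context, here is a hedged sketch of what this overload could do. The diff above shows only the signature; the body below is an assumption and checks field presence only, whereas the real projection check would also validate field-level type compatibility:

```java
import java.util.Collection;
import java.util.Collections;
import org.apache.avro.Schema;

public final class ProjectionCheckSketch {

  // Default overload, matching the delegation shown in the diff above.
  public static boolean canProject(Schema prevSchema, Schema newSchema) {
    return canProject(prevSchema, newSchema, Collections.emptySet());
  }

  // Assumed body: every field of prevSchema (a record schema) must exist in
  // newSchema, skipping the columns listed in exceptCols.
  public static boolean canProject(Schema prevSchema, Schema newSchema,
                                   Collection<String> exceptCols) {
    return prevSchema.getFields().stream()
        .filter(field -> !exceptCols.contains(field.name()))
        // Presence-only check; a full check would also compare field types.
        .allMatch(field -> newSchema.getField(field.name()) != null);
  }
}
```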