harsh1231 commented on code in PR #9834: URL: https://github.com/apache/hudi/pull/9834#discussion_r1350521264
##########
hudi-utilities/src/test/java/org/apache/hudi/utilities/deltastreamer/TestSourceFormatAdapter.java:
##########
@@ -136,16 +146,23 @@ public void testJsonSanitization(String unsanitizedDataFile, String sanitizedDat
   public static class TestRowDataSource extends Source<Dataset<Row>> {
     private final InputBatch<Dataset<Row>> batch;
+    private final SanitizationUtils rowSanitizer;

     public TestRowDataSource(TypedProperties props, JavaSparkContext sparkContext, SparkSession sparkSession, SchemaProvider schemaProvider, InputBatch<Dataset<Row>> batch) {
       super(props, sparkContext, sparkSession, schemaProvider, SourceType.ROW);
       this.batch = batch;
+      this.rowSanitizer = new SanitizationUtils();
     }

     @Override
     protected InputBatch<Dataset<Row>> fetchNewData(Option<String> lastCkptStr, long sourceLimit) {
-      return batch;
+      return batch.getBatch().map(dsr -> {
+        Dataset<Row> tranformed = SanitizationUtils.sanitizeColumnNamesForAvro(dsr, getInvalidCharMask(props));

Review Comment:
We are bypassing RowSource in the unit test. SourceFormatAdapter is not doing any sanitization.