yihua commented on code in PR #14076:
URL: https://github.com/apache/hudi/pull/14076#discussion_r2467316562
##########
hudi-utilities/src/test/java/org/apache/hudi/utilities/sources/TestJsonDFSSource.java:
##########
@@ -61,6 +61,13 @@ public Source prepareDFSSource(TypedProperties props) {
return new JsonDFSSource(props, jsc, sparkSession, schemaProvider);
}
+ @Override
+ protected Option<TypedProperties> getSourceFormatAdapterProps() {
+ TypedProperties properties = new TypedProperties();
+ properties.setProperty(HoodieStreamerConfig.SANITIZE_SCHEMA_FIELD_NAMES.key(), "true");
Review Comment:
Should we have a separate PR to try enabling `SANITIZE_SCHEMA_FIELD_NAMES`
by default?
##########
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/stats/SparkValueMetadataUtils.java:
##########
@@ -144,7 +144,7 @@ public static Comparable convertSparkToJava(ValueMetadata valueMetadata, Object
* we need to return java.sql.Timestamp and java.sql.Date
*
*/
- public static Object convertJavaTypeToSparkType(Comparable<?> javaVal, boolean useJava8api) {
+ public static Object convertJavaTypeToSparkType(Object javaVal, boolean useJava8api) {
Review Comment:
nit: is this change needed, i.e., are some values not `Comparable` anymore?
##########
hudi-utilities/src/test/java/org/apache/hudi/utilities/testutils/sources/AbstractDFSSourceTestBase.java:
##########
@@ -136,6 +140,9 @@ public void testReadingFromSource() throws IOException {
.createDataFrame(JavaRDD.toRDD(fetch1.getBatch().get()),
schemaProvider.getSourceSchema().toString(), sparkSession);
assertEquals(100, fetch1Rows.count());
+ // city_to_state can't be in except because it is a map
+ assertEquals(0, fetch1AsRows.getBatch().get().drop("city_to_state")
+ .except(fetch1Rows.drop("city_to_state")).count());
Review Comment:
Is there a functional test that validates the column values in the target
Hudi table when JSON data is used as the source?