zhangyue19921010 commented on a change in pull request #3413:
URL: https://github.com/apache/hudi/pull/3413#discussion_r713552896
##########
File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/functional/TestHoodieDeltaStreamer.java
##########
@@ -1398,6 +1399,34 @@ private void testParquetDFSSource(boolean useSchemaProvider, List<String> transf
     testNum++;
   }
+  private void testORCDFSSource(boolean useSchemaProvider, List<String> transformerClassNames) throws Exception {
+    // prepare ORCDFSSource
+    TypedProperties orcProps = new TypedProperties();
+
+    // Properties used for testing delta-streamer with orc source
+    orcProps.setProperty("include", "base.properties");
+    orcProps.setProperty("hoodie.embed.timeline.server", "false");
+    orcProps.setProperty("hoodie.datasource.write.recordkey.field", "_row_key");
+    orcProps.setProperty("hoodie.datasource.write.partitionpath.field", "not_there");
+    if (useSchemaProvider) {
+      orcProps.setProperty("hoodie.deltastreamer.schemaprovider.source.schema.file", dfsBasePath + "/" + "source.avsc");
+      if (transformerClassNames != null) {
+        orcProps.setProperty("hoodie.deltastreamer.schemaprovider.target.schema.file", dfsBasePath + "/" + "target.avsc");
+      }
+    }
+    orcProps.setProperty("hoodie.deltastreamer.source.dfs.root", ORC_SOURCE_ROOT);
+    UtilitiesTestBase.Helpers.savePropsToDFS(orcProps, dfs, dfsBasePath + "/" + PROPS_FILENAME_TEST_ORC);
+
+    String tableBasePath = dfsBasePath + "/test_orc_source_table" + testNum;
+    HoodieDeltaStreamer deltaStreamer = new HoodieDeltaStreamer(
+        TestHelpers.makeConfig(tableBasePath, WriteOperationType.INSERT, ORCDFSSource.class.getName(),
+            transformerClassNames, PROPS_FILENAME_TEST_ORC, false,
+            useSchemaProvider, 100000, false, null, null, "timestamp", null), jsc);
+    deltaStreamer.sync();
+    TestHelpers.assertRecordCount(ORC_NUM_RECORDS, tableBasePath + "/*/*.parquet", sqlContext);
Review comment:
Hi @nsivabalan, thanks for your review. I believe `.parquet` is correct here: this patch adds an ORCDFSSource that lets HoodieDeltaStreamer ingest ORC files into a Hudi table, while the table itself still uses Parquet as its base file format. So the assertion needs the `.parquet` glob when reading the Hudi table data back.
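To illustrate the point about the glob, here is a minimal standalone sketch (not part of the Hudi test harness; the class name and directory layout are hypothetical) showing that a `<tableBasePath>/*/*.parquet` pattern matches the Parquet base files under a partition directory, regardless of the source format the data was ingested from:

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.stream.Stream;

public class ParquetGlobSketch {
    public static void main(String[] args) throws IOException {
        // Hypothetical table layout: one partition directory under the
        // table base path, holding Parquet base files plus Hudi metadata.
        Path base = Files.createTempDirectory("test_orc_source_table");
        Path partition = Files.createDirectories(base.resolve("2021-09-01"));
        Files.createFile(partition.resolve("file1.parquet"));
        Files.createFile(partition.resolve("file2.parquet"));
        Files.createFile(partition.resolve(".hoodie_partition_metadata"));

        // Mirror the "<tableBasePath>/*/*.parquet" pattern the assertion uses.
        PathMatcher matcher = FileSystems.getDefault()
            .getPathMatcher("glob:" + base + "/*/*.parquet");
        long count;
        try (Stream<Path> paths = Files.walk(base)) {
            count = paths.filter(matcher::matches).count();
        }
        // Only the two .parquet base files match; metadata files do not.
        if (count != 2) {
            throw new AssertionError("expected 2 parquet files, got " + count);
        }
        System.out.println("parquet base files: " + count);
    }
}
```

The glob only selects base files with the `.parquet` extension, which is why the test's record-count check works even though the source data was ORC.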
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]