xushiyan commented on a change in pull request #2903:
URL: https://github.com/apache/hudi/pull/2903#discussion_r779342696
##########
File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/HoodieSparkSqlWriterSuite.scala
##########
@@ -336,7 +337,42 @@ class HoodieSparkSqlWriterSuite extends FunSuite with Matchers {
     }
   }
-  test("test bulk insert dataset with datasource impl multiple rounds") {
+  test("test read of a table with one failed write") {
+    initSparkContext("test_read_table_with_one_failed_write")
+    val path = java.nio.file.Files.createTempDirectory("hoodie_test_path")
+    try {
+      val hoodieFooTableName = "hoodie_foo_tbl"
+      val fooTableModifier = Map("path" -> path.toAbsolutePath.toString,
+        HoodieWriteConfig.TABLE_NAME.key() -> hoodieFooTableName,
+        DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY.key() -> "_row_key",
+        DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY.key() -> "partition")
+
+      val fooTableParams = HoodieWriterUtils.parametersWithWriteDefaults(fooTableModifier)
+      val props = new Properties()
+      fooTableParams.foreach(entry => props.setProperty(entry._1, entry._2))
+      val metaClient = HoodieTableMetaClient.initTableAndGetMetaClient(spark.sparkContext.hadoopConfiguration, path.toAbsolutePath.toString, props)
+
+      val partitionAndFileId = new util.HashMap[String, String]()
+      partitionAndFileId.put(HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH, "file-1")
+
+      HoodieTestTable.of(metaClient).withPartitionMetaFiles(HoodieTestDataGenerator.DEFAULT_FIRST_PARTITION_PATH)
+        .addInflightCommit("001")
+        .withBaseFilesInPartitions(partitionAndFileId)
+
+      val snapshotDF1 = spark.read.format("org.apache.hudi")
+        .load(path.toAbsolutePath.toString + "/*/*/*/*")
+      snapshotDF1.count()
+      assertFalse(true)
+    } catch {
+      case e: InvalidTableException =>
+        assertTrue(e.getMessage.contains("Invalid Hoodie Table"))
Review comment:
Looks like we should fix `InvalidTableException` to say `Hudi` instead of `Hoodie`, since this is a user-facing message.
```suggestion
assertTrue(e.getMessage.contains("Invalid Hudi Table"))
```
##########
File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala
##########
@@ -106,14 +106,16 @@ class DefaultSource extends RelationProvider
     val metaClient = HoodieTableMetaClient.builder().setConf(fs.getConf).setBasePath(tablePath).build()
     val isBootstrappedTable = metaClient.getTableConfig.getBootstrapBasePath.isPresent
     val tableType = metaClient.getTableType
-
     // First check if the ConfigUtils.IS_QUERY_AS_RO_TABLE has set by HiveSyncTool,
     // or else use query type from QUERY_TYPE_OPT_KEY.
     val queryType = parameters.get(ConfigUtils.IS_QUERY_AS_RO_TABLE)
       .map(is => if (is.toBoolean) QUERY_TYPE_READ_OPTIMIZED_OPT_VAL else QUERY_TYPE_SNAPSHOT_OPT_VAL)
       .getOrElse(parameters.getOrElse(QUERY_TYPE_OPT_KEY.key, QUERY_TYPE_OPT_KEY.defaultValue()))
     log.info(s"Is bootstrapped table => $isBootstrappedTable, tableType is: $tableType, queryType is: $queryType")
+    if (metaClient.getCommitsTimeline.filterCompletedInstants.empty()) {
+      throw new InvalidTableException("No valid commits found in the given path " + metaClient.getBasePath)
+    }
Review comment:
@nsivabalan Since this is a validation check, can we move it to L107, just below the `metaClient` creation? Roughly as sketched below.
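Something like this (just a sketch of the reordering, reusing the names from the diff above; the rest of `createRelation` stays unchanged):
```scala
// Sketch: run the "has any completed commit" validation right after the
// metaClient is built, before tableType/queryType are resolved.
val metaClient = HoodieTableMetaClient.builder().setConf(fs.getConf).setBasePath(tablePath).build()
if (metaClient.getCommitsTimeline.filterCompletedInstants.empty()) {
  throw new InvalidTableException("No valid commits found in the given path " + metaClient.getBasePath)
}
val isBootstrappedTable = metaClient.getTableConfig.getBootstrapBasePath.isPresent
val tableType = metaClient.getTableType
```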
##########
File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/HoodieSparkSqlWriterSuite.scala
##########
@@ -336,7 +337,42 @@ class HoodieSparkSqlWriterSuite extends FunSuite with Matchers {
     }
   }
-  test("test bulk insert dataset with datasource impl multiple rounds") {
+  test("test read of a table with one failed write") {
Review comment:
I don't think we need a functional test for this; we should be cognizant about over-testing (it causes long test times and maintenance effort). This simple logic flow can be properly covered by a UT on `createRelation()` that asserts the exception, along the lines sketched below.
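For example, roughly (a sketch only; the exact `createRelation` arguments and the minimal table setup here are assumptions, not taken from this PR):
```scala
// Rough sketch of a narrower unit test: set up a table with only an inflight
// commit and assert that createRelation() throws InvalidTableException.
// (The createRelation call and the minimal setup are assumptions here.)
test("test createRelation fails when the table has no completed commits") {
  initSparkContext("test_create_relation_no_completed_commits")
  val path = java.nio.file.Files.createTempDirectory("hoodie_test_path")
  val props = new Properties()
  props.setProperty(HoodieWriteConfig.TABLE_NAME.key(), "hoodie_foo_tbl")
  val metaClient = HoodieTableMetaClient.initTableAndGetMetaClient(
    spark.sparkContext.hadoopConfiguration, path.toAbsolutePath.toString, props)
  HoodieTestTable.of(metaClient).addInflightCommit("001")

  val e = intercept[InvalidTableException] {
    new DefaultSource().createRelation(spark.sqlContext,
      Map("path" -> path.toAbsolutePath.toString))
  }
  assertTrue(e.getMessage.contains("Invalid Hoodie Table"))
}
```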