rangareddy commented on issue #15035:
URL: https://github.com/apache/hudi/issues/15035#issuecomment-3658990459
I can reproduce this issue with the code below; it fails with the
following exception:
**Code:**
```scala
val tableName = "unicode_test"
val basePath = s"/tmp/$tableName"
val columns = Seq("id", "name")
val data = Seq((1, "İ"))
val df = spark.createDataFrame(data).toDF(columns: _*)
df.write.format("hudi").
option("hoodie.datasource.write.recordkey.field", "id").
option("hoodie.datasource.write.partitionpath.field", "name").
option("hoodie.table.name", tableName).
mode("overwrite").
save(basePath)
spark.read.format("hudi").load(basePath).show()
df.write.format("hudi").
option("hoodie.datasource.write.recordkey.field", "id").
option("hoodie.datasource.write.partitionpath.field", "name").
option("hoodie.table.name", tableName).
mode("append").
save(basePath)
```
**Exception:**
<details>
<summary>java.lang.UnsupportedOperationException: The format for file /tmp/unicode_test is not supported yet.</summary>
```
25/12/16 06:00:01 WARN WriteMarkersFactory: Timeline-server-based markers are not supported for HDFS: base path /tmp/unicode_test. Falling back to direct markers.
25/12/16 06:00:01 ERROR Executor: Exception in task 0.0 in stage 58.0 (TID 64)
java.lang.UnsupportedOperationException: The format for file /tmp/unicode_test is not supported yet.
	at org.apache.hudi.io.storage.HoodieIOFactory.getFileFormatUtils(HoodieIOFactory.java:116) ~[org.apache.hudi_hudi-spark3.5-bundle_2.12-1.1.0.jar:1.1.0]
	at org.apache.hudi.io.HoodieKeyLocationFetchHandle.fetchRecordKeysWithPositions(HoodieKeyLocationFetchHandle.java:54) ~[org.apache.hudi_hudi-spark3.5-bundle_2.12-1.1.0.jar:1.1.0]
	at org.apache.hudi.io.HoodieKeyLocationFetchHandle.locations(HoodieKeyLocationFetchHandle.java:62) ~[org.apache.hudi_hudi-spark3.5-bundle_2.12-1.1.0.jar:1.1.0]
	at org.apache.hudi.index.simple.HoodieSimpleIndex.lambda$fetchRecordLocations$1693a449$1(HoodieSimpleIndex.java:149) ~[org.apache.hudi_hudi-spark3.5-bundle_2.12-1.1.0.jar:1.1.0]
	at org.apache.hudi.data.HoodieJavaRDD.lambda$flatMap$a6598fcb$1(HoodieJavaRDD.java:165) ~[org.apache.hudi_hudi-spark3.5-bundle_2.12-1.1.0.jar:1.1.0]
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$flatMap$1(JavaRDDLike.scala:125) ~[spark-core_2.12-3.5.3.jar:3.5.3]
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) ~[scala-library-2.12.18.jar:?]
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) ~[scala-library-2.12.18.jar:?]
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) ~[scala-library-2.12.18.jar:?]
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140) ~[spark-core_2.12-3.5.3.jar:3.5.3]
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) ~[spark-core_2.12-3.5.3.jar:3.5.3]
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104) ~[spark-core_2.12-3.5.3.jar:3.5.3]
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54) ~[spark-core_2.12-3.5.3.jar:3.5.3]
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166) ~[spark-core_2.12-3.5.3.jar:3.5.3]
	at org.apache.spark.scheduler.Task.run(Task.scala:141) ~[spark-core_2.12-3.5.3.jar:3.5.3]
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620) ~[spark-core_2.12-3.5.3.jar:3.5.3]
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64) ~[spark-common-utils_2.12-3.5.3.jar:3.5.3]
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61) ~[spark-common-utils_2.12-3.5.3.jar:3.5.3]
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94) ~[spark-core_2.12-3.5.3.jar:3.5.3]
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623) [spark-core_2.12-3.5.3.jar:3.5.3]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_342]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_342]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_342]
25/12/16 06:00:01 WARN TaskSetManager: Lost task 0.0 in stage 58.0 (TID 64) (adhoc-1 executor driver): java.lang.UnsupportedOperationException: The format for file /tmp/unicode_test is not supported yet.
```
</details>
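One observation that may be relevant when debugging this (an assumption on my part, not a confirmed root cause): `İ` (U+0130) is one of the few characters whose JVM case mapping is not round-trippable, which commonly breaks code that lower/upper-cases partition paths or file names. A minimal sketch of the JVM behavior, in Java for brevity (the same holds from Scala, since both go through `String.toLowerCase`):

```java
import java.util.Locale;

public class DottedCapitalIDemo {
    public static void main(String[] args) {
        String partition = "İ"; // U+0130, LATIN CAPITAL LETTER I WITH DOT ABOVE

        // In the root locale, U+0130 lowercases to "i" followed by U+0307
        // (COMBINING DOT ABOVE), so the string grows from 1 to 2 chars.
        String lowered = partition.toLowerCase(Locale.ROOT);
        System.out.println(lowered.length());          // 2
        System.out.println("i\u0307".equals(lowered)); // true

        // Upper-casing does not restore the original single code point:
        // we get "I" + U+0307, which is not equal (without normalization) to U+0130.
        System.out.println(partition.equals(lowered.toUpperCase(Locale.ROOT))); // false
    }
}
```

If any code on the write or index path case-folds or re-derives the partition path string, the before/after values will no longer match, which could plausibly lead to a malformed file path like the extension-less `/tmp/unicode_test` that `HoodieIOFactory.getFileFormatUtils` rejects above. Whether that is actually what happens here would need confirmation in the Hudi code.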