hudi-bot opened a new issue, #17374:
URL: https://github.com/apache/hudi/issues/17374
The OSS bundle is built from git hash ec652268e5d. Spark is OSS 3.5.
{code:java}
➜ ~ spark-sql \
  --jars ~/hudiBuilds/hudi-spark3.5-bundle_2.12/ec652268e5d/hudi-spark3.5-bundle_2.12-1.1.0-SNAPSHOT.jar \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
  --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar' \
  --conf 'spark.sql.catalogImplementation=in-memory' {code}
The retry handler logs warnings with scary stack traces:
{code:java}
25/03/03 09:47:25 WARN RetryHelper: Catch Exception for N/A, will retry after 1000 ms.
org.apache.hudi.exception.HoodieIOException: Failed to create file file:/tmp/lakes/customers/.hoodie/metadata/column_stats/.hoodie_partition_metadata
... {code}
Maybe we should avoid logging stack traces in such warning messages.
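One possible shape for that fix, sketched here as a hypothetical `QuietRetry` helper (not the actual Hudi `RetryHelper` code): log only the exception message at WARN and demote the full stack trace to debug-level verbosity, so transient retries do not read like failures.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch, not Hudi code: retry a task, logging a one-line WARN
// per failed attempt and keeping the full stack trace at debug verbosity.
public class QuietRetry {
  private static final Logger LOG = Logger.getLogger(QuietRetry.class.getName());

  public static <T> T retry(Callable<T> task, int maxAttempts, long delayMs) throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return task.call();
      } catch (Exception e) {
        last = e;
        // Message-only WARN; the stack trace is only visible at FINE/debug level.
        LOG.warning("Attempt " + attempt + " failed (" + e.getMessage()
            + "), will retry after " + delayMs + " ms");
        LOG.log(Level.FINE, "Full stack trace for retried exception", e);
        Thread.sleep(delayMs);
      }
    }
    throw last;
  }
}
```

With this shape, the transcript below would show a single WARN line per retry instead of a 40-frame trace, while `--verbose`/debug logging still exposes the full cause.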
Full repro:
{code:java}
spark-sql (default)>
> -- Create the customers table with record-level index and secondary index on email
> CREATE TABLE customers (
>   customer_id INT,
>   customer_name STRING,
>   customer_email STRING,
>   registration_date DATE,
>   last_updated_ts BIGINT,
>   partition_key STRING
> ) USING hudi
> PARTITIONED BY (partition_key)
> LOCATION 'file:///tmp/lakes/customers/'
> TBLPROPERTIES (
>   'type' = 'mor',
>   'primaryKey' = 'customer_id',
>   'preCombineField' = 'last_updated_ts',
>   'hoodie.index.type' = 'RECORD_INDEX', -- Enable record-level index
>   'hoodie.metadata.enable' = 'true'
> );
25/03/03 09:47:19 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
25/03/03 09:47:19 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
25/03/03 09:47:19 WARN TableSchemaResolver: Could not find any data file written for commit, so could not get schema for table file:/tmp/lakes/customers
Time taken: 0.594 seconds
spark-sql (default)>
> -- Create the orders table with record-level index and secondary index on customer_email
> CREATE TABLE orders (
>   order_id INT,
>   customer_email STRING,
>   product_id INT,
>   order_amount DOUBLE,
>   order_date DATE,
>   last_updated_ts BIGINT,
>   partition_key STRING
> ) USING hudi
> PARTITIONED BY (partition_key)
> LOCATION 'file:///tmp/lakes/orders/'
> TBLPROPERTIES (
>   'type' = 'mor',
>   'primaryKey' = 'order_id',
>   'preCombineField' = 'last_updated_ts',
>   'hoodie.index.type' = 'RECORD_INDEX', -- Enable record-level index
>   'hoodie.metadata.enable' = 'true'
> );
25/03/03 09:47:19 WARN TableSchemaResolver: Could not find any data file written for commit, so could not get schema for table file:/tmp/lakes/orders
Time taken: 0.054 seconds
spark-sql (default)>
>
> -- Insert sample data into customers table
> INSERT INTO customers
> SELECT 1 as customer_id, 'John Smith' as customer_name, '[email protected]' as customer_email,
>        CAST('2023-01-15' AS DATE) as registration_date, 1673740800 as last_updated_ts, 'p1' as partition_key
> UNION ALL
> SELECT 2, 'Jane Doe', '[email protected]', CAST('2023-02-10' AS DATE), 1676016000, 'p2' as partition_key
> UNION ALL
> SELECT 3, 'Bob Johnson', '[email protected]', CAST('2023-03-05' AS DATE), 1677974400, 'p3' as partition_key;
25/03/03 09:47:20 WARN TableSchemaResolver: Could not find any data file written for commit, so could not get schema for table file:/tmp/lakes/customers
25/03/03 09:47:20 WARN TableSchemaResolver: Could not find any data file written for commit, so could not get schema for table file:/tmp/lakes/customers
25/03/03 09:47:21 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
# WARNING: Unable to get Instrumentation. Dynamic Attach failed. You may add this JAR as -javaagent manually, or supply -Djdk.attach.allowAttachSelf
# WARNING: Unable to attach Serviceability Agent. Unable to attach even with module exceptions: [org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed.]
25/03/03 09:47:23 WARN SparkMetadataTableRecordIndex: Record index not initialized so falling back to GLOBAL_SIMPLE for tagging records
25/03/03 09:47:25 WARN RetryHelper: Catch Exception for N/A, will retry after 1000 ms.
org.apache.hudi.exception.HoodieIOException: Failed to create file file:/tmp/lakes/customers/.hoodie/metadata/column_stats/.hoodie_partition_metadata
	at org.apache.hudi.storage.HoodieStorage.createImmutableFileInPath(HoodieStorage.java:353)
	at org.apache.hudi.storage.HoodieStorage.createImmutableFileInPath(HoodieStorage.java:312)
	at org.apache.hudi.common.model.HoodiePartitionMetadata.lambda$trySave$19fcee3a$1(HoodiePartitionMetadata.java:117)
	at org.apache.hudi.common.util.RetryHelper.start(RetryHelper.java:94)
	at org.apache.hudi.common.util.RetryHelper.start(RetryHelper.java:122)
	at org.apache.hudi.common.model.HoodiePartitionMetadata.trySave(HoodiePartitionMetadata.java:123)
	at org.apache.hudi.io.HoodieAppendHandle.init(HoodieAppendHandle.java:244)
	at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:475)
	at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:83)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:321)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:261)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:910)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:910)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:381)
	at org.apache.spark.storage.BlockManager.$anonfun$getOrElseUpdate$1(BlockManager.scala:1372)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1614)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1524)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1588)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1389)
	at org.apache.spark.storage.BlockManager.getOrElseUpdateRDDBlock(BlockManager.scala:1343)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:379)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already exists: file:/tmp/lakes/customers/.hoodie/metadata/column_stats/.hoodie_partition_metadata
	at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:421)
	at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:459)
	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:433)
	at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:521)
	at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1175)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1064)
	at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.lambda$create$2(HoodieWrapperFileSystem.java:243)
	at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:118)
	at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.create(HoodieWrapperFileSystem.java:242)
	at org.apache.hudi.storage.hadoop.HoodieHadoopStorage.create(HoodieHadoopStorage.java:129)
	at org.apache.hudi.storage.HoodieStorage.createImmutableFileInPath(HoodieStorage.java:348)
	... 42 more
Time taken: 6.489 seconds
spark-sql (default)>
> -- Insert sample data into orders table
> INSERT INTO orders
> SELECT 101 as order_id, '[email protected]' as customer_email, 1001 as product_id,
>        99.99 as order_amount, CAST('2023-04-10' AS DATE) as order_date, 1681084800 as last_updated_ts, 'p1' as partition_key
> UNION ALL
> SELECT 102, '[email protected]', 1002, 149.99, CAST('2023-04-15' AS DATE), 1681516800, 'p2' as partition_key
> UNION ALL
> SELECT 103, '[email protected]', 1003, 29.99, CAST('2023-04-20' AS DATE), 1681948800, 'p3' as partition_key
> UNION ALL
> SELECT 104, '[email protected]', 1004, 199.99, CAST('2023-05-05' AS DATE), 1683244800, 'p1' as partition_key;
25/03/03 09:47:26 WARN TableSchemaResolver: Could not find any data file written for commit, so could not get schema for table file:/tmp/lakes/orders
25/03/03 09:47:26 WARN TableSchemaResolver: Could not find any data file written for commit, so could not get schema for table file:/tmp/lakes/orders
25/03/03 09:47:27 WARN SparkMetadataTableRecordIndex: Record index not initialized so falling back to GLOBAL_SIMPLE for tagging records
Time taken: 2.316 seconds
spark-sql (default)>
> -- Update a record to demonstrate Hudi's update capability
> INSERT INTO customers
> SELECT 1 as customer_id, 'John Smith' as customer_name, '[email protected]' as customer_email,
>        CAST('2023-01-15' AS DATE) as registration_date, 1683504000 as last_updated_ts, 'p1' as partition_key;
25/03/03 09:47:29 WARN SparkMetadataTableRecordIndex: Record index not initialized so falling back to GLOBAL_SIMPLE for tagging records
25/03/03 09:47:29 WARN HoodieLogBlock: There are records without valid positions. Skip writing record positions to the block header.
Time taken: 1.379 seconds
spark-sql (default)> CREATE INDEX record_index ON customers (customer_id);
25/03/03 09:47:53 WARN HoodieWriteConfig: Embedded timeline server is disabled, fallback to use direct marker type for spark
25/03/03 09:47:53 WARN ScheduleIndexActionExecutor: Following partitions already exist or inflight: [column_stats, partition_stats, files]. Going to schedule indexing of only these partitions: [record_index]
Time taken: 1.103 seconds
spark-sql (default)> CREATE INDEX record_index ON orders (order_id);
25/03/03 09:48:12 WARN HoodieWriteConfig: Embedded timeline server is disabled, fallback to use direct marker type for spark
25/03/03 09:48:12 WARN ScheduleIndexActionExecutor: Following partitions already exist or inflight: [column_stats, partition_stats, files]. Going to schedule indexing of only these partitions: [record_index]
Time taken: 1.032 seconds
spark-sql (default)> CREATE INDEX idx_email ON customers (customer_email);
25/03/03 09:48:25 WARN HoodieWriteConfig: Embedded timeline server is disabled, fallback to use direct marker type for spark
25/03/03 09:48:25 WARN ScheduleIndexActionExecutor: Following partitions already exist or inflight: [record_index, column_stats, partition_stats, files]. Going to schedule indexing of only these partitions: [secondary_index_idx_email, secondary_index_]
25/03/03 09:48:25 WARN RetryHelper: Catch Exception for N/A, will retry after 1000 ms.
org.apache.hudi.exception.HoodieIOException: Failed to create file file:/tmp/lakes/customers/.hoodie/metadata/secondary_index_idx_email/.hoodie_partition_metadata
	at org.apache.hudi.storage.HoodieStorage.createImmutableFileInPath(HoodieStorage.java:353)
	at org.apache.hudi.storage.HoodieStorage.createImmutableFileInPath(HoodieStorage.java:312)
	at org.apache.hudi.common.model.HoodiePartitionMetadata.lambda$trySave$19fcee3a$1(HoodiePartitionMetadata.java:117)
	at org.apache.hudi.common.util.RetryHelper.start(RetryHelper.java:94)
	at org.apache.hudi.common.util.RetryHelper.start(RetryHelper.java:122)
	at org.apache.hudi.common.model.HoodiePartitionMetadata.trySave(HoodiePartitionMetadata.java:123)
	at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:102)
	at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:75)
	at org.apache.hudi.io.CreateHandleFactory.create(CreateHandleFactory.java:45)
	at org.apache.hudi.execution.CopyOnWriteInsertHandler.consume(CopyOnWriteInsertHandler.java:102)
	at org.apache.hudi.execution.CopyOnWriteInsertHandler.consume(CopyOnWriteInsertHandler.java:45)
	at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:69)
	at org.apache.hudi.execution.SparkLazyInsertIterable.computeNext(SparkLazyInsertIterable.java:79)
	at org.apache.hudi.execution.SparkLazyInsertIterable.computeNext(SparkLazyInsertIterable.java:37)
	at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
	at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:46)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1614)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1524)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1588)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1389)
	at org.apache.spark.storage.BlockManager.getOrElseUpdateRDDBlock(BlockManager.scala:1343)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:379)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already exists: file:/tmp/lakes/customers/.hoodie/metadata/secondary_index_idx_email/.hoodie_partition_metadata
	at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:421)
	at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:459)
	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:433)
	at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:521)
	at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1175)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1064)
	at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.lambda$create$2(HoodieWrapperFileSystem.java:243)
	at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:118)
	at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.create(HoodieWrapperFileSystem.java:242)
	at org.apache.hudi.storage.hadoop.HoodieHadoopStorage.create(HoodieHadoopStorage.java:129)
	at org.apache.hudi.storage.HoodieStorage.createImmutableFileInPath(HoodieStorage.java:348)
	... 40 more
25/03/03 09:48:25 WARN RetryHelper: Catch Exception for N/A, will retry after 1000 ms.
org.apache.hudi.exception.HoodieIOException: Failed to create file file:/tmp/lakes/customers/.hoodie/metadata/secondary_index_idx_email/.hoodie_partition_metadata
	(same stack trace as above, repeated verbatim on retry)
Time taken: 2.09 seconds
spark-sql (default)> CREATE INDEX idx_email ON orders (customer_email);
25/03/03 09:48:41 WARN HoodieWriteConfig: Embedded timeline server is disabled, fallback to use direct marker type for spark
25/03/03 09:48:41 WARN ScheduleIndexActionExecutor: Following partitions already exist or inflight: [record_index, column_stats, partition_stats, files]. Going to schedule indexing of only these partitions: [secondary_index_idx_email, secondary_index_]
Time taken: 0.962 seconds
spark-sql (default)> {code}
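For context, the root cause in both traces is a FileAlreadyExistsException on `.hoodie_partition_metadata`, i.e. the retried operation has effectively already succeeded rather than failed. A hypothetical sketch of the idempotent-create pattern that would make such a retry (and its noisy warning) unnecessary, using plain `java.nio` rather than Hudi's actual `HoodieStorage` API:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not Hudi code: create a marker/metadata file only if it
// is absent, and treat "already exists" as success instead of retrying.
public class IdempotentCreate {
  /** Returns true if this call created the file, false if it already existed. */
  public static boolean createIfAbsent(Path file, byte[] content) throws IOException {
    try {
      Files.createDirectories(file.getParent());
      // CREATE_NEW fails atomically if the file already exists.
      Files.write(file, content, StandardOpenOption.CREATE_NEW);
      return true;
    } catch (FileAlreadyExistsException e) {
      // Another writer (or an earlier attempt) already created it; the
      // desired end state is reached either way, so no retry is needed.
      return false;
    }
  }
}
```

Whether Hudi should swallow the exception this way or merely quiet the log is a design call for the fix; either would remove the scary traces from a successful run.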
## JIRA info
- Link: https://issues.apache.org/jira/browse/HUDI-9095
- Type: Sub-task
- Parent: https://issues.apache.org/jira/browse/HUDI-9176
- Affects version(s):
- 1.1.0
- Fix version(s):
- 1.1.0
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]