[
https://issues.apache.org/jira/browse/HUDI-9095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Davis Zhang updated HUDI-9095:
------------------------------
Priority: Trivial (was: Major)
> scary warning with stacktrace when creating index / insert records
> ------------------------------------------------------------------
>
> Key: HUDI-9095
> URL: https://issues.apache.org/jira/browse/HUDI-9095
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 1.1.0
> Reporter: Davis Zhang
> Priority: Trivial
> Fix For: 1.1.0
>
>
> The OSS bundle is built from git hash ec652268e5d.
> Spark is built from OSS Spark 3.5.
> {code:java}
> ➜ ~ spark-sql \
>   --jars ~/hudiBuilds/hudi-spark3.5-bundle_2.12/ec652268e5d/hudi-spark3.5-bundle_2.12-1.1.0-SNAPSHOT.jar \
>   --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
>   --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
>   --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog' \
>   --conf 'spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar' \
>   --conf 'spark.sql.catalogImplementation=in-memory' {code}
> The retry handler logs warnings with scary stack traces:
> {code:java}
> 25/03/03 09:47:25 WARN RetryHelper: Catch Exception for N/A, will retry after 1000 ms.
> org.apache.hudi.exception.HoodieIOException: Failed to create file file:/tmp/lakes/customers/.hoodie/metadata/column_stats/.hoodie_partition_metadata
> ... {code}
> We should probably avoid logging full stack traces in such warning messages; for intermediate retry attempts the exception message alone is enough, and the stack trace can be reserved for DEBUG or for the final failure. A sketch of the idea follows.
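> For illustration only, a minimal sketch of that logging policy (a hypothetical helper, not the actual RetryHelper API; the names and signatures here are assumptions):
> {code:java}
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> import java.util.concurrent.Callable;
>
> // Hypothetical retry wrapper sketching the proposed behavior: intermediate
> // attempts log only the exception message at WARN (stack trace demoted to
> // DEBUG); the stack trace surfaces only if every retry fails.
> public class QuietRetry {
>   private static final Logger LOG = LoggerFactory.getLogger(QuietRetry.class);
>
>   public static <T> T run(Callable<T> task, int maxRetries, long waitMs) throws Exception {
>     Exception last = null;
>     for (int attempt = 1; attempt <= maxRetries + 1; attempt++) {
>       try {
>         return task.call();
>       } catch (Exception e) {
>         last = e;
>         if (attempt <= maxRetries) {
>           // Message-only WARN keeps the console readable on transient failures.
>           LOG.warn("Attempt {} failed ({}), will retry after {} ms", attempt, e.getMessage(), waitMs);
>           // The full stack trace remains available at DEBUG for troubleshooting.
>           LOG.debug("Stack trace for failed attempt {}", attempt, e);
>           Thread.sleep(waitMs);
>         }
>       }
>     }
>     throw last; // retries exhausted: let the caller log the real failure once
>   }
> }
> {code}
> In the repro below, the retried exception is a FileAlreadyExistsException on .hoodie_partition_metadata, so a message-only WARN would convey the same information without the noise.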
> Full repro:
> {code:java}
> spark-sql (default)>
> > -- Create the customers table with record-level index and secondary index on email
> > CREATE TABLE customers (
> >   customer_id INT,
> >   customer_name STRING,
> >   customer_email STRING,
> >   registration_date DATE,
> >   last_updated_ts BIGINT,
> >   partition_key STRING
> > ) USING hudi
> > PARTITIONED BY (partition_key)
> > LOCATION 'file:///tmp/lakes/customers/'
> > TBLPROPERTIES (
> >   'type' = 'mor',
> >   'primaryKey' = 'customer_id',
> >   'preCombineField' = 'last_updated_ts',
> >   'hoodie.index.type' = 'RECORD_INDEX', -- Enable record-level index
> >   'hoodie.metadata.enable' = 'true'
> > );
> 25/03/03 09:47:19 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
> 25/03/03 09:47:19 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
> 25/03/03 09:47:19 WARN TableSchemaResolver: Could not find any data file written for commit, so could not get schema for table file:/tmp/lakes/customers
> Time taken: 0.594 seconds
> spark-sql (default)>
> > -- Create the orders table with record-level index and secondary index on customer_email
> > CREATE TABLE orders (
> >   order_id INT,
> >   customer_email STRING,
> >   product_id INT,
> >   order_amount DOUBLE,
> >   order_date DATE,
> >   last_updated_ts BIGINT,
> >   partition_key STRING
> > ) USING hudi
> > PARTITIONED BY (partition_key)
> > LOCATION 'file:///tmp/lakes/orders/'
> > TBLPROPERTIES (
> >   'type' = 'mor',
> >   'primaryKey' = 'order_id',
> >   'preCombineField' = 'last_updated_ts',
> >   'hoodie.index.type' = 'RECORD_INDEX', -- Enable record-level index
> >   'hoodie.metadata.enable' = 'true'
> > );
> 25/03/03 09:47:19 WARN TableSchemaResolver: Could not find any data file written for commit, so could not get schema for table file:/tmp/lakes/orders
> Time taken: 0.054 seconds
> spark-sql (default)>
> >
> > -- Insert sample data into customers table
> > INSERT INTO customers
> > SELECT 1 as customer_id, 'John Smith' as customer_name, '[email protected]' as customer_email,
> >   CAST('2023-01-15' AS DATE) as registration_date, 1673740800 as last_updated_ts, 'p1' as partition_key
> > UNION ALL
> > SELECT 2, 'Jane Doe', '[email protected]', CAST('2023-02-10' AS DATE), 1676016000, 'p2' as partition_key
> > UNION ALL
> > SELECT 3, 'Bob Johnson', '[email protected]', CAST('2023-03-05' AS DATE), 1677974400, 'p3' as partition_key;
> 25/03/03 09:47:20 WARN TableSchemaResolver: Could not find any data file written for commit, so could not get schema for table file:/tmp/lakes/customers
> 25/03/03 09:47:20 WARN TableSchemaResolver: Could not find any data file written for commit, so could not get schema for table file:/tmp/lakes/customers
> 25/03/03 09:47:21 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
> # WARNING: Unable to get Instrumentation. Dynamic Attach failed. You may add this JAR as -javaagent manually, or supply -Djdk.attach.allowAttachSelf
> # WARNING: Unable to attach Serviceability Agent. Unable to attach even with module exceptions: [org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed., org.apache.hudi.org.openjdk.jol.vm.sa.SASupportException: Sense failed.]
> 25/03/03 09:47:23 WARN SparkMetadataTableRecordIndex: Record index not initialized so falling back to GLOBAL_SIMPLE for tagging records
> 25/03/03 09:47:25 WARN RetryHelper: Catch Exception for N/A, will retry after 1000 ms.
> org.apache.hudi.exception.HoodieIOException: Failed to create file file:/tmp/lakes/customers/.hoodie/metadata/column_stats/.hoodie_partition_metadata
> at org.apache.hudi.storage.HoodieStorage.createImmutableFileInPath(HoodieStorage.java:353)
> at org.apache.hudi.storage.HoodieStorage.createImmutableFileInPath(HoodieStorage.java:312)
> at org.apache.hudi.common.model.HoodiePartitionMetadata.lambda$trySave$19fcee3a$1(HoodiePartitionMetadata.java:117)
> at org.apache.hudi.common.util.RetryHelper.start(RetryHelper.java:94)
> at org.apache.hudi.common.util.RetryHelper.start(RetryHelper.java:122)
> at org.apache.hudi.common.model.HoodiePartitionMetadata.trySave(HoodiePartitionMetadata.java:123)
> at org.apache.hudi.io.HoodieAppendHandle.init(HoodieAppendHandle.java:244)
> at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:475)
> at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:83)
> at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:321)
> at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:261)
> at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
> at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
> at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:910)
> at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:910)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
> at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:381)
> at org.apache.spark.storage.BlockManager.$anonfun$getOrElseUpdate$1(BlockManager.scala:1372)
> at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1614)
> at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1524)
> at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1588)
> at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1389)
> at org.apache.spark.storage.BlockManager.getOrElseUpdateRDDBlock(BlockManager.scala:1343)
> at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:379)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
> at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
> at org.apache.spark.scheduler.Task.run(Task.scala:141)
> at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
> at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
> at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already exists: file:/tmp/lakes/customers/.hoodie/metadata/column_stats/.hoodie_partition_metadata
> at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:421)
> at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:459)
> at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:433)
> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:521)
> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1175)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1064)
> at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.lambda$create$2(HoodieWrapperFileSystem.java:243)
> at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:118)
> at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.create(HoodieWrapperFileSystem.java:242)
> at org.apache.hudi.storage.hadoop.HoodieHadoopStorage.create(HoodieHadoopStorage.java:129)
> at org.apache.hudi.storage.HoodieStorage.createImmutableFileInPath(HoodieStorage.java:348)
> ... 42 more
> Time taken: 6.489 seconds
> spark-sql (default)>
> > -- Insert sample data into orders table
> > INSERT INTO orders
> > SELECT 101 as order_id, '[email protected]' as customer_email, 1001 as product_id,
> >   99.99 as order_amount, CAST('2023-04-10' AS DATE) as order_date, 1681084800 as last_updated_ts, 'p1' as partition_key
> > UNION ALL
> > SELECT 102, '[email protected]', 1002, 149.99, CAST('2023-04-15' AS DATE), 1681516800, 'p2' as partition_key
> > UNION ALL
> > SELECT 103, '[email protected]', 1003, 29.99, CAST('2023-04-20' AS DATE), 1681948800, 'p3' as partition_key
> > UNION ALL
> > SELECT 104, '[email protected]', 1004, 199.99, CAST('2023-05-05' AS DATE), 1683244800, 'p1' as partition_key;
> 25/03/03 09:47:26 WARN TableSchemaResolver: Could not find any data file written for commit, so could not get schema for table file:/tmp/lakes/orders
> 25/03/03 09:47:26 WARN TableSchemaResolver: Could not find any data file written for commit, so could not get schema for table file:/tmp/lakes/orders
> 25/03/03 09:47:27 WARN SparkMetadataTableRecordIndex: Record index not initialized so falling back to GLOBAL_SIMPLE for tagging records
> Time taken: 2.316 seconds
> spark-sql (default)>
> > -- Update a record to demonstrate Hudi's update capability
> > INSERT INTO customers
> > SELECT 1 as customer_id, 'John Smith' as customer_name, '[email protected]' as customer_email,
> >   CAST('2023-01-15' AS DATE) as registration_date, 1683504000 as last_updated_ts, 'p1' as partition_key;
> 25/03/03 09:47:29 WARN SparkMetadataTableRecordIndex: Record index not initialized so falling back to GLOBAL_SIMPLE for tagging records
> 25/03/03 09:47:29 WARN HoodieLogBlock: There are records without valid positions. Skip writing record positions to the block header.
> Time taken: 1.379 seconds
> spark-sql (default)> CREATE INDEX record_index ON customers (customer_id);
> 25/03/03 09:47:53 WARN HoodieWriteConfig: Embedded timeline server is disabled, fallback to use direct marker type for spark
> 25/03/03 09:47:53 WARN ScheduleIndexActionExecutor: Following partitions already exist or inflight: [column_stats, partition_stats, files]. Going to schedule indexing of only these partitions: [record_index]
> Time taken: 1.103 seconds
> spark-sql (default)> CREATE INDEX record_index ON orders (order_id);
> 25/03/03 09:48:12 WARN HoodieWriteConfig: Embedded timeline server is disabled, fallback to use direct marker type for spark
> 25/03/03 09:48:12 WARN ScheduleIndexActionExecutor: Following partitions already exist or inflight: [column_stats, partition_stats, files]. Going to schedule indexing of only these partitions: [record_index]
> Time taken: 1.032 seconds
> spark-sql (default)> CREATE INDEX idx_email ON customers (customer_email);
> 25/03/03 09:48:25 WARN HoodieWriteConfig: Embedded timeline server is disabled, fallback to use direct marker type for spark
> 25/03/03 09:48:25 WARN ScheduleIndexActionExecutor: Following partitions already exist or inflight: [record_index, column_stats, partition_stats, files]. Going to schedule indexing of only these partitions: [secondary_index_idx_email, secondary_index_]
> 25/03/03 09:48:25 WARN RetryHelper: Catch Exception for N/A, will retry after 1000 ms.
> org.apache.hudi.exception.HoodieIOException: Failed to create file file:/tmp/lakes/customers/.hoodie/metadata/secondary_index_idx_email/.hoodie_partition_metadata
> at org.apache.hudi.storage.HoodieStorage.createImmutableFileInPath(HoodieStorage.java:353)
> at org.apache.hudi.storage.HoodieStorage.createImmutableFileInPath(HoodieStorage.java:312)
> at org.apache.hudi.common.model.HoodiePartitionMetadata.lambda$trySave$19fcee3a$1(HoodiePartitionMetadata.java:117)
> at org.apache.hudi.common.util.RetryHelper.start(RetryHelper.java:94)
> at org.apache.hudi.common.util.RetryHelper.start(RetryHelper.java:122)
> at org.apache.hudi.common.model.HoodiePartitionMetadata.trySave(HoodiePartitionMetadata.java:123)
> at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:102)
> at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:75)
> at org.apache.hudi.io.CreateHandleFactory.create(CreateHandleFactory.java:45)
> at org.apache.hudi.execution.CopyOnWriteInsertHandler.consume(CopyOnWriteInsertHandler.java:102)
> at org.apache.hudi.execution.CopyOnWriteInsertHandler.consume(CopyOnWriteInsertHandler.java:45)
> at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:69)
> at org.apache.hudi.execution.SparkLazyInsertIterable.computeNext(SparkLazyInsertIterable.java:79)
> at org.apache.hudi.execution.SparkLazyInsertIterable.computeNext(SparkLazyInsertIterable.java:37)
> at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
> at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:46)
> at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
> at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
> at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
> at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1614)
> at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1524)
> at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1588)
> at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1389)
> at org.apache.spark.storage.BlockManager.getOrElseUpdateRDDBlock(BlockManager.scala:1343)
> at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:379)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
> at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
> at org.apache.spark.scheduler.Task.run(Task.scala:141)
> at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
> at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
> at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already exists: file:/tmp/lakes/customers/.hoodie/metadata/secondary_index_idx_email/.hoodie_partition_metadata
> at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:421)
> at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:459)
> at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:433)
> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:521)
> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1175)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1064)
> at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.lambda$create$2(HoodieWrapperFileSystem.java:243)
> at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:118)
> at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.create(HoodieWrapperFileSystem.java:242)
> at org.apache.hudi.storage.hadoop.HoodieHadoopStorage.create(HoodieHadoopStorage.java:129)
> at org.apache.hudi.storage.HoodieStorage.createImmutableFileInPath(HoodieStorage.java:348)
> ... 40 more
> 25/03/03 09:48:25 WARN RetryHelper: Catch Exception for N/A, will retry after 1000 ms.
> org.apache.hudi.exception.HoodieIOException: Failed to create file file:/tmp/lakes/customers/.hoodie/metadata/secondary_index_idx_email/.hoodie_partition_metadata
> at org.apache.hudi.storage.HoodieStorage.createImmutableFileInPath(HoodieStorage.java:353)
> at org.apache.hudi.storage.HoodieStorage.createImmutableFileInPath(HoodieStorage.java:312)
> at org.apache.hudi.common.model.HoodiePartitionMetadata.lambda$trySave$19fcee3a$1(HoodiePartitionMetadata.java:117)
> at org.apache.hudi.common.util.RetryHelper.start(RetryHelper.java:94)
> at org.apache.hudi.common.util.RetryHelper.start(RetryHelper.java:122)
> at org.apache.hudi.common.model.HoodiePartitionMetadata.trySave(HoodiePartitionMetadata.java:123)
> at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:102)
> at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:75)
> at org.apache.hudi.io.CreateHandleFactory.create(CreateHandleFactory.java:45)
> at org.apache.hudi.execution.CopyOnWriteInsertHandler.consume(CopyOnWriteInsertHandler.java:102)
> at org.apache.hudi.execution.CopyOnWriteInsertHandler.consume(CopyOnWriteInsertHandler.java:45)
> at org.apache.hudi.common.util.queue.SimpleExecutor.execute(SimpleExecutor.java:69)
> at org.apache.hudi.execution.SparkLazyInsertIterable.computeNext(SparkLazyInsertIterable.java:79)
> at org.apache.hudi.execution.SparkLazyInsertIterable.computeNext(SparkLazyInsertIterable.java:37)
> at org.apache.hudi.client.utils.LazyIterableIterator.next(LazyIterableIterator.java:121)
> at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:46)
> at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
> at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:223)
> at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:352)
> at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1614)
> at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1524)
> at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1588)
> at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1389)
> at org.apache.spark.storage.BlockManager.getOrElseUpdateRDDBlock(BlockManager.scala:1343)
> at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:379)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
> at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
> at org.apache.spark.scheduler.Task.run(Task.scala:141)
> at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
> at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
> at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already exists: file:/tmp/lakes/customers/.hoodie/metadata/secondary_index_idx_email/.hoodie_partition_metadata
> at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:421)
> at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:459)
> at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:433)
> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:521)
> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1175)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1064)
> at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.lambda$create$2(HoodieWrapperFileSystem.java:243)
> at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.executeFuncWithTimeMetrics(HoodieWrapperFileSystem.java:118)
> at org.apache.hudi.hadoop.fs.HoodieWrapperFileSystem.create(HoodieWrapperFileSystem.java:242)
> at org.apache.hudi.storage.hadoop.HoodieHadoopStorage.create(HoodieHadoopStorage.java:129)
> at org.apache.hudi.storage.HoodieStorage.createImmutableFileInPath(HoodieStorage.java:348)
> ... 40 more
> Time taken: 2.09 seconds
> spark-sql (default)> CREATE INDEX idx_email ON orders (customer_email);
> 25/03/03 09:48:41 WARN HoodieWriteConfig: Embedded timeline server is disabled, fallback to use direct marker type for spark
> 25/03/03 09:48:41 WARN ScheduleIndexActionExecutor: Following partitions already exist or inflight: [record_index, column_stats, partition_stats, files]. Going to schedule indexing of only these partitions: [secondary_index_idx_email, secondary_index_]
> Time taken: 0.962 seconds
> spark-sql (default)> {code}