Hello,
Please let me know if this is not the correct forum; I'm happy to post this
elsewhere.
I'm trying to use XTable to convert a Hudi source to a Delta target and am
receiving the exception below. The table is active and frequently updated, and
it is being actively queried as a Hudi table.
Is there any other debug information I can provide to make this report more useful?
My git HEAD is 4a96627a.
OS: Linux (Ubuntu)
Java: 11
I modified log4j2.xml to set level=trace for org.apache.hudi and org.apache.xtable.
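For reference, the change amounts to roughly the following two Logger entries
inside the <Loggers> section of log4j2.xml (the rest of the bundled config is
unchanged; exact layout of the file may differ):

  <Logger name="org.apache.hudi" level="trace"/>
  <Logger name="org.apache.xtable" level="trace"/>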
Command and resulting stack trace:
$ java -jar ./xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig config.yaml
WARNING: Runtime environment or build system does not support multi-release
JARs. This will impact location-based features.
2024-06-05 23:22:05 INFO org.apache.xtable.utilities.RunSync:148 - Running
sync for basePath s3://hidden-s3-bucket/hidden-prefix/ for following table
formats [DELTA]
2024-06-05 23:22:05 INFO
org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading
HoodieTableMetaClient from s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:05 WARN org.apache.hadoop.util.NativeCodeLoader:60 - Unable
to load native-hadoop library for your platform... using builtin-java classes
where applicable
2024-06-05 23:22:05 WARN org.apache.hadoop.metrics2.impl.MetricsConfig:136 -
Cannot locate configuration: tried
hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
2024-06-05 23:22:06 WARN org.apache.hadoop.fs.s3a.SDKV2Upgrade:39 - Directly
referencing AWS SDK V1 credential provider
com.amazonaws.auth.DefaultAWSCredentialsProviderChain. AWS SDK V1 credential
providers will be removed once S3A is upgraded to SDK V2
2024-06-05 23:22:07 INFO org.apache.hudi.common.table.HoodieTableConfig:276 -
Loading table properties from
s3://hidden-s3-bucket/hidden-prefix/.hoodie/hoodie.properties
2024-06-05 23:22:07 INFO
org.apache.hudi.common.table.HoodieTableMetaClient:152 - Finished Loading Table
of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from
s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:07 INFO
org.apache.hudi.common.table.HoodieTableMetaClient:155 - Loading Active commit
timeline for s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:07 INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline:171 - Loaded
instants upto :
Option{val=[20240605231910580__clean__COMPLETED__20240605231918000]}
2024-06-05 23:22:07 INFO
org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading
HoodieTableMetaClient from s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:07 INFO org.apache.hudi.common.table.HoodieTableConfig:276 -
Loading table properties from
s3://hidden-s3-bucket/hidden-prefix/.hoodie/hoodie.properties
2024-06-05 23:22:07 INFO
org.apache.hudi.common.table.HoodieTableMetaClient:152 - Finished Loading Table
of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from
s3://hidden-s3-bucket/hidden-prefix
2024-06-05 23:22:07 INFO
org.apache.hudi.common.table.HoodieTableMetaClient:133 - Loading
HoodieTableMetaClient from s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata
2024-06-05 23:22:07 INFO org.apache.hudi.common.table.HoodieTableConfig:276 -
Loading table properties from
s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata/.hoodie/hoodie.properties
2024-06-05 23:22:07 INFO
org.apache.hudi.common.table.HoodieTableMetaClient:152 - Finished Loading Table
of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata
2024-06-05 23:22:08 INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline:171 - Loaded
instants upto :
Option{val=[20240605231910580__deltacommit__COMPLETED__20240605231917000]}
2024-06-05 23:22:08 INFO
org.apache.hudi.common.table.view.AbstractTableFileSystemView:259 - Took 7 ms
to read 0 instants, 0 replaced file groups
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by
org.apache.hadoop.hbase.util.UnsafeAvailChecker
(file:/incubator-xtable/xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar)
to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of
org.apache.hadoop.hbase.util.UnsafeAvailChecker
WARNING: Use --illegal-access=warn to enable warnings of further illegal
reflective access operations
WARNING: All illegal access operations will be denied in a future release
2024-06-05 23:22:08 INFO org.apache.hudi.common.util.ClusteringUtils:147 -
Found 0 files in pending clustering operations
2024-06-05 23:22:08 INFO
org.apache.hudi.common.table.view.FileSystemViewManager:243 - Creating View
Manager with storage type :MEMORY
2024-06-05 23:22:08 INFO
org.apache.hudi.common.table.view.FileSystemViewManager:255 - Creating
in-memory based Table View
2024-06-05 23:22:11 INFO
org.apache.spark.sql.delta.storage.DelegatingLogStore:60 - LogStore
`LogStoreAdapter(io.delta.storage.S3SingleDriverLogStore)` is used for scheme
`s3`
2024-06-05 23:22:11 INFO org.apache.spark.sql.delta.DeltaLog:60 - Creating
initial snapshot without metadata, because the directory is empty
2024-06-05 23:22:13 INFO org.apache.spark.sql.delta.InitialSnapshot:60 -
[tableId=8eda3e8f-9dae-4d19-ac72-f625b8ccb0c5] Created snapshot
InitialSnapshot(path=s3://hidden-s3-bucket/hidden-prefix/_delta_log,
version=-1,
metadata=Metadata(167f7b26-f82d-4765-97b9-b6e47d9147ec,null,null,Format(parquet,Map()),null,List(),Map(),Some(1717629733296)),
logSegment=LogSegment(s3://hidden-s3-bucket/hidden-prefix/_delta_log,-1,List(),None,-1),
checksumOpt=None)
2024-06-05 23:22:13 INFO org.apache.xtable.conversion.ConversionController:240
- No previous InternalTable sync for target. Falling back to snapshot sync.
2024-06-05 23:22:13 INFO org.apache.hudi.common.table.TableSchemaResolver:317
- Reading schema from
s3://hidden-s3-bucket/hidden-prefix/op_date=2024-06-05/3b5d27af-ef39-4862-bbd9-d4a010f6056e-0_0-71-375_20240605231837826.parquet
2024-06-05 23:22:14 INFO org.apache.hudi.metadata.HoodieTableMetadataUtil:927
- Loading latest merged file slices for metadata table partition files
2024-06-05 23:22:14 INFO
org.apache.hudi.common.table.view.AbstractTableFileSystemView:259 - Took 1 ms
to read 0 instants, 0 replaced file groups
2024-06-05 23:22:14 INFO org.apache.hudi.common.util.ClusteringUtils:147 -
Found 0 files in pending clustering operations
2024-06-05 23:22:14 INFO
org.apache.hudi.common.table.view.AbstractTableFileSystemView:429 - Building
file system view for partition (files)
2024-06-05 23:22:14 DEBUG
org.apache.hudi.common.table.view.AbstractTableFileSystemView:435 - #files
found in partition (files) =30, Time taken =40
2024-06-05 23:22:14 DEBUG
org.apache.hudi.common.table.view.HoodieTableFileSystemView:386 - Adding
file-groups for partition :files, #FileGroups=1
2024-06-05 23:22:14 DEBUG
org.apache.hudi.common.table.view.AbstractTableFileSystemView:165 -
addFilesToView: NumFiles=30, NumFileGroups=1, FileGroupsCreationTime=15,
StoreTimeTaken=1
2024-06-05 23:22:14 DEBUG
org.apache.hudi.common.table.view.AbstractTableFileSystemView:449 - Time to
load partition (files) =57
2024-06-05 23:22:14 INFO
org.apache.hudi.metadata.HoodieBackedTableMetadata:451 - Opened metadata base
file from
s3://hidden-s3-bucket/hidden-prefix/.hoodie/metadata/files/files-0000-0_0-67-1304_20240605210834482001.hfile
at instant 20240605210834482001 in 9 ms
2024-06-05 23:22:14 INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline:171 - Loaded
instants upto :
Option{val=[20240605231910580__clean__COMPLETED__20240605231918000]}
2024-06-05 23:22:14 ERROR org.apache.xtable.utilities.RunSync:171 - Error
running sync for s3://hidden-s3-bucket/hidden-prefix/
org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve list of
partition from metadata
at
org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:127)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.xtable.hudi.HudiDataFileExtractor.getFilesCurrentState(HudiDataFileExtractor.java:116)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.xtable.hudi.HudiConversionSource.getCurrentSnapshot(HudiConversionSource.java:97)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.xtable.spi.extractor.ExtractFromSource.extractSnapshot(ExtractFromSource.java:38)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.xtable.conversion.ConversionController.syncSnapshot(ConversionController.java:183)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.xtable.conversion.ConversionController.sync(ConversionController.java:121)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at org.apache.xtable.utilities.RunSync.main(RunSync.java:169)
[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
Caused by: java.lang.IllegalStateException: Recursive update
at
java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1739)
~[?:?]
at org.apache.avro.util.MapUtil.computeIfAbsent(MapUtil.java:42)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at org.apache.avro.specific.SpecificData.getClass(SpecificData.java:257)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at org.apache.avro.specific.SpecificData.newRecord(SpecificData.java:508)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:237)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:355)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:186)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:248)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:161)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:263)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at org.apache.avro.file.DataFileStream.next(DataFileStream.java:248)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeAvroMetadata(TimelineMetadataUtils.java:209)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.hudi.common.table.timeline.TimelineMetadataUtils.deserializeHoodieRollbackMetadata(TimelineMetadataUtils.java:177)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.hudi.metadata.HoodieTableMetadataUtil.getRollbackedCommits(HoodieTableMetadataUtil.java:1355)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.hudi.metadata.HoodieTableMetadataUtil.lambda$getValidInstantTimestamps$37(HoodieTableMetadataUtil.java:1284)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
~[?:?]
at
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177) ~[?:?]
at
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
~[?:?]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
~[?:?]
at
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
~[?:?]
at
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
~[?:?]
at
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
~[?:?]
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
~[?:?]
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
~[?:?]
at
org.apache.hudi.metadata.HoodieTableMetadataUtil.getValidInstantTimestamps(HoodieTableMetadataUtil.java:1283)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:473)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.openReaders(HoodieBackedTableMetadata.java:429)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getOrCreateReaders$10(HoodieBackedTableMetadata.java:412)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
~[?:?]
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getOrCreateReaders(HoodieBackedTableMetadata.java:412)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupKeysFromFileSlice(HoodieBackedTableMetadata.java:291)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:255)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:145)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.hudi.metadata.BaseTableMetadata.fetchAllPartitionPaths(BaseTableMetadata.java:316)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
at
org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:125)
~[xtable-utilities-0.1.0-SNAPSHOT-bundled.jar:0.1.0-SNAPSHOT]
... 6 more
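For what it's worth, the root-cause "Recursive update" IllegalStateException is
what ConcurrentHashMap.computeIfAbsent throws (since JDK 9) when the mapping
function re-enters the same map; here that appears to happen inside Avro's
SpecificData class cache while Hudi deserializes rollback metadata. A minimal,
self-contained sketch of just that mechanism (a hypothetical demo, not the
Hudi/Avro code):

  import java.util.concurrent.ConcurrentHashMap;

  public class RecursiveUpdateDemo {
      public static void main(String[] args) {
          ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
          // The mapping function re-enters computeIfAbsent on the same map,
          // which the JDK detects and reports as "Recursive update".
          cache.computeIfAbsent("k", key ->
                  cache.computeIfAbsent("k", inner -> "value"));
      }
  }

Running that on Java 11 throws java.lang.IllegalStateException: Recursive
update from the same ConcurrentHashMap.computeIfAbsent frame seen in the trace
above.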
config.yaml:

sourceFormat: HUDI
targetFormats:
  - DELTA
datasets:
  -
    tableBasePath: s3://hidden-s3-bucket/hidden-prefix
    tableName: hidden_table
    partitionSpec: op_date:VALUE
hoodie.properties from the source table:
hoodie.table.timeline.timezone=LOCAL
hoodie.table.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
hoodie.table.precombine.field=ts_millis
hoodie.table.version=6
hoodie.database.name=
hoodie.datasource.write.hive_style_partitioning=true
hoodie.table.metadata.partitions.inflight=
hoodie.table.checksum=2622850774
hoodie.partition.metafile.use.base.format=false
hoodie.table.cdc.enabled=false
hoodie.archivelog.folder=archived
hoodie.table.name=hidden_table
hoodie.populate.meta.fields=true
hoodie.table.type=COPY_ON_WRITE
hoodie.datasource.write.partitionpath.urlencode=false
hoodie.table.base.file.format=PARQUET
hoodie.datasource.write.drop.partition.columns=false
hoodie.table.metadata.partitions=files
hoodie.timeline.layout.version=1
hoodie.table.recordkey.fields=record_id
hoodie.table.partition.fields=op_date
Thanks,
Lucas Fairchild-Madar