Hey guys,
I'm trying to use the Hudi CLI to connect to tables stored on S3 using the
Glue metastore. Using a tip from Ashish M G
<https://apache-hudi.slack.com/archives/C4D716NPQ/p1599243415197500?thread_ts=1599242852.196900&cid=C4D716NPQ>
on Slack, I added the dependencies, rebuilt, and was able to use the
connect command to connect to the table, albeit with warnings:
hudi->connect --path s3a://bucketName/path.parquet
29597 [Spring Shell] INFO
org.apache.hudi.common.table.HoodieTableMetaClient - Loading
HoodieTableMetaClient from s3a://bucketName/path.parquet
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by
org.apache.hadoop.security.authentication.util.KerberosUtil
(file:/home/username/hudi-cli/target/lib/hadoop-auth-2.7.3.jar) to method
sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of
org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal
reflective access operations
WARNING: All illegal access operations will be denied in a future release
29785 [Spring Shell] WARN org.apache.hadoop.util.NativeCodeLoader -
Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable
31060 [Spring Shell] INFO org.apache.hudi.common.fs.FSUtils - Hadoop
Configuration: fs.defaultFS: [file:///], Config:[Configuration:
core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml],
FileSystem: [org.apache.hadoop.fs.s3a.S3AFileSystem@6b725a01]
31380 [Spring Shell] INFO org.apache.hudi.common.table.HoodieTableConfig -
Loading table properties from
s3a://bucketName/path.parquet/.hoodie/hoodie.properties
31455 [Spring Shell] INFO
org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading
Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from
s3a://bucketName/path.parquet
Metadata for table tablename loaded
However, many of the other commands don't seem to be working properly:
hudi:tablename->savepoints show
╔═══════════════╗
║ SavepointTime ║
╠═══════════════╣
║ (empty) ║
╚═══════════════╝
hudi:tablename->savepoint create
Commit null not found in Commits
org.apache.hudi.common.table.timeline.HoodieDefaultTimeline:
[20200724220817__commit__COMPLETED]
hudi:tablename->stats filesizes
╔════════════╤═══════╤═══════╤═══════╤═══════╤═══════╤═══════╤══════════╤════════╗
║ CommitTime │ Min   │ 10th  │ 50th  │ avg   │ 95th  │ Max   │ NumFiles │ StdDev ║
╠════════════╪═══════╪═══════╪═══════╪═══════╪═══════╪═══════╪══════════╪════════╣
║ ALL        │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0.0 B │ 0        │ 0.0 B  ║
╚════════════╧═══════╧═══════╧═══════╧═══════╧═══════╧═══════╧══════════╧════════╝
hudi:tablename->show fsview all
171314 [Spring Shell] INFO
org.apache.hudi.common.table.HoodieTableMetaClient - Loading
HoodieTableMetaClient from s3a://bucketName/path.parquet
171362 [Spring Shell] INFO org.apache.hudi.common.fs.FSUtils - Hadoop
Configuration: fs.defaultFS: [file:///], Config:[Configuration:
core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml],
FileSystem: [org.apache.hadoop.fs.s3a.S3AFileSystem@6b725a01]
171666 [Spring Shell] INFO org.apache.hudi.common.table.HoodieTableConfig -
Loading table properties from
s3a://bucketName/path.parquet/.hoodie/hoodie.properties
171725 [Spring Shell] INFO
org.apache.hudi.common.table.HoodieTableMetaClient - Finished Loading
Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from
s3a://bucketName/path.parquet
171725 [Spring Shell] INFO
org.apache.hudi.common.table.HoodieTableMetaClient - Loading Active commit
timeline for s3a://bucketName/path.parquet
171817 [Spring Shell] INFO
org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded
instants [[20200724220817__clean__COMPLETED],
[20200724220817__commit__COMPLETED]]
172262 [Spring Shell] INFO
org.apache.hudi.common.table.view.AbstractTableFileSystemView -
addFilesToView: NumFiles=0, NumFileGroups=0, FileGroupsCreationTime=5,
StoreTimeTaken=2
╔═══════════╤════════╤══════════════╤═══════════╤════════════════╤═════════════════╤═══════════════════════╤═════════════╗
║ Partition │ FileId │ Base-Instant │ Data-File │ Data-File Size │ Num Delta Files │ Total Delta File Size │ Delta Files ║
╠═══════════╧════════╧══════════════╧═══════════╧════════════════╧═════════════════╧═══════════════════════╧═════════════╣
║ (empty)                                                                                                                ║
╚════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝
I looked through the CLI code, and it seems that for full support we would
need to handle the different storage options (HDFS/S3/Azure/etc.) in
HoodieTableMetaClient. From my understanding,
TableNotFoundException.checkTableValidity, one of the first steps in this
flow, checks just the HDFS filesystem.
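To illustrate what I mean concretely: a storage-agnostic validity check would presumably need to dispatch on the path's URI scheme (s3a, hdfs, file, abfs, ...) rather than assuming HDFS. This is just a minimal sketch of scheme extraction with java.net.URI to show the idea — the names here are hypothetical and this is not actual Hudi code:

```java
import java.net.URI;

public class SchemeCheck {

    // Hypothetical helper: extract the filesystem scheme from a table base
    // path so a validity check could resolve the right FileSystem
    // implementation instead of assuming HDFS. A null scheme (a bare local
    // path) is treated as the local filesystem.
    static String schemeOf(String tablePath) {
        URI uri = URI.create(tablePath);
        return uri.getScheme() == null ? "file" : uri.getScheme();
    }

    public static void main(String[] args) {
        // An S3A table path like the one in the logs above resolves to "s3a",
        // not "hdfs", so any HDFS-only check would miss it.
        System.out.println(schemeOf("s3a://bucketName/path.parquet"));
        System.out.println(schemeOf("hdfs://namenode/tables/t1"));
        System.out.println(schemeOf("/tmp/local-table"));
    }
}
```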
Could someone please clarify whether this is already supported and I'm
just not configuring it correctly, or whether it would need to be added?
If the latter, is my reading of HoodieTableMetaClient on the right track?
Thanks,