BalaMahesh commented on issue #9758:
URL: https://github.com/apache/hudi/issues/9758#issuecomment-1734877004
hey @codope , I had to add bunch of jars along with hbase-common to get the
query running after enabling hudi.metadata_enabled=true.
What I have observed is that the query is taking longer time with this
property set than the one without using metadata_enabled.
Also I have observed that hudi is creating bunch of log files in metadata
folder even after I have set the configurations
hoodie.metadata.enable=true
hoodie.write.concurrency.mode=optimistic_concurrency_control
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.InProcessLockProvider
hoodie.metadata.compact.max.delta.commits=1
and also it seems we don't have control over
hoodie.metadata.cleaner.commits.retained - to keep the number of log and
base files on metadata folder.
With these many number of files, when we run the query, Trino is trying to
run compaction on these log and base files in metadata folder and then reading
the parquet files which is adding some delay and latency in query run times.
Is this behaviour expected ? We are assuming that queries will be faster
with metadata enabled but what we are seeing is reverse.
query logs with hudi.metadata_enabled=false.
```2023-09-26T11:02:12.915+0530 INFO
Query-20230925_142458_00023_we5zs-956
org.apache.hudi.common.table.HoodieTableMetaClient Loading
HoodieTableMetaClient from gs://bucket/hudi/db/table
2023-09-26T11:02:13.047+0530 INFO Query-20230925_142458_00023_we5zs-956
org.apache.hudi.common.table.HoodieTableConfig Loading table properties from
gs://bucket/hudi/db/table/.hoodie/hoodie.properties
2023-09-26T11:02:13.482+0530 INFO Query-20230925_142458_00023_we5zs-956
org.apache.hudi.common.table.HoodieTableMetaClient Finished Loading Table
of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from
gs://bucket/hudi/db/table
2023-09-26T11:02:13.788+0530 INFO Query-20230925_142458_00023_we5zs-956
org.apache.hudi.common.table.timeline.HoodieActiveTimeline Loaded instants
upto : Option{val=[20230925214447606__deltacommit__COMPLETED]}
2023-09-26T11:02:13.788+0530 INFO Query-20230925_142458_00023_we5zs-956
org.apache.hudi.common.table.view.FileSystemViewManager Creating InMemory based
view for basePath gs://bucket/hudi/db/table
2023-09-26T11:02:13.789+0530 INFO Query-20230925_142458_00023_we5zs-956
org.apache.hudi.common.table.view.AbstractTableFileSystemView Took 1 ms to
read 0 instants, 0 replaced file groups
2023-09-26T11:02:13.794+0530 INFO Query-20230925_142458_00023_we5zs-956
org.apache.hudi.common.util.ClusteringUtils Found 0 files in pending
clustering operations
2023-09-26T11:02:13.796+0530 INFO hudi-split-manager-22
org.apache.hudi.common.table.view.AbstractTableFileSystemView Building file
system view for partition ()
2023-09-26T11:02:13.938+0530 INFO hudi-split-manager-22
org.apache.hudi.common.table.view.AbstractTableFileSystemView addFilesToView:
NumFiles=12, NumFileGroups=1, FileGroupsCreationTime=2, StoreTimeTaken=0
2023-09-26T11:02:15.720+0530 INFO dispatcher-query-45
io.trino.event.QueryMonitor TIMELINE: Query 20230925_142458_00023_we5zs ::
FINISHED :: elapsed 6814ms :: planning 1191ms :: waiting 2037ms :: scheduling
3855ms :: running 1621ms :: finishing 147ms :: begin
2023-09-26T11:02:08.906+05:30 :: end 2023-09-26T11:02:15.720+05:30```
query logs with hudi.metadata_enabled=true.
```
2023-09-26T11:04:12.748+0530 INFO Query-20230925_142658_00025_we5zs-1005
org.apache.hudi.common.table.HoodieTableMetaClient Loading
HoodieTableMetaClient from gs://bucket/hudi/db/table
2023-09-26T11:04:12.927+0530 INFO Query-20230925_142658_00025_we5zs-1005
org.apache.hudi.common.table.HoodieTableConfig Loading table properties from
gs://bucket/hudi/db/table/.hoodie/hoodie.properties
2023-09-26T11:04:13.379+0530 INFO Query-20230925_142658_00025_we5zs-1005
org.apache.hudi.common.table.HoodieTableMetaClient Finished Loading Table
of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from
gs://bucket/hudi/db/table
2023-09-26T11:04:13.626+0530 INFO Query-20230925_142658_00025_we5zs-1005
org.apache.hudi.common.table.timeline.HoodieActiveTimeline Loaded instants
upto : Option{val=[20230925214447606__deltacommit__COMPLETED]}
2023-09-26T11:04:13.627+0530 INFO Query-20230925_142658_00025_we5zs-1005
org.apache.hudi.common.table.view.FileSystemViewManager Creating InMemory based
view for basePath gs://bucket/hudi/db/table
2023-09-26T11:04:13.627+0530 INFO Query-20230925_142658_00025_we5zs-1005
org.apache.hudi.common.table.view.AbstractTableFileSystemView Took 0 ms to
read 0 instants, 0 replaced file groups
2023-09-26T11:04:13.628+0530 INFO Query-20230925_142658_00025_we5zs-1005
org.apache.hudi.common.util.ClusteringUtils Found 0 files in pending
clustering operations
2023-09-26T11:04:13.629+0530 INFO Query-20230925_142658_00025_we5zs-1005
org.apache.hudi.common.table.HoodieTableMetaClient Loading
HoodieTableMetaClient from gs://bucket/hudi/db/table
2023-09-26T11:04:13.814+0530 INFO Query-20230925_142658_00025_we5zs-1005
org.apache.hudi.common.table.HoodieTableConfig Loading table properties from
gs://bucket/hudi/db/table/.hoodie/hoodie.properties
2023-09-26T11:04:14.273+0530 INFO Query-20230925_142658_00025_we5zs-1005
org.apache.hudi.common.table.HoodieTableMetaClient Finished Loading Table
of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from
gs://bucket/hudi/db/table
2023-09-26T11:04:14.273+0530 INFO Query-20230925_142658_00025_we5zs-1005
org.apache.hudi.common.table.HoodieTableMetaClient Loading
HoodieTableMetaClient from gs://bucket/hudi/db/table/.hoodie/metadata
2023-09-26T11:04:14.499+0530 INFO Query-20230925_142658_00025_we5zs-1005
org.apache.hudi.common.table.HoodieTableConfig Loading table properties from
gs://bucket/hudi/db/table/.hoodie/metadata/.hoodie/hoodie.properties
2023-09-26T11:04:14.969+0530 INFO Query-20230925_142658_00025_we5zs-1005
org.apache.hudi.common.table.HoodieTableMetaClient Finished Loading Table
of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
gs://bucket/hudi/db/table/.hoodie/metadata
2023-09-26T11:04:14.974+0530 INFO hudi-split-manager-25
org.apache.hudi.common.table.view.AbstractTableFileSystemView Building file
system view for partition ()
2023-09-26T11:04:14.976+0530 INFO hudi-split-manager-25
org.apache.hudi.metadata.HoodieTableMetadataUtil Loading latest merged
file slices for metadata table partition files
2023-09-26T11:04:15.451+0530 INFO hudi-split-manager-25
org.apache.hudi.common.table.timeline.HoodieActiveTimeline Loaded instants
upto : Option{val=[20230925214447606__deltacommit__COMPLETED]}
2023-09-26T11:04:15.451+0530 INFO hudi-split-manager-25
org.apache.hudi.common.table.view.AbstractTableFileSystemView Took 0 ms to
read 0 instants, 0 replaced file groups
2023-09-26T11:04:15.599+0530 INFO hudi-split-manager-25
org.apache.hudi.common.util.ClusteringUtils Found 0 files in pending
clustering operations
2023-09-26T11:04:15.600+0530 INFO hudi-split-manager-25
org.apache.hudi.common.table.view.AbstractTableFileSystemView Building file
system view for partition (files)
2023-09-26T11:04:16.163+0530 INFO hudi-split-manager-25
org.apache.hudi.common.table.view.AbstractTableFileSystemView addFilesToView:
NumFiles=640, NumFileGroups=1, FileGroupsCreationTime=94, StoreTimeTaken=1
2023-09-26T11:04:16.808+0530 INFO hudi-split-manager-25
org.apache.hudi.metadata.HoodieBackedTableMetadata Opened metadata base
file from
gs://bucket/hudi/db/table/.hoodie/metadata/files/files-0000_0-14014-15896_20230925214227853001.hfile
at instant 20230925214227853001 in 638 ms
2023-09-26T11:04:17.016+0530 INFO hudi-split-manager-25
org.apache.hudi.common.table.timeline.HoodieActiveTimeline Loaded instants
upto : Option{val=[20230925214447606__deltacommit__COMPLETED]}
2023-09-26T11:04:17.023+0530 INFO hudi-split-manager-25
org.apache.hudi.common.table.HoodieTableMetaClient Loading
HoodieTableMetaClient from gs://bucket/hudi/db/table/.hoodie/metadata
2023-09-26T11:04:17.288+0530 INFO hudi-split-manager-25
org.apache.hudi.common.table.HoodieTableConfig Loading table properties from
gs://bucket/hudi/db/table/.hoodie/metadata/.hoodie/hoodie.properties
2023-09-26T11:04:17.744+0530 INFO hudi-split-manager-25
org.apache.hudi.common.table.HoodieTableMetaClient Finished Loading Table
of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from
gs://bucket/hudi/db/table/.hoodie/metadata
2023-09-26T11:04:17.929+0530 INFO hudi-split-manager-25
org.apache.hudi.common.table.timeline.HoodieActiveTimeline Loaded instants
upto : Option{val=[20230925214447606__deltacommit__COMPLETED]}
2023-09-26T11:04:18.399+0530 INFO hudi-split-manager-25
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader Scanning log
file
HoodieLogFile{pathStr='gs://bucket/hudi/db/table/.hoodie/metadata/files/.files-0000_20230925214227853001.log.1_0-14021-15903',
fileLen=-1}
```
There are 640 files lying on the metadata folder.
<img width="1693" alt="Screenshot 2023-09-26 at 11 26 47 AM"
src="https://github.com/apache/hudi/assets/25053668/ee4cc04d-3892-40b2-97d1-c00f7612ef22">
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]