BalaMahesh commented on issue #9758:
URL: https://github.com/apache/hudi/issues/9758#issuecomment-1734877004

   hey @codope , I had to add bunch of jars along with hbase-common to get the 
query running after enabling hudi.metadata_enabled=true. 
   
   What I have observed is that the query is taking longer time with this 
property set than the one without using metadata_enabled. 
   
   Also I have observed that hudi is creating bunch of log files in metadata 
folder even after I have set the configurations
   
   hoodie.metadata.enable=true
   hoodie.write.concurrency.mode=optimistic_concurrency_control
   
hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.InProcessLockProvider
   hoodie.metadata.compact.max.delta.commits=1
   
   and also it seems we don't have control over 
   
   hoodie.metadata.cleaner.commits.retained - to keep the number of log and 
base files on metadata folder. 
   
   With these many number of files, when we run the query, Trino is trying to 
run compaction on these log and base files in metadata folder and then reading 
the parquet files which is adding some delay and latency in query run times. 
   
   Is this behaviour expected ? We are assuming that queries will be faster 
with metadata enabled but what we are seeing is reverse. 
   
   
   query logs with hudi.metadata_enabled=false. 
   
   ```2023-09-26T11:02:12.915+0530      INFO    
Query-20230925_142458_00023_we5zs-956   
org.apache.hudi.common.table.HoodieTableMetaClient      Loading 
HoodieTableMetaClient from gs://bucket/hudi/db/table
   2023-09-26T11:02:13.047+0530 INFO    Query-20230925_142458_00023_we5zs-956   
org.apache.hudi.common.table.HoodieTableConfig  Loading table properties from 
gs://bucket/hudi/db/table/.hoodie/hoodie.properties
   2023-09-26T11:02:13.482+0530 INFO    Query-20230925_142458_00023_we5zs-956   
org.apache.hudi.common.table.HoodieTableMetaClient      Finished Loading Table 
of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from 
gs://bucket/hudi/db/table
   2023-09-26T11:02:13.788+0530 INFO    Query-20230925_142458_00023_we5zs-956   
org.apache.hudi.common.table.timeline.HoodieActiveTimeline      Loaded instants 
upto : Option{val=[20230925214447606__deltacommit__COMPLETED]}
   2023-09-26T11:02:13.788+0530 INFO    Query-20230925_142458_00023_we5zs-956   
org.apache.hudi.common.table.view.FileSystemViewManager Creating InMemory based 
view for basePath gs://bucket/hudi/db/table
   2023-09-26T11:02:13.789+0530 INFO    Query-20230925_142458_00023_we5zs-956   
org.apache.hudi.common.table.view.AbstractTableFileSystemView   Took 1 ms to 
read  0 instants, 0 replaced file groups
   2023-09-26T11:02:13.794+0530 INFO    Query-20230925_142458_00023_we5zs-956   
org.apache.hudi.common.util.ClusteringUtils     Found 0 files in pending 
clustering operations
   2023-09-26T11:02:13.796+0530 INFO    hudi-split-manager-22   
org.apache.hudi.common.table.view.AbstractTableFileSystemView   Building file 
system view for partition ()
   2023-09-26T11:02:13.938+0530 INFO    hudi-split-manager-22   
org.apache.hudi.common.table.view.AbstractTableFileSystemView   addFilesToView: 
NumFiles=12, NumFileGroups=1, FileGroupsCreationTime=2, StoreTimeTaken=0
   2023-09-26T11:02:15.720+0530 INFO    dispatcher-query-45     
io.trino.event.QueryMonitor     TIMELINE: Query 20230925_142458_00023_we5zs :: 
FINISHED :: elapsed 6814ms :: planning 1191ms :: waiting 2037ms :: scheduling 
3855ms :: running 1621ms :: finishing 147ms :: begin 
2023-09-26T11:02:08.906+05:30 :: end 2023-09-26T11:02:15.720+05:30```
   
   
   query logs with hudi.metadata_enabled=true.
   
   ```
   2023-09-26T11:04:12.748+0530 INFO    Query-20230925_142658_00025_we5zs-1005  
org.apache.hudi.common.table.HoodieTableMetaClient      Loading 
HoodieTableMetaClient from gs://bucket/hudi/db/table
   2023-09-26T11:04:12.927+0530 INFO    Query-20230925_142658_00025_we5zs-1005  
org.apache.hudi.common.table.HoodieTableConfig  Loading table properties from 
gs://bucket/hudi/db/table/.hoodie/hoodie.properties
   2023-09-26T11:04:13.379+0530 INFO    Query-20230925_142658_00025_we5zs-1005  
org.apache.hudi.common.table.HoodieTableMetaClient      Finished Loading Table 
of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from 
gs://bucket/hudi/db/table
   2023-09-26T11:04:13.626+0530 INFO    Query-20230925_142658_00025_we5zs-1005  
org.apache.hudi.common.table.timeline.HoodieActiveTimeline      Loaded instants 
upto : Option{val=[20230925214447606__deltacommit__COMPLETED]}
   2023-09-26T11:04:13.627+0530 INFO    Query-20230925_142658_00025_we5zs-1005  
org.apache.hudi.common.table.view.FileSystemViewManager Creating InMemory based 
view for basePath gs://bucket/hudi/db/table
   2023-09-26T11:04:13.627+0530 INFO    Query-20230925_142658_00025_we5zs-1005  
org.apache.hudi.common.table.view.AbstractTableFileSystemView   Took 0 ms to 
read  0 instants, 0 replaced file groups
   2023-09-26T11:04:13.628+0530 INFO    Query-20230925_142658_00025_we5zs-1005  
org.apache.hudi.common.util.ClusteringUtils     Found 0 files in pending 
clustering operations
   2023-09-26T11:04:13.629+0530 INFO    Query-20230925_142658_00025_we5zs-1005  
org.apache.hudi.common.table.HoodieTableMetaClient      Loading 
HoodieTableMetaClient from gs://bucket/hudi/db/table
   2023-09-26T11:04:13.814+0530 INFO    Query-20230925_142658_00025_we5zs-1005  
org.apache.hudi.common.table.HoodieTableConfig  Loading table properties from 
gs://bucket/hudi/db/table/.hoodie/hoodie.properties
   2023-09-26T11:04:14.273+0530 INFO    Query-20230925_142658_00025_we5zs-1005  
org.apache.hudi.common.table.HoodieTableMetaClient      Finished Loading Table 
of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from 
gs://bucket/hudi/db/table
   2023-09-26T11:04:14.273+0530 INFO    Query-20230925_142658_00025_we5zs-1005  
org.apache.hudi.common.table.HoodieTableMetaClient      Loading 
HoodieTableMetaClient from gs://bucket/hudi/db/table/.hoodie/metadata
   2023-09-26T11:04:14.499+0530 INFO    Query-20230925_142658_00025_we5zs-1005  
org.apache.hudi.common.table.HoodieTableConfig  Loading table properties from 
gs://bucket/hudi/db/table/.hoodie/metadata/.hoodie/hoodie.properties
   2023-09-26T11:04:14.969+0530 INFO    Query-20230925_142658_00025_we5zs-1005  
org.apache.hudi.common.table.HoodieTableMetaClient      Finished Loading Table 
of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from 
gs://bucket/hudi/db/table/.hoodie/metadata
   2023-09-26T11:04:14.974+0530 INFO    hudi-split-manager-25   
org.apache.hudi.common.table.view.AbstractTableFileSystemView   Building file 
system view for partition ()
   2023-09-26T11:04:14.976+0530 INFO    hudi-split-manager-25   
org.apache.hudi.metadata.HoodieTableMetadataUtil        Loading latest merged 
file slices for metadata table partition files
   2023-09-26T11:04:15.451+0530 INFO    hudi-split-manager-25   
org.apache.hudi.common.table.timeline.HoodieActiveTimeline      Loaded instants 
upto : Option{val=[20230925214447606__deltacommit__COMPLETED]}
   2023-09-26T11:04:15.451+0530 INFO    hudi-split-manager-25   
org.apache.hudi.common.table.view.AbstractTableFileSystemView   Took 0 ms to 
read  0 instants, 0 replaced file groups
   2023-09-26T11:04:15.599+0530 INFO    hudi-split-manager-25   
org.apache.hudi.common.util.ClusteringUtils     Found 0 files in pending 
clustering operations
   2023-09-26T11:04:15.600+0530 INFO    hudi-split-manager-25   
org.apache.hudi.common.table.view.AbstractTableFileSystemView   Building file 
system view for partition (files)
   2023-09-26T11:04:16.163+0530 INFO    hudi-split-manager-25   
org.apache.hudi.common.table.view.AbstractTableFileSystemView   addFilesToView: 
NumFiles=640, NumFileGroups=1, FileGroupsCreationTime=94, StoreTimeTaken=1
   2023-09-26T11:04:16.808+0530 INFO    hudi-split-manager-25   
org.apache.hudi.metadata.HoodieBackedTableMetadata      Opened metadata base 
file from 
gs://bucket/hudi/db/table/.hoodie/metadata/files/files-0000_0-14014-15896_20230925214227853001.hfile
 at instant 20230925214227853001 in 638 ms
   2023-09-26T11:04:17.016+0530 INFO    hudi-split-manager-25   
org.apache.hudi.common.table.timeline.HoodieActiveTimeline      Loaded instants 
upto : Option{val=[20230925214447606__deltacommit__COMPLETED]}
   2023-09-26T11:04:17.023+0530 INFO    hudi-split-manager-25   
org.apache.hudi.common.table.HoodieTableMetaClient      Loading 
HoodieTableMetaClient from gs://bucket/hudi/db/table/.hoodie/metadata
   2023-09-26T11:04:17.288+0530 INFO    hudi-split-manager-25   
org.apache.hudi.common.table.HoodieTableConfig  Loading table properties from 
gs://bucket/hudi/db/table/.hoodie/metadata/.hoodie/hoodie.properties
   2023-09-26T11:04:17.744+0530 INFO    hudi-split-manager-25   
org.apache.hudi.common.table.HoodieTableMetaClient      Finished Loading Table 
of type MERGE_ON_READ(version=1, baseFileFormat=HFILE) from 
gs://bucket/hudi/db/table/.hoodie/metadata
   2023-09-26T11:04:17.929+0530 INFO    hudi-split-manager-25   
org.apache.hudi.common.table.timeline.HoodieActiveTimeline      Loaded instants 
upto : Option{val=[20230925214447606__deltacommit__COMPLETED]}
   2023-09-26T11:04:18.399+0530 INFO    hudi-split-manager-25   
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader  Scanning log 
file 
HoodieLogFile{pathStr='gs://bucket/hudi/db/table/.hoodie/metadata/files/.files-0000_20230925214227853001.log.1_0-14021-15903',
 fileLen=-1}
   ``` 
   
   There are 640 files lying on the metadata folder.
   
   <img width="1693" alt="Screenshot 2023-09-26 at 11 26 47 AM" 
src="https://github.com/apache/hudi/assets/25053668/ee4cc04d-3892-40b2-97d1-c00f7612ef22";>
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to