xwmr-max commented on issue #1812:
URL:
https://github.com/apache/incubator-paimon/issues/1812#issuecomment-1697523075
I wrote a demo of an LSM-based lookup function and tested it against a
dimension table of about 13 million rows, with parallelism 1, no partitions,
and a single bucket. The logs for one lookup are as follows:
```
2023-08-29 21:44:14,837 INFO
org.apache.flink.paimon.lookup.FileLSMStoreLookupFunciton [] - Start look up !!!
2023-08-29 21:44:14,851 INFO
org.apache.shaded.org.apache.orc.impl.ReaderImpl [] - Reading ORC rows from
hdfs:/xxx/paimon2.db/lookup_dim4/data/bucket-0/data-95b6bdf0-97e1-4140-ba1f-53405b483f73-35.orc
with {include: [false, true, true, true, true, true], offset: 3, length:
164592368, sarg: leaf-(EQUALS id 13550405), schema:
struct<_KEY_id:string,_SEQUENCE_NUMBER:bigint,_VALUE_KIND:tinyint,id:string,name:string>,
includeAcidColumns: true, allowSARGToFilter: false, useSelected: false}
2023-08-29 21:44:14,931 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Number of keys: 10000
2023-08-29 21:44:14,931 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Number of values:
10000
2023-08-29 21:44:14,932 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Built index file
index7.dat\n Max offset length: 1 bytes\n Slot size: 8 bytes
2023-08-29 21:44:14,932 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Temporary index file
temp_index7.dat has been deleted
2023-08-29 21:44:14,932 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Built index file
index8.dat\n Max offset length: 2 bytes\n Slot size: 10 bytes
2023-08-29 21:44:14,932 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Temporary index file
temp_index8.dat has been deleted
2023-08-29 21:44:14,933 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Built index file
index9.dat\n Max offset length: 2 bytes\n Slot size: 11 bytes
2023-08-29 21:44:14,933 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Temporary index file
temp_index9.dat has been deleted
2023-08-29 21:44:14,933 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Built index file
index10.dat\n Max offset length: 3 bytes\n Slot size: 13 bytes
2023-08-29 21:44:14,933 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Temporary index file
temp_index10.dat has been deleted
2023-08-29 21:44:14,936 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Built index file
index11.dat\n Max offset length: 3 bytes\n Slot size: 14 bytes
2023-08-29 21:44:14,936 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Temporary index file
temp_index11.dat has been deleted
2023-08-29 21:44:14,936 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Number of collisions:
3743
2023-08-29 21:44:14,936 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Total expected store
size is 0.0 Mb
2023-08-29 21:44:14,937 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Usable free space on
the system is 553,721.0 Mb
2023-08-29 21:44:14,937 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Merging metadata.dat
size=172
2023-08-29 21:44:14,937 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Merging index7.dat
size=24
2023-08-29 21:44:14,937 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Merging index8.dat
size=170
2023-08-29 21:44:14,937 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Merging index9.dat
size=1661
2023-08-29 21:44:14,937 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Merging index10.dat
size=17472
2023-08-29 21:44:14,937 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Merging index11.dat
size=165466
2023-08-29 21:44:14,937 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Merging data7.dat
size=55
2023-08-29 21:44:14,937 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Merging data8.dat
size=365
2023-08-29 21:44:14,937 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Merging data9.dat
size=3278
2023-08-29 21:44:14,937 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Merging data10.dat
size=30241
2023-08-29 21:44:14,938 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Merging data11.dat
size=274785
2023-08-29 21:44:14,938 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Time to merge
0.001346492 s
2023-08-29 21:44:14,938 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Deleted temporary
file metadata.dat
2023-08-29 21:44:14,938 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Deleted temporary
file index7.dat
2023-08-29 21:44:14,938 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Deleted temporary
file index8.dat
2023-08-29 21:44:14,938 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Deleted temporary
file index9.dat
2023-08-29 21:44:14,938 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Deleted temporary
file index10.dat
2023-08-29 21:44:14,938 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Deleted temporary
file index11.dat
2023-08-29 21:44:14,938 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Deleted temporary
file data7.dat
2023-08-29 21:44:14,939 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Deleted temporary
file data8.dat
2023-08-29 21:44:14,939 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Deleted temporary
file data9.dat
2023-08-29 21:44:14,939 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Deleted temporary
file data10.dat
2023-08-29 21:44:14,939 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Deleted temporary
file data11.dat
2023-08-29 21:44:14,939 INFO
org.apache.paimon.lookup.hash.HashLookupStoreWriter [] - Deleted temporary
folder at
/srv/BigData/data1/nm/localdir/usercache/flink_test_R/appcache/application_[********]26_0243/paimon-io-937040c0-d332-4bd2-8e48-6abd2597afbf/11646b04-d76c-4d41-92c3-4df8c05481cf
2023-08-29 21:44:14,939 INFO
org.apache.paimon.lookup.hash.HashLookupStoreReader [] - Opening file
23161d9030be7f41fca5d53c80984aea.channel
2023-08-29 21:44:14,939 INFO
org.apache.paimon.lookup.hash.HashLookupStoreReader [] - Storage metadata\n
Created at: 2023.08.29 AD at 21:44:14 CST\n Key count: 10000\n Key count for
key length 7: 2\n Key count for key length 8: 13\n Key count for key length
9: 113\n Key count for key length 10: 1008\n Key count for key length 11:
8864\n Index size: 0.18 Mb\n Data size: 0.29 Mb\n
2023-08-29 21:44:14,940 INFO
org.apache.flink.paimon.lookup.FileLSMStoreLookupFunciton [] - look up use
time:103 ms
```
This lookup takes 103 ms. Some lookups are faster, with the fastest taking
around 70 ms.
The join rate is about 20/s. For comparison, Flink with RocksDB reaches about
400/s on the same data volume without managed memory. Most of the time is
spent reading the ORC file, which takes about 80 ms (21:44:14,851 to
21:44:14,931 in the log above). Are there other ways to optimize this besides
increasing the cache?
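
As a rough sanity check of these figures, here is a minimal Python sketch of the arithmetic. It uses only the numbers quoted in this comment; the helper name `budget_ms` is hypothetical, and it assumes records are processed strictly one after another, which may not hold exactly for the real job.

```python
# Figures quoted in the comment above; nothing here is measured independently.
LOOKUP_MS = 103.0      # "look up use time:103 ms"
ORC_READ_MS = 80.0     # gap between the ORC reader log line and the next line
PAIMON_RATE = 20.0     # reported join rate, records per second
ROCKSDB_RATE = 400.0   # reported Flink RocksDB rate, records per second


def budget_ms(rate_per_s: float) -> float:
    """Average time available per record at a given rate,
    assuming records are handled strictly sequentially (an assumption)."""
    return 1000.0 / rate_per_s


# Share of the logged lookup time spent in the ORC read.
orc_share = ORC_READ_MS / LOOKUP_MS

print(f"ORC read share of lookup: {orc_share:.1%}")
print(f"Per-record budget at {PAIMON_RATE:.0f}/s: {budget_ms(PAIMON_RATE):.1f} ms")
print(f"Per-record budget at {ROCKSDB_RATE:.0f}/s: {budget_ms(ROCKSDB_RATE):.1f} ms")
```

Under these assumptions the ORC read accounts for roughly three quarters of the logged lookup, and matching the RocksDB rate would require an average of 2.5 ms per record, far below the 80 ms spent on a single ORC read.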