xwmr-max commented on issue #1812:
URL: https://github.com/apache/incubator-paimon/issues/1812#issuecomment-1697523075

   I wrote a demo lookup function over the LSM store, using a dimension table with about 13 million rows, parallelism 1, no partitions, and a single bucket. The logs are as follows:
   
   ```
   2023-08-29 21:44:14,837 INFO  org.apache.flink.paimon.lookup.FileLSMStoreLookupFunciton [] - Start look up !!!
   2023-08-29 21:44:14,851 INFO  org.apache.shaded.org.apache.orc.impl.ReaderImpl     [] - Reading ORC rows from hdfs:/xxx/paimon2.db/lookup_dim4/data/bucket-0/data-95b6bdf0-97e1-4140-ba1f-53405b483f73-35.orc with {include: [false, true, true, true, true, true], offset: 3, length: 164592368, sarg: leaf-(EQUALS id 13550405), schema: struct<_KEY_id:string,_SEQUENCE_NUMBER:bigint,_VALUE_KIND:tinyint,id:string,name:string>, includeAcidColumns: true, allowSARGToFilter: false, useSelected: false}
   2023-08-29 21:44:14,931 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Number of keys: 10000
   2023-08-29 21:44:14,931 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Number of values: 10000
   2023-08-29 21:44:14,932 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Built index file index7.dat\n  Max offset length: 1 bytes\n  Slot size: 8 bytes
   2023-08-29 21:44:14,932 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Temporary index file temp_index7.dat has been deleted
   2023-08-29 21:44:14,932 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Built index file index8.dat\n  Max offset length: 2 bytes\n  Slot size: 10 bytes
   2023-08-29 21:44:14,932 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Temporary index file temp_index8.dat has been deleted
   2023-08-29 21:44:14,933 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Built index file index9.dat\n  Max offset length: 2 bytes\n  Slot size: 11 bytes
   2023-08-29 21:44:14,933 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Temporary index file temp_index9.dat has been deleted
   2023-08-29 21:44:14,933 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Built index file index10.dat\n  Max offset length: 3 bytes\n  Slot size: 13 bytes
   2023-08-29 21:44:14,933 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Temporary index file temp_index10.dat has been deleted
   2023-08-29 21:44:14,936 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Built index file index11.dat\n  Max offset length: 3 bytes\n  Slot size: 14 bytes
   2023-08-29 21:44:14,936 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Temporary index file temp_index11.dat has been deleted
   2023-08-29 21:44:14,936 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Number of collisions: 3743
   2023-08-29 21:44:14,936 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Total expected store size is 0.0 Mb
   2023-08-29 21:44:14,937 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Usable free space on the system is 553,721.0 Mb
   2023-08-29 21:44:14,937 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Merging metadata.dat size=172
   2023-08-29 21:44:14,937 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Merging index7.dat size=24
   2023-08-29 21:44:14,937 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Merging index8.dat size=170
   2023-08-29 21:44:14,937 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Merging index9.dat size=1661
   2023-08-29 21:44:14,937 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Merging index10.dat size=17472
   2023-08-29 21:44:14,937 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Merging index11.dat size=165466
   2023-08-29 21:44:14,937 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Merging data7.dat size=55
   2023-08-29 21:44:14,937 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Merging data8.dat size=365
   2023-08-29 21:44:14,937 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Merging data9.dat size=3278
   2023-08-29 21:44:14,937 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Merging data10.dat size=30241
   2023-08-29 21:44:14,938 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Merging data11.dat size=274785
   2023-08-29 21:44:14,938 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Time to merge 0.001346492 s
   2023-08-29 21:44:14,938 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Deleted temporary file metadata.dat
   2023-08-29 21:44:14,938 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Deleted temporary file index7.dat
   2023-08-29 21:44:14,938 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Deleted temporary file index8.dat
   2023-08-29 21:44:14,938 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Deleted temporary file index9.dat
   2023-08-29 21:44:14,938 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Deleted temporary file index10.dat
   2023-08-29 21:44:14,938 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Deleted temporary file index11.dat
   2023-08-29 21:44:14,938 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Deleted temporary file data7.dat
   2023-08-29 21:44:14,939 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Deleted temporary file data8.dat
   2023-08-29 21:44:14,939 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Deleted temporary file data9.dat
   2023-08-29 21:44:14,939 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Deleted temporary file data10.dat
   2023-08-29 21:44:14,939 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Deleted temporary file data11.dat
   2023-08-29 21:44:14,939 INFO  org.apache.paimon.lookup.hash.HashLookupStoreWriter  [] - Deleted temporary folder at /srv/BigData/data1/nm/localdir/usercache/flink_test_R/appcache/application_[********]26_0243/paimon-io-937040c0-d332-4bd2-8e48-6abd2597afbf/11646b04-d76c-4d41-92c3-4df8c05481cf
   2023-08-29 21:44:14,939 INFO  org.apache.paimon.lookup.hash.HashLookupStoreReader  [] - Opening file 23161d9030be7f41fca5d53c80984aea.channel
   2023-08-29 21:44:14,939 INFO  org.apache.paimon.lookup.hash.HashLookupStoreReader  [] - Storage metadata\n  Created at: 2023.08.29 AD at 21:44:14 CST\n  Key count: 10000\n  Key count for key length 7: 2\n  Key count for key length 8: 13\n  Key count for key length 9: 113\n  Key count for key length 10: 1008\n  Key count for key length 11: 8864\n  Index size: 0.18 Mb\n  Data size: 0.29 Mb\n
   2023-08-29 21:44:14,940 INFO  org.apache.flink.paimon.lookup.FileLSMStoreLookupFunciton [] - look up use time:103 ms
   ```
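The index file sizes in this log follow directly from the per-key-length counts in the storage metadata. The check below is an illustration, not Paimon code. It assumes a PalDB-style layout, which `HashLookupStoreWriter` appears to follow:

- Each key length gets its own index file.
- Each slot holds the key bytes plus the offset into the data file, so slot size = key length + max offset length. This matches every `Slot size` line.
- The slot count is `round(keyCount / loadFactor)`, with an assumed load factor of 0.75.

The class and method names are hypothetical.

```java
// Hypothetical check of the index sizes reported by HashLookupStoreWriter.
// Assumptions: one index file per key length, slot = key bytes + offset bytes,
// and numSlots = round(keyCount / 0.75).
class IndexSizeCheck {
    // Assumed load factor; 0.75 reproduces every index size in the log.
    static final float LOAD_FACTOR = 0.75f;

    /** Number of slots in one per-key-length index file (assumed formula). */
    static int numSlots(int keyCount) {
        return Math.round(keyCount / LOAD_FACTOR);
    }

    /** Index file size in bytes: slots times (key length + offset length). */
    static long indexFileSize(int keyCount, int keyLength, int offsetLength) {
        int slotSize = keyLength + offsetLength;
        return (long) numSlots(keyCount) * slotSize;
    }

    public static void main(String[] args) {
        // {key length, key count, max offset length}, copied from the log above.
        int[][] rows = {
            {7, 2, 1}, {8, 13, 2}, {9, 113, 2}, {10, 1008, 3}, {11, 8864, 3},
        };
        long total = 0;
        for (int[] r : rows) {
            long size = indexFileSize(r[1], r[0], r[2]);
            total += size;
            System.out.printf("index%d.dat: slot size %d bytes, %d bytes%n", r[0], r[0] + r[2], size);
        }
        System.out.printf("total index size: %d bytes%n", total);
    }
}
```

Under these assumptions, it prints 24, 170, 1661, 17472 and 165466 bytes, matching the `Merging indexN.dat` lines. The total is 184793 bytes, consistent with the reported index size of 0.18 Mb.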
   The lookup in this log took 103 ms. Some other lookups are faster, the fastest taking around 70 ms.
   
   The join rate is about 20 joins/s. For comparison, Flink with RocksDB reaches about 400 joins/s on the same data volume, without managed memory. Most of the time here is spent reading the ORC file, about 80 ms per lookup. Are there ways to optimize this other than increasing the cache?
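The 80 ms figure can be reproduced from the log timestamps. The sketch below uses a hypothetical helper that is not part of Paimon or Flink. It assumes the phase boundaries are the log lines themselves. In particular, it treats the interval from the ORC `ReaderImpl` line to the first `HashLookupStoreWriter` line as the ORC read, although that interval probably also includes writing the rows into the local store.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

class LookupPhaseTimes {
    // The log timestamps put a comma before the milliseconds, e.g. 21:44:14,837.
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    /** Milliseconds elapsed between two log timestamps on the same day. */
    static long millisBetween(String from, String to) {
        return Duration.between(LocalTime.parse(from, FMT), LocalTime.parse(to, FMT)).toMillis();
    }

    public static void main(String[] args) {
        // {timestamp from the log, phase that ends at this timestamp} (phase names are assumptions)
        String[][] marks = {
            {"21:44:14,837", "start"},
            {"21:44:14,851", "setup before ORC read"},
            {"21:44:14,931", "reading ORC rows"},
            {"21:44:14,939", "building the hash lookup store"},
            {"21:44:14,940", "remaining steps"},
        };
        for (int i = 1; i < marks.length; i++) {
            System.out.printf("%s: %d ms%n", marks[i][1], millisBetween(marks[i - 1][0], marks[i][0]));
        }
        System.out.println("total: " + millisBetween(marks[0][0], marks[marks.length - 1][0]) + " ms");
    }
}
```

It attributes 14 ms to setup and 80 ms to the ORC read. It attributes 8 ms to building the store and 1 ms to the rest, for a total of 103 ms, matching `look up use time:103 ms`.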
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.