From Ritik Raj <[email protected]>:

Ritik Raj has posted comments on this change by Ritik Raj. ( 
https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20863?usp=email )

Change subject: [ASTERIXDB-3702][RT][STO] LSM Sampling
......................................................................


Patch Set 3:

(1 comment)

Patchset:

PS3:
Output of the unit test showing that the error rate of cardinality estimation 
using Theta Sketches stays within the expected bounds.
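
For readers unfamiliar with the technique, here is a minimal sketch of how a theta-style (K-Minimum-Values) sketch estimates distinct keys and how per-component sketches can be unioned. It is an illustration only, not the code in this change: the class name `KmvSketch`, the choice of `k = 1024`, and the SHA-256-based hash are all assumptions made for the example.

```python
import hashlib

HASH_SPACE = 2 ** 64  # hashes are integers in [0, 2^64)


class KmvSketch:
    """Toy theta-style sketch (hypothetical, for illustration only)."""

    def __init__(self, k=1024):
        self.k = k                # maximum number of retained hashes
        self.hashes = set()       # retained hashes, all strictly below theta
        self.theta = HASH_SPACE   # sampling threshold; starts at "keep everything"

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def _trim(self):
        # Keep the k smallest hashes; the (k+1)-th smallest becomes the new theta.
        if len(self.hashes) > self.k:
            ordered = sorted(self.hashes)
            self.theta = ordered[self.k]
            self.hashes = set(ordered[:self.k])

    def update(self, key):
        h = self._hash(key)
        if h < self.theta:
            self.hashes.add(h)    # duplicates collapse because this is a set
            self._trim()

    def union(self, other):
        # The union keeps only hashes below the smaller of the two thetas.
        result = KmvSketch(min(self.k, other.k))
        result.theta = min(self.theta, other.theta)
        result.hashes = {h for h in self.hashes | other.hashes if h < result.theta}
        result._trim()
        return result

    def estimate(self):
        # Exact while theta is untouched; otherwise scale by the sampled fraction.
        return len(self.hashes) * HASH_SPACE / self.theta


# Four "components" of 5,000 keys each; neighbouring components share 1,000 keys,
# so the true distinct count is 17,000.
components = [KmvSketch() for _ in range(4)]
for i, sketch in enumerate(components):
    for key in range(i * 4000, i * 4000 + 5000):
        sketch.update(key)

merged = components[0]
for sketch in components[1:]:
    merged = merged.union(sketch)

print(f"true=17000 estimated={merged.estimate():.0f}")
```

In this toy version the relative standard error is roughly 1/sqrt(k), so a larger `k` trades memory for tighter estimates. How the change itself handles upserts and deletes is not shown here.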

====================================================================================================
TEST: Full Matrix - N Indexes x Upsert% x Delete%
====================================================================================================
Configuration: 5,000 keys per index, 4 flushes per index

N        Upsert %   Delete %   Expected        Estimated       Error %
--------------------------------------------------------------------------------
1        0          0          3125            3148            0.74
1        0          10         2635            2652            0.65
1        0          25         2089            2112            1.10
1        25         0          3125            3148            0.74
1        25         10         2723            2737            0.51
1        25         25         2265            2290            1.10
1        50         0          3125            3132            0.22
1        50         10         2804            2811            0.25
1        50         25         2428            2423            0.21
4        0          0          12500           12454           0.37
4        0          10         10521           10485           0.34
4        0          25         8332            8369            0.44
4        25         0          12500           12454           0.37
4        25         10         10960           10914           0.42
4        25         25         9133            9119            0.15
4        50         0          12500           12411           0.71
4        50         10         11226           11181           0.40
4        50         25         9785            9747            0.39
16       0          0          50000           49970           0.06
16       0          10         42063           42034           0.07
16       0          25         33289           33222           0.20
16       25         0          50000           49970           0.06
16       25         10         43748           43717           0.07
16       25         25         36496           36428           0.19
16       50         0          50000           49791           0.42
16       50         10         45091           44885           0.46
16       50         25         39260           39143           0.30
64       0          0          200000          199783          0.11
64       0          10         168224          167964          0.15
64       0          25         133278          133008          0.20
64       25         0          200000          199783          0.11
64       25         10         174800          174606          0.11
64       25         25         146073          145858          0.15
64       50         0          200000          199664          0.17
64       50         10         180104          179949          0.09
64       50         25         156918          156548          0.24

================================================================================
TEST: Error Rate vs Combined Upsert + Delete Percentage
================================================================================
Configuration: 16 indexes, 15,000 keys per index, 6 flushes

Upsert %   Delete %   Expected        Estimated       Error %
----------------------------------------------------------------------
0          0          106640          106491          0.14
10         5          92964           92903           0.07
20         10         83651           83561           0.11
30         15         77456           77446           0.01
40         20         73303           73168           0.18
50         25         70459           70482           0.03
60         30         68686           68679           0.01
70         35         67817           67292           0.77
80         40         67359           67391           0.05

================================================================================
TEST: Error Rate vs Number of LSM Indexes (N)
================================================================================
Configuration: 10,000 keys per index, 3 flushes per index, 0% upsert, 0% delete

N          Expected        Estimated       Error %         Components
----------------------------------------------------------------------
1          9999            9897            1.02            3
2          19998           19934           0.32            6
4          39996           40058           0.16            12
8          79992           79624           0.46            24
16         159984          159792          0.12            48
32         319968          319348          0.19            96
64         639936          640281          0.05            192
128        1279872         1279870         0.00            384

================================================================================
TEST: Error Rate vs Upsert Percentage
================================================================================
Configuration: 8 indexes, 20,000 keys per index, 5 flushes, 0% delete

Upsert %     Expected        Estimated       Error %
------------------------------------------------------------
0            96000           95855           0.15
10           96000           95855           0.15
20           96000           95855           0.15
30           96000           95842           0.16
40           96000           96088           0.09
50           96000           96076           0.08
60           96000           95996           0.00
70           96000           96081           0.08
80           96000           95750           0.26
90           96000           95906           0.10

================================================================================
TEST: Accuracy Bounds Verification
================================================================================
Verifying error rates stay within expected bounds for various configurations

Simple 3-flush, no overlap: Expected=30000, Estimated=30368, Error=1.23%
30% upsert scenario: Expected=20000, Estimated=20054, Error=0.27%
With deletes scenario: Expected=21188, Estimated=22043, Error=4.04%

All accuracy bounds verified successfully!

================================================================================
TEST: Large Scale LSM Simulation
================================================================================
Configuration: 32 indexes, 50,000 keys each, 10 flushes, mixed workload

Results:
----------------------------------------
Total Indexes:      32
Total Components:   320
Expected Keys:      780,929
Estimated Keys:     780,120
Error Rate:         0.10%
Time Elapsed:       393 ms

================================================================================
TEST: Error Rate vs Delete Percentage
================================================================================
Configuration: 8 indexes, 20,000 keys per index, 5 flushes, 0% upsert

Delete %     Expected        Estimated       Error %
------------------------------------------------------------
0            96000           95855           0.15
5            85671           85641           0.04
10           76821           76819           0.00
15           69337           69302           0.05
20           62938           62785           0.24
25           57458           57258           0.35
30           52704           52698           0.01
40           45195           44745           1.00
50           39483           39463           0.05

================================================================================
TEST: Error Rate vs Number of Components per Index
================================================================================
Configuration: 8 indexes, 30,000 total keys per index, varying flush count

Flushes      Keys/Flush      Expected        Estimated       Error %
---------------------------------------------------------------------------
1            30000           240000          243312          1.38
2            15000           234716          236380          0.71
3            10000           229582          229802          0.10
5            6000            219797          218656          0.52
10           3000            197433          196504          0.47
15           2000            178376          177878          0.28
20           1500            161850          161293          0.34
30           1000            134360          134057          0.23


Process finished with exit code 0
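
The Error % column in the tables above is consistent with the relative error |Estimated - Expected| / Expected, expressed as a percentage and rounded to two decimals. The snippet below is a small check under that assumption; the helper name `error_pct` is hypothetical, and the three rows are copied from the output above.

```python
def error_pct(expected, estimated):
    """Relative error of the estimate, as a percentage of the expected count."""
    return abs(estimated - expected) / expected * 100.0


rows = [
    ("1 index, 0% upsert, 0% delete", 3125, 3148),
    ("Large scale simulation", 780929, 780120),
    ("With deletes scenario", 21188, 22043),
]

for label, expected, estimated in rows:
    print(f"{label}: {error_pct(expected, estimated):.2f}%")
# prints:
# 1 index, 0% upsert, 0% delete: 0.74%
# Large scale simulation: 0.10%
# With deletes scenario: 4.04%
```

The "With deletes scenario" row, at 4.04%, is the largest error reported in this run.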



--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/20863?usp=email
To unsubscribe, or for help writing mail filters, visit 
https://asterix-gerrit.ics.uci.edu/settings?usp=email

Gerrit-MessageType: comment
Gerrit-Project: asterixdb
Gerrit-Branch: master
Gerrit-Change-Id: Ieaeb919c3b058955860385012b4d1bb738fc1cfa
Gerrit-Change-Number: 20863
Gerrit-PatchSet: 3
Gerrit-Owner: Ritik Raj <[email protected]>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Comment-Date: Sat, 07 Feb 2026 14:28:15 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
