[ 
https://issues.apache.org/jira/browse/HUDI-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062940#comment-17062940
 ] 

Vinoth Chandar commented on HUDI-686:
-------------------------------------

Timing the individual stages 

Roughly, here is how it looks. I am not sure how much further we can optimize 
this, since most of the time is spent inside Parquet reading. 

The metadata read cost, i.e. reading the footers, dominates the first stage: 
{code:java}
System.err.format("LazyRangeBloomChecker: %d, %d, %d, %d, %d, %d \n",
    totalCount, totalMatches, totalTimeNs,
    totalMetadataReadTimeNs, totalRangeCheckTimeNs, totalBloomCheckTimeNs);

LazyRangeBloomChecker: 18632, 5068, 481673381, 439685698, 5872344, 26426499 
LazyRangeBloomChecker: 29312, 0, 397373925, 361189515, 12336753, 3205152 
LazyRangeBloomChecker: 36422, 0, 395838972, 364965143, 6870027, 3088563 
LazyRangeBloomChecker: 32698, 21252, 502987672, 374374961, 15190330, 94478078 
LazyRangeBloomChecker: 36633, 0, 420441840, 388992165, 7971801, 5222196 
LazyRangeBloomChecker: 35919, 35919, 547982738, 382770288, 17042127, 130529090 
LazyRangeBloomChecker: 26448, 26448, 673972735, 497887634, 12918131, 150188682 
LazyRangeBloomChecker: 29827, 25338, 739789660, 568953445, 14633164, 140977007 
LazyRangeBloomChecker: 40694, 40694, 611867636, 364297491, 20609305, 206717514 
LazyRangeBloomChecker: 41515, 41515, 754657982, 379440879, 18670251, 337857948 
LazyRangeBloomChecker: 46672, 46672, 761187684, 364060859, 18887398, 359483525 
LazyRangeBloomChecker: 26931, 2360, 296764733, 275044606, 3439711, 11417543 
LazyRangeBloomChecker: 41863, 20714, 831527864, 656157121, 13784027, 143710665 
LazyRangeBloomChecker: 36429, 0, 181597122, 157965082, 5342164, 3072219 
LazyRangeBloomChecker: 45618, 0, 180005379, 154248797, 6254112, 3332647 
LazyRangeBloomChecker: 60916, 60916, 730395000, 244153313, 24926738, 439724359 
 {code}
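To put a number on "dominates", the share of each phase can be computed directly from a log line. The following is a minimal sketch, assuming the six fields appear in the order of the format call above (totalCount, totalMatches, totalTimeNs, totalMetadataReadTimeNs, totalRangeCheckTimeNs, totalBloomCheckTimeNs). The class and method names are hypothetical and are not part of Hudi.

```java
import java.util.Locale;

// Hypothetical helper, not part of Hudi: parses one "LazyRangeBloomChecker: ..." log line
// and reports what fraction of the total time went to each phase.
final class RangeBloomTimingShare {

    // Splits the comma-separated numeric fields that follow the "LazyRangeBloomChecker:" prefix.
    // Assumed field order: count, matches, totalNs, metadataReadNs, rangeCheckNs, bloomCheckNs.
    static long[] parse(String line) {
        String[] parts = line.substring(line.indexOf(':') + 1).trim().split("\\s*,\\s*");
        long[] values = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Long.parseLong(parts[i].trim());
        }
        return values;
    }

    // Fraction of totalTimeNs (field index 2) spent in the phase stored at fieldIndex.
    static double share(long[] fields, int fieldIndex) {
        return (double) fields[fieldIndex] / fields[2];
    }

    public static void main(String[] args) {
        long[] f = parse("LazyRangeBloomChecker: 18632, 5068, 481673381, 439685698, 5872344, 26426499 ");
        System.out.printf(Locale.ROOT, "metadata=%.1f%% range=%.1f%% bloom=%.1f%%%n",
                100 * share(f, 3), 100 * share(f, 4), 100 * share(f, 5));
    }
}
```

For the first sample line, the metadata read accounts for roughly 91% of the total time. Tasks with many matches shift more of the time into the bloom check.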
Reading the actual keys themselves dominates the second stage: 
{code:java}
System.err.println("LazyKeyChecker: " + totalTimeNs + "," + totalCount + ","
    + totalReadTimeNs);

LazyKeyChecker: 32576530,2119,30998522
LazyKeyChecker: 39189497,3415,36666074
LazyKeyChecker: 36683534,3726,33878272
LazyKeyChecker: 293554458,38523,264821882
LazyKeyChecker: 297414709,39263,268215304
LazyKeyChecker: 212946950,65474,169525572
LazyKeyChecker: 1047598045,65998,1003946915
LazyKeyChecker: 1048062757,66734,1003969635
LazyKeyChecker: 1041348181,74948,992863777
{code}
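The same arithmetic applies to the second stage. The following is a minimal sketch that derives the read share and an approximate per-key read cost from a LazyKeyChecker line. It assumes the fields are totalTimeNs, totalCount, totalReadTimeNs (the order of the println above) and that totalCount is the number of keys checked. The class name and methods are hypothetical and are not part of Hudi.

```java
import java.util.Locale;

// Hypothetical helper, not part of Hudi: summarizes one "LazyKeyChecker: ..." log line.
final class KeyCheckerTiming {
    final long totalTimeNs;
    final long totalCount;      // assumed to be the number of keys checked
    final long totalReadTimeNs;

    KeyCheckerTiming(long totalTimeNs, long totalCount, long totalReadTimeNs) {
        this.totalTimeNs = totalTimeNs;
        this.totalCount = totalCount;
        this.totalReadTimeNs = totalReadTimeNs;
    }

    // Parses "LazyKeyChecker: <totalTimeNs>,<totalCount>,<totalReadTimeNs>".
    static KeyCheckerTiming parse(String line) {
        String[] parts = line.substring(line.indexOf(':') + 1).trim().split("\\s*,\\s*");
        return new KeyCheckerTiming(
                Long.parseLong(parts[0]), Long.parseLong(parts[1]), Long.parseLong(parts[2]));
    }

    // Fraction of the total time spent reading keys.
    double readShare() {
        return (double) totalReadTimeNs / totalTimeNs;
    }

    // Average read time per key, in microseconds.
    double readMicrosPerKey() {
        return totalReadTimeNs / 1_000.0 / totalCount;
    }

    public static void main(String[] args) {
        KeyCheckerTiming t = parse("LazyKeyChecker: 32576530,2119,30998522");
        System.out.printf(Locale.ROOT, "read share=%.1f%%, read per key=%.1f us%n",
                100 * t.readShare(), t.readMicrosPerKey());
    }
}
```

For the first sample line, reading accounts for about 95% of the time, which under the stated assumption works out to roughly 14.6 microseconds per key.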

> Implement BloomIndexV2 that does not depend on memory caching
> -------------------------------------------------------------
>
>                 Key: HUDI-686
>                 URL: https://issues.apache.org/jira/browse/HUDI-686
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: Index, Performance
>            Reporter: Vinoth Chandar
>            Assignee: Vinoth Chandar
>            Priority: Major
>             Fix For: 0.6.0
>
>         Attachments: Screen Shot 2020-03-19 at 10.15.10 AM.png, Screen Shot 
> 2020-03-19 at 10.15.10 AM.png, Screen Shot 2020-03-19 at 10.15.10 AM.png, 
> image-2020-03-19-10-17-43-048.png
>
>
> The main goal here is to provide a much simpler index, without advanced 
> optimizations like auto-tuned parallelism/skew handling, but with a better 
> out-of-box experience for small workloads. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
