[
https://issues.apache.org/jira/browse/HUDI-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062940#comment-17062940
]
Vinoth Chandar commented on HUDI-686:
-------------------------------------
Timing the individual stages:
Roughly, here is how it looks. I am not sure how much further we can optimize
this, since most of the time is spent inside parquet reading.
The metadata read cost, i.e. reading the footers, dominates the first stage.
{code:java}
System.err.format("LazyRangeBloomChecker: %d, %d, %d, %d, %d, %d \n",
totalCount, totalMatches, totalTimeNs,
totalMetadataReadTimeNs, totalRangeCheckTimeNs, totalBloomCheckTimeNs);
LazyRangeBloomChecker: 18632, 5068, 481673381, 439685698, 5872344, 26426499
LazyRangeBloomChecker: 29312, 0, 397373925, 361189515, 12336753, 3205152
LazyRangeBloomChecker: 36422, 0, 395838972, 364965143, 6870027, 3088563
LazyRangeBloomChecker: 32698, 21252, 502987672, 374374961, 15190330, 94478078
LazyRangeBloomChecker: 36633, 0, 420441840, 388992165, 7971801, 5222196
LazyRangeBloomChecker: 35919, 35919, 547982738, 382770288, 17042127, 130529090
LazyRangeBloomChecker: 26448, 26448, 673972735, 497887634, 12918131, 150188682
LazyRangeBloomChecker: 29827, 25338, 739789660, 568953445, 14633164, 140977007
LazyRangeBloomChecker: 40694, 40694, 611867636, 364297491, 20609305, 206717514
LazyRangeBloomChecker: 41515, 41515, 754657982, 379440879, 18670251, 337857948
LazyRangeBloomChecker: 46672, 46672, 761187684, 364060859, 18887398, 359483525
LazyRangeBloomChecker: 26931, 2360, 296764733, 275044606, 3439711, 11417543
LazyRangeBloomChecker: 41863, 20714, 831527864, 656157121, 13784027, 143710665
LazyRangeBloomChecker: 36429, 0, 181597122, 157965082, 5342164, 3072219
LazyRangeBloomChecker: 45618, 0, 180005379, 154248797, 6254112, 3332647
LazyRangeBloomChecker: 60916, 60916, 730395000, 244153313, 24926738, 439724359
{code}
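To make the breakdown easier to read, here is a minimal sketch that turns one of the log lines above into per-phase percentages of totalTimeNs. The class name StageBreakdown and its helper methods are hypothetical and not part of Hudi; the field order is assumed to match the format string in the snippet above.

```java
import java.util.Locale;

// Hypothetical helper, not part of Hudi: summarizes LazyRangeBloomChecker log lines.
class StageBreakdown {

    // Parses "LazyRangeBloomChecker: a, b, c, d, e, f" into its numeric fields.
    // Assumed order: totalCount, totalMatches, totalTimeNs,
    // totalMetadataReadTimeNs, totalRangeCheckTimeNs, totalBloomCheckTimeNs.
    static long[] parse(String line) {
        String[] parts = line.substring(line.indexOf(':') + 1).split(",");
        long[] values = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Long.parseLong(parts[i].trim());
        }
        return values;
    }

    // Returns partNs as a percentage of totalNs (0 when totalNs is 0).
    static double share(long partNs, long totalNs) {
        return totalNs == 0 ? 0.0 : 100.0 * partNs / totalNs;
    }

    public static void main(String[] args) {
        // Two sample lines copied from the output above.
        String[] lines = {
            "LazyRangeBloomChecker: 18632, 5068, 481673381, 439685698, 5872344, 26426499",
            "LazyRangeBloomChecker: 60916, 60916, 730395000, 244153313, 24926738, 439724359"
        };
        for (String line : lines) {
            long[] v = parse(line);
            System.out.printf(Locale.ROOT,
                "count=%d matches=%d metadata=%.1f%% range=%.1f%% bloom=%.1f%%%n",
                v[0], v[1], share(v[3], v[2]), share(v[4], v[2]), share(v[5], v[2]));
        }
    }
}
```

For the first sample line the metadata read share works out to roughly 91% of the total. For the last line, where every key matched, it is closer to a third, with the bloom check taking most of the rest.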
Reading the actual keys themselves dominates the second stage.
{code:java}
System.err.println("LazyKeyChecker: " + totalTimeNs + "," + totalCount + "," +
totalReadTimeNs);
LazyKeyChecker: 32576530,2119,30998522
LazyKeyChecker: 39189497,3415,36666074
LazyKeyChecker: 36683534,3726,33878272
LazyKeyChecker: 293554458,38523,264821882
LazyKeyChecker: 297414709,39263,268215304
LazyKeyChecker: 212946950,65474,169525572
LazyKeyChecker: 1047598045,65998,1003946915
LazyKeyChecker: 1048062757,66734,1003969635
LazyKeyChecker: 1041348181,74948,992863777
{code}
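For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of how counters like totalReadTimeNs and totalCount could be accumulated around a unit of work using System.nanoTime(). The PhaseTimer class is hypothetical and is not the instrumentation used in the actual change; the timed lambda is a stand-in for a real read.

```java
import java.util.function.Supplier;

// Hypothetical sketch, not Hudi code: accumulates elapsed time and call count for one phase.
class PhaseTimer {
    private long totalNs;
    private long count;

    // Runs the given work, adding its elapsed time to the running total
    // even if the work throws.
    <T> T time(Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            totalNs += System.nanoTime() - start;
            count++;
        }
    }

    long totalNs() {
        return totalNs;
    }

    long count() {
        return count;
    }

    public static void main(String[] args) {
        PhaseTimer readTimer = new PhaseTimer();
        long sum = 0;
        for (int i = 0; i < 1000; i++) {
            final int n = i;
            // Stand-in for the real work, e.g. reading keys from a file.
            Integer doubled = readTimer.time(() -> n * 2);
            sum += doubled;
        }
        System.err.println("PhaseTimer: " + readTimer.totalNs() + "," + readTimer.count() + "," + sum);
    }
}
```

Keeping one timer per phase (metadata read, range check, bloom check, key read) would give totals that can be printed at the end of each partition, similar in shape to the lines above.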
> Implement BloomIndexV2 that does not depend on memory caching
> -------------------------------------------------------------
>
> Key: HUDI-686
> URL: https://issues.apache.org/jira/browse/HUDI-686
> Project: Apache Hudi (incubating)
> Issue Type: Improvement
> Components: Index, Performance
> Reporter: Vinoth Chandar
> Assignee: Vinoth Chandar
> Priority: Major
> Fix For: 0.6.0
>
> Attachments: Screen Shot 2020-03-19 at 10.15.10 AM.png, Screen Shot
> 2020-03-19 at 10.15.10 AM.png, Screen Shot 2020-03-19 at 10.15.10 AM.png,
> image-2020-03-19-10-17-43-048.png
>
>
> The main goal here is to provide a much simpler index, without advanced
> optimizations like auto-tuned parallelism/skew handling, but with a better
> out-of-box experience for small workloads.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)