jasonk000 commented on PR #13710:
URL: https://github.com/apache/druid/pull/13710#issuecomment-1407450510

   @clintropolis , thx for comments. I think it's ready to take a look at.
   
   I've included benchmark results [for each 
commit](https://github.com/apache/druid/pull/13710/commits), and shown the 
progress. Each commit is broken to a small change so if we would prefer 
distinct PRs we can do that too.
   
   > it would probably be worth checking that query performance for realtime 
tasks is not negatively impacted too much by this change (especially from 
multiple concurrent threads). There is only one writer thread but potentially 
many reader threads, as new dimension selectors are created on top of this 
dictionary during realtime queries.
   
   I've added test explicitly for the contended situation and brought in 
StampedLock which has much better performance with mixed read workloads.
   
   It looks to me like most of the time it is one-thread-writing with possibly 
many readers, so I think this is a good blend. We could build a more advanced 
ArrayList structure that would be lockfree since this is append-only, but i'm 
not sure it's warranted. Let me know your thoughts.
   
   > For batch ingestion tasks, it might be worth using a separate 
`DimensionDictionary` implementation that has no locks. I use a similar thing 
for the `NestedDataColumnIndexer`, 
   
   I included a benchmark for single-writer only and it shows up with similar 
performance to the contended writer situation.
   
   Overall, for JDK11, 
   
   BEFORE
   ```
   jdk11 -- Benchmark                                                       
Mode  Cnt      Score     Error  Units
   StringDimensionIndexerProcessBenchmark.parallelReadWrite                 
avgt   10  18769.135 ± 705.057  us/op
   StringDimensionIndexerProcessBenchmark.parallelReadWrite:parallelReader  
avgt   10  23772.439 ± 947.011  us/op
   StringDimensionIndexerProcessBenchmark.parallelReadWrite:parallelWriter  
avgt   10   3759.221 ±  59.897  us/op
   StringDimensionIndexerProcessBenchmark.soloReader                        
avgt   10    666.688 ±  12.723  us/op
   StringDimensionIndexerProcessBenchmark.soloWriter                        
avgt   10   3013.473 ±  13.480  us/op
   ```
   
   AFTER
   ```
   jdk11 - Benchmark                                                        
Mode  Cnt     Score    Error  Units
   StringDimensionIndexerProcessBenchmark.parallelReadWrite                 
avgt   10   873.654 ± 12.358  us/op
   StringDimensionIndexerProcessBenchmark.parallelReadWrite:parallelReader  
avgt   10   283.315 ±  0.067  us/op
   StringDimensionIndexerProcessBenchmark.parallelReadWrite:parallelWriter  
avgt   10  2644.672 ± 49.549  us/op
   StringDimensionIndexerProcessBenchmark.soloReader                        
avgt   10   284.451 ±  0.588  us/op
   StringDimensionIndexerProcessBenchmark.soloWriter                        
avgt   10  2284.851 ± 51.339  us/op
   ```
   
   Overall, both the contended and solo read/write is all improved.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to