gianm commented on issue #5698: Oak: New Concurrent Key-Value Map
URL: https://github.com/apache/incubator-druid/issues/5698#issuecomment-508266279

Some context around the potential impact of Oak on Druid's IncrementalIndex:

1. There are two things that matter at ingestion time: ingestion throughput (for all forms of ingestion) and query latency (for realtime ingestion only; batch ingestion does not serve queries).
2. It looks like you've been doing a lot of testing with really big incremental indexes, but it's more normal in Druid land to have smaller ones. There are a couple of reasons for this. One is that bigger indexes take up more memory, and large amounts of memory aren't always available. Another is that querying bigger indexes takes more time. The way Druid's ingestion works is that periodically, the incremental indexes are persisted to disk in Druid's segment format, which is compressed (smaller) and columnar and indexed (faster to query on a per-row basis). There are also multiple persist files, and each can be processed in parallel. Keeping too many rows in memory can negatively impact query speed: having a 5 million row incremental index means that queries on realtime data cannot be faster than however long it takes to process those 5 million rows. This latter point matters for realtime ingestion (where queries happen concurrently with ingestion), so to understand the impact there, it'd be important to see how long queries take.
3. For ingestion throughput there are three components that matter: the throughput of adding to an incremental index, how long it takes to persist the incremental index to disk, and how long it takes to merge persisted indexes into a final segment at the end of the ingestion cycle. All three matter, and one reason folks are asking for real-world numbers is to make sure all three are being taken into account.
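To make the three throughput components concrete, here's a toy sketch of the ingestion cycle: add rows to an in-memory index, persist periodically once a row threshold is hit, and merge the persisted pieces at the end. All class and method names here are hypothetical, and a sorted map plus in-memory lists stand in for the real incremental index and on-disk segment files:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch, not Druid's actual code: a TreeMap stands in for
// the incremental index, and in-memory maps stand in for persist files.
public class IngestCycleSketch {
  private final int maxRowsInMemory;
  private final NavigableMap<String, Long> incrementalIndex = new TreeMap<>();
  private final List<NavigableMap<String, Long>> persisted = new ArrayList<>();

  public IngestCycleSketch(int maxRowsInMemory) {
    this.maxRowsInMemory = maxRowsInMemory;
  }

  // Component 1: throughput of adding rows to the incremental index.
  public void add(String key, long value) {
    incrementalIndex.merge(key, value, Long::sum);
    if (incrementalIndex.size() >= maxRowsInMemory) {
      persist();
    }
  }

  // Component 2: periodically persist the in-memory index (here, just a copy;
  // in Druid this produces a compressed, columnar, indexed file on disk).
  private void persist() {
    if (!incrementalIndex.isEmpty()) {
      persisted.add(new TreeMap<>(incrementalIndex));
      incrementalIndex.clear();
    }
  }

  // Component 3: merge all persisted pieces into one final "segment"
  // at the end of the ingestion cycle.
  public NavigableMap<String, Long> mergeToSegment() {
    persist();
    NavigableMap<String, Long> segment = new TreeMap<>();
    for (NavigableMap<String, Long> part : persisted) {
      part.forEach((k, v) -> segment.merge(k, v, Long::sum));
    }
    return segment;
  }
}
```

The point of the sketch is that a faster map only speeds up the first component; persist and merge times have to be measured separately to understand end-to-end ingestion throughput.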
Experience and query/ingest-rate metrics in a real cluster are the easiest way to validate all of the above, since the system is fairly complex and there are a lot of potential tradeoffs between the various components. If you don't have a real-world dataset available, maybe try the publicly available tweets dataset from the Twitter Streaming API; we often use it for test clusters. Some resources for that:

- https://developer.twitter.com/en/docs/tweets/sample-realtime/overview/GET_statuse_sample
- http://twitter4j.org/en/code-examples.html (look for "Streaming API")

-------

That all being said, you might also want to look at the potential impact of Oak on Druid's groupBy engine. Check out the ConcurrentGrouper class, which groupBy v2 queries use for parallel aggregation. In particular, check out the `aggregate(KeyType key, int keyHash)` method. It slices up a buffer and then synchronizes on each slice, a pretty simple strategy that I'm sure could be improved on. Maybe Oak could do better. The code path is somewhat similar to IncrementalIndex: both involve grouping by time and dimensions, and aggregating using AggregatorFactories.
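For reference, the slice-and-synchronize strategy described above might look roughly like this. This is a hypothetical sketch, not Druid's actual ConcurrentGrouper: one buffer is split into slices, each slice gets its own lock, and the key hash picks which slice (and therefore which lock) a given aggregation touches:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of per-slice synchronization (not Druid's real code):
// contention is limited to keys that hash to the same slice.
public class SlicedAggregator {
  private final ByteBuffer[] slices;
  private final Object[] locks;

  public SlicedAggregator(int totalBytes, int numSlices) {
    ByteBuffer whole = ByteBuffer.allocate(totalBytes);
    int sliceBytes = totalBytes / numSlices;
    slices = new ByteBuffer[numSlices];
    locks = new Object[numSlices];
    for (int i = 0; i < numSlices; i++) {
      whole.position(i * sliceBytes);
      whole.limit((i + 1) * sliceBytes);
      slices[i] = whole.slice(); // independent view of one region
      locks[i] = new Object();
    }
    whole.clear();
  }

  // Toy stand-in for aggregate(key, keyHash): add 'delta' to a long
  // counter at the start of the slice chosen by keyHash, holding only
  // that slice's lock.
  public void aggregate(int keyHash, long delta) {
    int i = Math.floorMod(keyHash, slices.length);
    synchronized (locks[i]) {
      ByteBuffer slice = slices[i];
      slice.putLong(0, slice.getLong(0) + delta);
    }
  }

  public long get(int sliceIndex) {
    synchronized (locks[sliceIndex]) {
      return slices[sliceIndex].getLong(0);
    }
  }
}
```

The appeal of this scheme is its simplicity, but threads whose keys land in the same slice still serialize on one lock, which is exactly the kind of contention a lock-free concurrent map like Oak might reduce.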
