Hi Anastasia,

1) At ingestion time the FactsHolder is sorted. The unsorted code path is
used by groupBy v1, which hasn't been common since groupBy v2 was made the
default a few releases ago. So I would only worry about the sorted case.

2) PlainFactsHolder is used when the user has disabled rollup at ingestion
time. The idea is that with the RollupFactsHolder there will be a _single_
fact row per TimeAndDims (and Druid may combine multiple input rows into
one indexed fact row). But with the PlainFactsHolder there may be more than
one fact row per TimeAndDims (in particular: there will be one fact row per
input row).
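
To make the difference concrete, here is a minimal Java sketch. It is not Druid's actual code: the Key and Row classes, the single dimension, and the summing aggregator are all simplifications I am assuming for illustration. The rollup variant keeps one fact row per key and combines duplicates, while the plain variant keeps one fact row per input row, grouped by timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class FactsHolderSketch {
    // Hypothetical stand-in for TimeAndDims: a timestamp plus a single dimension value.
    static final class Key implements Comparable<Key> {
        final long timestamp;
        final String dim;

        Key(long timestamp, String dim) {
            this.timestamp = timestamp;
            this.dim = dim;
        }

        @Override
        public int compareTo(Key other) {
            // Sort by timestamp first, then by the dimension value.
            int c = Long.compare(timestamp, other.timestamp);
            return c != 0 ? c : dim.compareTo(other.dim);
        }
    }

    // Hypothetical input row: a key plus one metric value.
    static final class Row {
        final Key key;
        final long metric;

        Row(long timestamp, String dim, long metric) {
            this.key = new Key(timestamp, dim);
            this.metric = metric;
        }
    }

    // Rollup: at most one fact row per key; rows with equal keys are combined (summed here).
    static TreeMap<Key, Long> rollup(List<Row> input) {
        TreeMap<Key, Long> facts = new TreeMap<>();
        for (Row row : input) {
            facts.merge(row.key, row.metric, Long::sum);
        }
        return facts;
    }

    // Plain: one fact row per input row, kept in per-timestamp lists sorted by timestamp.
    static TreeMap<Long, List<Row>> plain(List<Row> input) {
        TreeMap<Long, List<Row>> facts = new TreeMap<>();
        for (Row row : input) {
            facts.computeIfAbsent(row.key.timestamp, t -> new ArrayList<>()).add(row);
        }
        return facts;
    }

    public static void main(String[] args) {
        List<Row> input = List.of(
            new Row(1000L, "a", 1), new Row(1000L, "a", 2),
            new Row(1000L, "b", 5), new Row(2000L, "a", 7));
        System.out.println("rollup fact rows: " + rollup(input).size());
        int plainRows = plain(input).values().stream().mapToInt(List::size).sum();
        System.out.println("plain fact rows: " + plainRows);
    }
}
```

With the four input rows in main, the rollup map ends up with three fact rows (the two rows for timestamp 1000 and dimension "a" are merged), while the plain map keeps all four.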

Hope this helps.

On Wed, May 30, 2018 at 12:14 AM, Anastasia Braginsky <
anas...@oath.com.invalid> wrote:

> Hi,
> Recall our suggestion to use the new concurrent map named Oak as a base
> for Incremental Index. Oak stands for Off-heap Allocated Keys; for more
> details please see issue #5698. We have made great progress with Oak
> integration and with stabilizing OakIndex performance. We have some questions
> regarding FactsHolder. As we explained in our design document and
> refactoring suggestion we prefer to remove the FactsHolder usage in
> the OakIndex, because Oak maps the keys (Time&Dims) to the values
> (Aggregators) directly. Therefore the Oak mapping is always sorted and only
> from keys to values. From here we have two questions.
>
> 1. Unsorted FactsHolder: It is understandable that unsorted mapping via
> HashMap (O(1) access) might be faster than sorted mapping (O(logN) access).
> The question is whether the unsorted variant is used frequently. When is it
> used? And is it acceptable that in this case Oak will give slightly lower
> performance?
>
> 2. Regarding Plain- vs Rollup- FactsHolder: It can be seen that
> PlainFactsHolder is holding a queue of Key->Value (Time&Dims->Aggregator)
> per Timestamp, where the sorting is via Timestamp. Therefore, Oak
> implements mostly sorted RollupFactsHolder logic. Additionally, Timestamp
> is also a part of Time&Dims and the sorting is initially according to
> Timestamp, then other dimensions. The question is: what are the use cases
> where the PlainFactsHolder is used rather than the Rollup one? And is there
> any functionality that can be provided by Plain but not by Rollup?
>
> Thanks, Anastasia
>
