bruns added a comment.
In D11828#242402 <https://phabricator.kde.org/D11828#242402>, @michaelh wrote: > I'm not very familiar with the concept of iterators (yet). To me it looks like `auto i = new OrPostingIterator(iters); i->docId();` will return 0 and `i->next()` returns a valid docId. After that `i->docId();` is also valid. > Is this how iterators work? Naively I would expect `i->docId();` to be valid after construction or both `i->docId();` and `i->next()` to be invalid. > > I have to admit, it's very difficult to understand (and I don't, really) what baloo is doing here with all this iterating over vectors of inherited iterator classes, pointer-to-pointer stuff of which some might not even exist. My brain hurts, terribly! All I can presently do is ask some simple questions. Sorry, I'm of no help here. Please take my comments as a device for better understanding and I hope it's not to much of a bother. Each PostingIterator represents a "set" of document ids. PostingIterators are not related to STL or Qt container iterators (although the "subiterators" are implemented using QVectors<>). The `PI::next()` call returns the smallest element and removes it from the set, `PI::docId()` just returns the smallest element. The result set of the OrPostingIterator is the union of its "subsets" (represented by its "subiterators", i.e. the QVector<PostingIterator*> constructor argument). Or works a little bit like an insertion sort. To return the smallest element, it creates the union of the smallest element of all subsets, and yields the smallest element of the union. It then calls `PI::next()` on all subsets which returned the smallest element, removing it from all subsets and thus from the union. In C++, you can use iterators in two different ways, either PreIncrement or PostIncrement. `PostingIterator::next()` will behave like `*(++it)`, i.e. it is incremented first, then the current value is returned. I don't know the original justification for this design. One possible idea is to delay the actual database query until `next()` is called the first time. For e.g. a query "a AND b AND c", one can execute "a", if not empty do "b", intersect (i.e. calling the Iterators of Result("a") and Result("b") until both return the same value), and if the intersection is non-empty, do "c". Currently, it does not implement this optimization. D12025 <https://phabricator.kde.org/D12025> implements one case, i.e. stopping if a leaf query returns an empty set, and propagating it upwards. REPOSITORY R293 Baloo REVISION DETAIL https://phabricator.kde.org/D11828 To: bruns, #baloo, michaelh Cc: #frameworks, ashaposhnikov, michaelh, astippich, spoorun, ngraham, bruns, alexeymin