mikemccand commented on PR #12711: URL: https://github.com/apache/lucene/pull/12711#issuecomment-1803774655
> Really, if we'd be implementing the feature today would we use a bitset or maybe a sparse DV field recording the number of children for each block in the index?

In fact, in order to make use of your doc blocks at search time (`ToParent/ChildBlockJoinQuery`), users must already provide a bitset marking which docs are parents (I think this is typically backed by postings, e.g. a simple `parent_doc` postings list? But doc values ought to work too), so it's not much to ask the user to also make this field explicit during indexing so Lucene can more correctly track/sort the blocks?

I like this approach. But is the idea just to prevent incorrect usage (mixing non-congruent sort with doc blocks)? Or, is it to fix whatever sort the user provides so that doc blocks will work correctly (i.e. whole doc blocks get sorted atomically, order preserved, according to sort criteria only on the parent docs)? The latter would be great: users need only concern themselves with how parent docs sort. And I think the nested case would "just work" since the entire block (and sub-blocks) is preserved in the order it was originally indexed.

> > Another question: do we have any testing around this sort-stability / block-preservation today? I'm getting nervous now that we are relying on an undocumented feature that just happens to work. EG I checked TestIndexSorting and it doesn't seem to call add/updateDocuments at all.
>
> I don't think we have any tests for this. Otherwise the build would have failed on this PR

+1 to first improve testing of this. Scary the combination of static sort and doc blocks is untested. Maybe we already broke this in 9.x / main!
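To make the second option above concrete (whole blocks sorted atomically, using sort criteria from the parent docs only), here is a toy sketch in plain Java. It is not Lucene code and not what the PR implements: `Doc`, `isParent`, `sortKey` and `sortBlocks` are made-up names for illustration. It assumes each block is indexed with its children first and its parent last, and that a per-doc parent marker is available.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model only: none of these types are Lucene classes. */
final class BlockSortSketch {

    /** Hypothetical stand-in for an indexed document. */
    static final class Doc {
        final String id;
        final boolean isParent; // plays the role of the parent bitset / marker field
        final long sortKey;     // only consulted on parent docs

        Doc(String id, boolean isParent, long sortKey) {
            this.id = id;
            this.isParent = isParent;
            this.sortKey = sortKey;
        }
    }

    /**
     * Reorders docs so that whole blocks move together, ordered by the parent's key.
     * Assumes every block is laid out as children first, parent last.
     */
    static List<Doc> sortBlocks(List<Doc> docsInIndexOrder) {
        List<List<Doc>> blocks = new ArrayList<>();
        List<Doc> current = new ArrayList<>();
        for (Doc doc : docsInIndexOrder) {
            current.add(doc);
            if (doc.isParent) { // a (top-level) parent closes its block
                blocks.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            throw new IllegalArgumentException("trailing docs without a parent");
        }

        // List.sort is stable, so blocks with equal parent keys keep their indexed order.
        blocks.sort(Comparator.comparingLong((List<Doc> b) -> b.get(b.size() - 1).sortKey));

        List<Doc> sorted = new ArrayList<>();
        for (List<Doc> block : blocks) {
            sorted.addAll(block); // docs inside a block are never reordered
        }
        return sorted;
    }

    /** Convenience helper for inspecting the resulting order. */
    static List<String> ids(List<Doc> docs) {
        List<String> out = new ArrayList<>();
        for (Doc d : docs) {
            out.add(d.id);
        }
        return out;
    }
}
```

In this model only top-level parents carry the marker, so any nested sub-blocks travel with their enclosing block untouched. The child docs' own `sortKey` values are never read, which is the "users need only concern themselves with how parent docs sort" property.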
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org