mikemccand commented on PR #12711:
URL: https://github.com/apache/lucene/pull/12711#issuecomment-1803774655

   > Really, if we'd be implementing the feature today would we use a bitset or 
maybe a sparse DV field recording the number of children for each block in the 
index?
   
   In fact, in order to make use of your doc blocks at search time 
(`ToParent/ChildBlockJoinQuery`), users must already provide a bitset marking 
which docs are parents (I think this is typically backed by postings, e.g. a 
simple `parent_doc` postings list?  But doc values ought to work too), so it's 
not much to ask the user to also make this field explicit during indexing so 
Lucene can more correctly track/sort the blocks?  I like this approach.
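   To make the "explicit parent field" idea concrete, here is a rough sketch of
how block boundaries fall out of a parent bitset. It assumes the block-join
convention that children are indexed right before their parent, so the parent
is the last doc of each block. `java.util.BitSet` stands in for the Lucene
parent bitset, and `ParentBlocks` / `blockStart` are hypothetical names, not a
Lucene API:

```java
import java.util.BitSet;

// Hypothetical helper, not a Lucene API. Assumes the block-join convention that
// children are indexed immediately before their parent, so each parent is the
// last doc of its block.
final class ParentBlocks {
    private ParentBlocks() {}

    /** Returns the first docID of the block whose parent is {@code parentDoc}. */
    static int blockStart(BitSet parents, int parentDoc) {
        if (!parents.get(parentDoc)) {
            throw new IllegalArgumentException("doc " + parentDoc + " is not a parent");
        }
        // The previous parent (or -1 when there is none) closes the previous block;
        // BitSet.previousSetBit(-1) returns -1, so parentDoc == 0 is handled too.
        return parents.previousSetBit(parentDoc - 1) + 1;
    }
}
```

With parents marked at docs 2, 3 and 6, the blocks are [0..2], [3..3] and
[4..6]: a parent-only block is allowed and the bitset alone is enough to find
every boundary.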
   
   But is the idea just to prevent incorrect usage (mixing non-congruent sort 
with doc blocks)?  Or, is it to fix whatever sort the user provides so that doc 
blocks will work correctly (i.e. whole doc blocks get sorted atomically, order 
preserved, according to sort criteria only on the parent docs)?  The latter 
would be great: users need only concern themselves with how parent docs sort.  
And I think the nested case would "just work" since the entire block (and 
sub-blocks) is preserved in the order it was originally indexed.
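   A minimal sketch of that "sort whole blocks atomically" behavior, assuming
a single numeric sort key read only from each top-level parent doc. This is
not how Lucene's index sorting is implemented today; `BlockSorter`,
`sortBlocks` and the `long[]` key array are hypothetical stand-ins:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Lucene's index sorter: orders whole doc blocks by a
// key taken only from each block's (top-level) parent doc, copying every block
// verbatim so nested sub-blocks keep their original internal order.
final class BlockSorter {
    private BlockSorter() {}

    /**
     * @param parents top-level parent docs; doc maxDoc - 1 must be a parent
     * @param parentKeys sort key per docID (only parent entries are read)
     * @return newToOld mapping, i.e. newToOld[newDoc] == oldDoc
     */
    static int[] sortBlocks(BitSet parents, long[] parentKeys, int maxDoc) {
        if (maxDoc > 0 && !parents.get(maxDoc - 1)) {
            throw new IllegalArgumentException("last doc must be a parent");
        }
        // Collect the [start, parent] range of every block in original order.
        List<int[]> blocks = new ArrayList<>();
        int start = 0;
        for (int p = parents.nextSetBit(0); p >= 0 && p < maxDoc; p = parents.nextSetBit(p + 1)) {
            blocks.add(new int[] {start, p});
            start = p + 1;
        }
        // List.sort is stable, so blocks with equal parent keys keep indexing order.
        blocks.sort(Comparator.comparingLong(b -> parentKeys[b[1]]));
        int[] newToOld = new int[maxDoc];
        int next = 0;
        for (int[] b : blocks) {
            for (int doc = b[0]; doc <= b[1]; doc++) {
                newToOld[next++] = doc; // children first, parent last, as indexed
            }
        }
        return newToOld;
    }
}
```

Only the parent's key influences the order, so users would only need to think
about how parents sort; child (and grandchild) docs just travel with their
block.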
   
   > > Another question: do we have any testing around this sort-stability / 
block-preservation today? I'm getting nervous now that we are relying on an 
undocumented feature that just happens to work. EG I checked TestIndexSorting 
and it doesn't seem to call add/updateDocuments at all.
   > 
   > I don't think we have any tests for this; otherwise the build would have
failed on this PR.
   
   +1 to first improve testing of this.  It's scary that the combination of
static sort and doc blocks is untested.  Maybe we already broke this in 9.x / main!
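   One possible shape for such a test: index random doc blocks with an index
sort, record the old-to-new docID permutation, and assert that every block
stayed contiguous with its internal order intact and the parent last. Below is
a sketch of just that oracle, assuming the permutation is available as a
`newToOld` array and is already a valid permutation. `BlockPreservationCheck`
is a hypothetical name, not an existing Lucene test utility:

```java
import java.util.BitSet;

// Hypothetical test oracle, not an existing Lucene test utility. Assumes
// newToOld is a valid permutation and oldParents marks parents in the
// original (pre-sort) docID space, with each parent last in its block.
final class BlockPreservationCheck {
    private BlockPreservationCheck() {}

    static boolean preservesBlocks(BitSet oldParents, int[] newToOld) {
        int newDoc = 0;
        while (newDoc < newToOld.length) {
            // A block in the new order must begin with its original first doc.
            int first = newToOld[newDoc];
            if (first != oldParents.previousSetBit(first - 1) + 1) {
                return false;
            }
            // The rest of the block must be consecutive old docIDs, ending at the parent.
            int old = first;
            while (true) {
                if (newToOld[newDoc] != old) {
                    return false; // block was split or reordered
                }
                newDoc++;
                if (oldParents.get(old)) {
                    break; // reached the parent: block is complete
                }
                old++;
                if (newDoc >= newToOld.length) {
                    return false; // ran out of docs before reaching the parent
                }
            }
        }
        return true;
    }
}
```

A per-document sort that moves a child away from its parent (e.g. swapping
old docs 2 and 3 when doc 2 is a parent) fails this check, while any reordering
of whole blocks passes.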


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

