paul-rogers commented on PR #13168:
URL: https://github.com/apache/druid/pull/13168#issuecomment-1272151595

   @599166320, nice improvements! We can perhaps simplify the code even further.
   
   Scan query is a bit confusing. The `ScanQueryEngine` produces batches of 
rows (the `ScanResultValue`), but to sort want individual rows. To mimic the 
current code, we have to put those rows back into batches for the rest of the 
query stack. Let's take these one-by-one.
   
   First, let's remind ourselves that, if we sort (even with a limit) we need 
all the rows. So, we do want to call the thing that runs the engine. The 
question is, what is the format of the data and what do we do with it?
   
   For better or worse, `ScanQueryEngine` takes the rows from the cursor and 
wraps them in `ScanResultValue`. Worse, for our purposes, `ScanQueryEngine` 
maps the rows to the two different scan query result formats. What we really 
want is single rows in a single, known format. Second best is a batch of rows 
in a single, known format. That format should be the "compact list" since we'll 
be storing them in memory.
   
   So, our first task is to convince the `ScanQueryEngine` to give us what we 
want. We could modify the engine to return rows. But, that code is a bit 
complex, so we might want to use what we have, at least for now. 
   
   You are already rewriting the query to strip off the sort and limit when we 
do your custom sort. We can also force the format to 
`RESULT_FORMAT_COMPACTED_LIST`. This will give us, returned from 
`ScanQueryEngine.process` a sequence of `ScanQueryValues`. 
   
   But, we want rows. So, the next step is to unpack the `ScanQueryValue` 
objects into individual rows, and store those in an array, ready for sorting. 
Your code does this (though see comments.)
   
   Then, we an apply the sort you've developed. Here, we can know that we have 
rows in compact list form, so each sort key can be translated to a position so 
we're comparing `row1[posn], row2[posn]`. That's about as fast as it can get.
   
   Now we've got a sorted array. But, we want it potentially limited. So, our 
effective row count is `min(rowCount, limit)`.
   
   We now have a (perhaps limited) sorted list. The final step is to recreate 
the `ScanResultValue` objects, potentially converting to the inefficient 
`RESULT_FORMAT_LIST` if configured. The size of each of the `ScanResultValue` 
objects is, I believe, controlled by the batch value in the query.
   
   Perhaps all of this can be handled by a single class based on 
`SimpleSequence()`.
   
   * `make()` reads all the rows, creates the array, sorts it, and returns our 
"batching iterator".
   * The "batching iterator" `next()` provides the next `batchSize` rows 
converted to a `ScanResultValue`, stopping when we hit the effective row count 
computed above.
   
   With that, `ScanQueryRunnerFactory.ScanQueryRunner`:
   
   * Checks if a custom sort is needed using the function you've provided.
   * If not, just returns the sequence from `ScanQueryEngine.process` as today.
   * Else
     * Modifies the query as we discussed.
     * Calls `ScanQueryEngine.process`
     * Hands the returned sequence to the sorter sequence described above
     * Returns the sorter sequence
   
   All this said, there is a bit of code that is pretty close already: 
`ScanQueryRunnerFactory.stableLimitingSort(). That methods returns a sequence 
which unpacks events, and sorts them. It is not quite what we want because it 
uses a priority queue (awkward for large numbers of items, great for merging) 
and returns rows one-by-one (we want a batch.) Still, it provides some good 
examples. (You've probably already looked at it since your implementation is 
similar.)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to