paul-rogers commented on PR #13168:
URL: https://github.com/apache/druid/pull/13168#issuecomment-1275427976

   @599166320, you asked about the "special path" when the sort key is a prefix 
of the segment sort order. If this optimization does not exist today, it need 
not be done as part of this PR. It can be done as a follow-on refinement. I'd 
have to poke around in the scan query (and cursor, and storage adapter) code to 
see how this is handled in other query types. We don't have to get distracted 
here with that task.
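   
   For reference, the check itself is small. Here is a minimal sketch in Python (not Druid code; `is_sort_prefix` and the column-name lists are hypothetical, and it ignores sort direction, which a real check would also have to compare):

```python
from typing import Sequence


def is_sort_prefix(sort_key: Sequence[str], segment_order: Sequence[str]) -> bool:
    """Return True if the requested sort key is a leading prefix of the
    segment's own sort order, so the rows are already in the right order.

    Assumption: both arguments are plain lists of column names and every
    column sorts ascending. An empty sort key means no sort was requested.
    """
    n = len(sort_key)
    if n == 0 or n > len(segment_order):
        return False
    return list(segment_order[:n]) == list(sort_key)
```

   When the check passes for a segment, that segment could skip the sort and feed the merge step directly; otherwise it goes through the sorter. That is the segment-by-segment decision mentioned in the summary.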
   
   Thanks for summarizing the tasks. You noted:
   
   > Let me briefly summarize the problems you mentioned above. Based on this 
PR, I have two things to do next:
   >
   > Code implementation of special path(Segment by segment decision)
   >
   > Solve the problem of different data types in the same column of different 
segments
   >
   > Another thing: use Calcite (operator and planner based) to improve the 
query engine, which is what you plan to do later.
   
   This sounds right. The Calcite planner part is outside the scope of this PR 
(though whatever we do will use the sorter you create here). The special path 
can also be left out of this PR if we don't already have a way to check this 
case, as noted above.
   
   That leaves several broad tasks:
   
   * Decide the level at which to sort. (Per-batch, per-cursor, or per-segment.)
   * Finish up the sorter for this case. The sorter can assume all values for a 
column have the same type since we're in a single segment. Handle the batching 
issues discussed earlier.
   * Finish up the merge code that is aware of the sort keys. This code should 
already exist somewhere if any query type supports sorting by something other 
than `__time`. (The code may be specific to some other row format, however.)
   * Deal with the mixed-type issue we noted. If the above merge code exists, 
then it should already have a solution.
   * Test with a data set that spans multiple segments to ensure all the levels 
listed earlier work as expected.
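   
   To make the sort-then-merge shape concrete, here is a rough sketch in Python (again not Druid code; `make_key`, `sort_segment`, `merge_segments`, and the sample rows are all hypothetical). It assumes a per-segment sort and assumes the mixed-type issue is handled by coercing each sort column to one common type chosen up front:

```python
import heapq
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple

Row = Dict[str, object]


def make_key(
    sort_cols: Sequence[str],
    coerce: Dict[str, Callable[[object], object]],
) -> Callable[[Row], Tuple]:
    """Build a sort key that coerces each sort column to a common type.

    Columns without an entry in `coerce` are compared as-is.
    """
    def key(row: Row) -> Tuple:
        return tuple(coerce.get(c, lambda v: v)(row[c]) for c in sort_cols)
    return key


def sort_segment(rows: Iterable[Row], key: Callable[[Row], Tuple]) -> List[Row]:
    """Sort the rows of a single segment."""
    return sorted(rows, key=key)


def merge_segments(
    sorted_segments: Iterable[List[Row]],
    key: Callable[[Row], Tuple],
) -> Iterator[Row]:
    """K-way merge of per-segment results that are already sorted by `key`."""
    return heapq.merge(*sorted_segments, key=key)


# Two segments where the same column has different types (int vs. string).
segment_a = [{"code": 10, "page": "x"}, {"code": 2, "page": "y"}]
segment_b = [{"code": "7", "page": "z"}, {"code": "1", "page": "w"}]

# The same key is used for the per-segment sort and for the merge. If a
# segment were sorted numerically but merged by string order (or the
# reverse), the merge inputs would not be ordered under the merge
# comparator and the output could be wrong.
key = make_key(["code"], {"code": int})
merged = list(
    merge_segments([sort_segment(s, key) for s in (segment_a, segment_b)], key)
)
```

   The multi-segment test in the last bullet is essentially this scenario: several segments, at least one column whose type differs between them, and a check that the merged output is globally ordered.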
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

