gianm commented on PR #12745:
URL: https://github.com/apache/druid/pull/12745#issuecomment-1175779180

   @FrankChen021 good question! The advantage of the row-based format is that 
you can sort and merge it really fast. When there's a sort key, that's at the 
beginning of each row, and is designed in such a way that it can be compared as 
bytes. So data can be sorted using a single memory comparison, no matter the 
key length, without any decoding or deserialization. With columnar data, 
sorting requires at least one separate memory access per key part, and 
generally also requires decoding prior to comparison.
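
   As a rough illustration of the byte-comparable key idea (a sketch, not code 
from this patch): the class, the method names, and the two-part `(long, String)` 
key below are all hypothetical, and the encoding is a simplification that only 
works because the variable-length string comes last in the key. The point is 
that once the key is encoded, ordering is a single unsigned byte comparison.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Arrays;

// Hypothetical sketch: NOT the frame format used in the patch.
final class ByteComparableKeySketch {
  // Encode a (long, String) key so that unsigned byte order matches the
  // logical order. The long is written big-endian with its sign bit flipped,
  // so negative values sort before positive ones. The string is written as
  // UTF-8; it is the last key part, so no terminator is needed here.
  static byte[] encodeKey(long first, String second) {
    byte[] utf8 = second.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + utf8.length); // big-endian by default
    buf.putLong(first ^ Long.MIN_VALUE);
    buf.put(utf8);
    return buf.array();
  }

  // One lexicographic unsigned comparison over the whole key, with no
  // per-field decoding. Arrays.compareUnsigned requires Java 9 or later.
  static int compareKeys(byte[] a, byte[] b) {
    return Arrays.compareUnsigned(a, b);
  }

  public static void main(String[] args) {
    List<byte[]> keys = new ArrayList<>();
    keys.add(encodeKey(3L, "b"));
    keys.add(encodeKey(-5L, "z"));
    keys.add(encodeKey(3L, "ab"));
    keys.sort(ByteComparableKeySketch::compareKeys);
    for (byte[] key : keys) {
      System.out.println(Arrays.toString(key));
    }
  }
}
```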
   
   Sorted streams of frames can also be merged really fast, using a min-heap of 
input frames with a memcmp-based comparator. I tried implementing this 
sort-and-merge stuff with both columnar and row-based frames, and found 
row-based was 2–3x faster. The code for this isn't in this patch set, but it 
would be part of the next one.
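
   To make the merge step concrete, here is a minimal sketch of a k-way merge 
over sorted inputs using a min-heap ordered by unsigned byte comparison. It is 
not the code from the upcoming patch: the class and method names are 
hypothetical, each "row" is reduced to just its key bytes, and each input is a 
plain in-memory list rather than a frame.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: NOT the merger implementation from Druid.
final class SortedRunMergeSketch {
  // Tracks the current row of one sorted input.
  private static final class Cursor {
    final Iterator<byte[]> rows;
    byte[] current;

    Cursor(Iterator<byte[]> rows) {
      this.rows = rows;
      this.current = rows.next();
    }

    // Moves to the next row; returns false when the input is exhausted.
    boolean advance() {
      if (!rows.hasNext()) {
        return false;
      }
      current = rows.next();
      return true;
    }
  }

  // Merges inputs that are each already sorted by unsigned byte order.
  static List<byte[]> merge(List<List<byte[]>> sortedRuns) {
    // memcmp-style comparator: compares raw key bytes, no decoding.
    Comparator<Cursor> byKeyBytes =
        (a, b) -> Arrays.compareUnsigned(a.current, b.current);
    PriorityQueue<Cursor> heap = new PriorityQueue<>(byKeyBytes);
    for (List<byte[]> run : sortedRuns) {
      if (!run.isEmpty()) {
        heap.add(new Cursor(run.iterator()));
      }
    }

    List<byte[]> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      Cursor smallest = heap.poll();
      merged.add(smallest.current);
      if (smallest.advance()) {
        heap.add(smallest); // re-insert with its new current row
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    List<byte[]> runA = Arrays.asList(new byte[] {1}, new byte[] {3});
    List<byte[]> runB = Arrays.asList(new byte[] {2}, new byte[] {(byte) 0x80});
    for (byte[] row : merge(Arrays.asList(runA, runB))) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```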
   
   So, the idea is we can use row-based frames when they have an advantage, and 
columnar frames when they don't. (I expect columnar frames would be faster for 
most ops that aren't comparison-related.)
   
   Some relevant code in this patch:
   
   - [RowBasedFrameWriter](https://github.com/apache/druid/blob/9408d98af81b90d57ff961162450d60a0d8b349e/processing/src/main/java/org/apache/druid/frame/write/RowBasedFrameWriter.java)
 implements creation of row-based frames.
   - [StringFieldWriter](https://github.com/apache/druid/blob/9408d98af81b90d57ff961162450d60a0d8b349e/processing/src/main/java/org/apache/druid/frame/field/StringFieldWriter.java)
 is an example of the field writers used to create row-based frames. Note that 
it is able to avoid serialization and deserialization: if the `DimensionSelector` 
it's reading from supports direct UTF-8 access, then it uses `lookupNameUtf8` 
(by way of `FrameWriterUtils.getUtf8ByteBufferFromStringSelector`) to copy the 
input data without a serde round trip.
   - [FrameComparisonWidgetImpl](https://github.com/apache/druid/blob/9408d98af81b90d57ff961162450d60a0d8b349e/processing/src/main/java/org/apache/druid/frame/key/FrameComparisonWidgetImpl.java)
 implements memcmp-based comparison of row-based frames.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

