gianm commented on PR #12745: URL: https://github.com/apache/druid/pull/12745#issuecomment-1175779180
@FrankChen021 good question!

The advantage of the row-based format is that you can sort and merge it really fast. When there's a sort key, it's stored at the beginning of each row and is designed so that it can be compared as bytes. So data can be sorted using a single memory comparison, no matter the key length, without any decoding or deserialization. With columnar data, sorting requires at least one separate memory access per key part, and generally also requires decoding prior to comparison. Sorted streams of frames can also be merged really fast, using a min-heap of input frames with a memcmp-based comparator.

I tried implementing this sort-and-merge stuff with both columnar and row-based frames, and found row-based was 2–3x faster. The code for this isn't in this patch set, but it would be part of the next one.

So, the idea is we can use row-based frames when they have an advantage, and columnar frames when they don't. (I expect columnar frames would be faster for most ops that aren't comparison-related.)

Some relevant code in this patch:

- [RowBasedFrameWriter](https://github.com/apache/druid/blob/9408d98af81b90d57ff961162450d60a0d8b349e/processing/src/main/java/org/apache/druid/frame/write/RowBasedFrameWriter.java) implements creation of row-based frames.
- [StringFieldWriter](https://github.com/apache/druid/blob/9408d98af81b90d57ff961162450d60a0d8b349e/processing/src/main/java/org/apache/druid/frame/field/StringFieldWriter.java) is an example of the field writers used to create row-based frames. Note that it is able to avoid serialization and deserialization: if the DimensionSelector it's reading from supports direct UTF-8 access, then it uses `lookupNameUtf8` (by way of `FrameWriterUtils.getUtf8ByteBufferFromStringSelector`) to copy the input data without a serde round trip.
- [FrameComparisonWidgetImpl](https://github.com/apache/druid/blob/9408d98af81b90d57ff961162450d60a0d8b349e/processing/src/main/java/org/apache/druid/frame/key/FrameComparisonWidgetImpl.java) implements memcmp-based comparison of row-based frames.
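
To make the byte-comparable sort key and the min-heap merge a bit more concrete, here is a minimal sketch. It is not Druid's actual frame layout or API: the class `ByteComparableKeySketch`, its methods `encodeKey` and `merge`, and the specific encoding (a sign-flipped big-endian long followed by NUL-terminated UTF-8) are all assumptions made for illustration. The only point is that once keys are encoded this way, one unsigned byte comparison orders multi-part keys, and the same comparator can drive a heap-based merge of sorted runs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; this is not Druid's frame format or API.
final class ByteComparableKeySketch
{
  // Unsigned lexicographic byte comparison, the Java analogue of memcmp (Java 9+).
  static final Comparator<byte[]> MEMCMP = Arrays::compareUnsigned;

  // Encode a (long, String) key so that byte order matches logical order:
  // - the long is written big-endian with its sign bit flipped, so negative
  //   values sort before positive ones under unsigned comparison
  // - the string is written as UTF-8 followed by a 0x00 terminator
  //   (assumes the string contains no embedded NUL character)
  static byte[] encodeKey(long n, String s)
  {
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + utf8.length + 1);
    buf.putLong(n ^ Long.MIN_VALUE); // ByteBuffer is big-endian by default
    buf.put(utf8);
    buf.put((byte) 0);
    return buf.array();
  }

  // Cursor over one already-sorted run of encoded keys.
  static final class Cursor
  {
    final List<byte[]> run;
    int pos;

    Cursor(List<byte[]> run)
    {
      this.run = run;
    }

    byte[] current()
    {
      return run.get(pos);
    }
  }

  // K-way merge of sorted runs using a min-heap ordered by the byte comparator.
  static List<byte[]> merge(List<List<byte[]>> sortedRuns)
  {
    PriorityQueue<Cursor> heap =
        new PriorityQueue<>((a, b) -> MEMCMP.compare(a.current(), b.current()));
    for (List<byte[]> run : sortedRuns) {
      if (!run.isEmpty()) {
        heap.add(new Cursor(run));
      }
    }

    List<byte[]> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      Cursor c = heap.poll();
      out.add(c.current());
      if (++c.pos < c.run.size()) {
        heap.add(c); // re-insert the cursor at its next key
      }
    }
    return out;
  }
}
```

Neither the comparator nor the merge loop ever decodes a key; they only look at raw bytes, which is the property the row-based format is meant to exploit. A columnar layout would instead need one lookup per key part for each comparison.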
