Re: [PR] [refactor](be) push CHAR padding strip down to page decoder [doris]

via GitHub Fri, 22 May 2026 01:52:23 -0700


yiguolei commented on code in PR #63291:
URL: https://github.com/apache/doris/pull/63291#discussion_r3287155232



##########
be/src/storage/segment/binary_plain_page_v2_pre_decoder.h:
##########
@@ -37,127 +97,95 @@ namespace segment_v2 {
  * V1 format (output):
  *   Data: |binary1|binary2|...
  *   Trailer: |offset1(32-bit)|offset2(32-bit)|...| num_elems (32-bit)
+ *
+ * The decode pipeline is 7 steps:
+ *   1. parse header (validate sizes + extract num_elems + iteration bounds)
+ *   2. scan entries: record (data_start, out_len) per entry, sum total out_len
+ *   3. allocate the V1 output page (size_of_prefix + binary + offsets + trailer + tail)
+ *   4. write binary payload
+ *   5. write offsets array (running cursor over out_len)
+ *   6. write num_elems trailer
+ *   7. copy tail (footer + null map) and publish output params
+ *
+ * Steps 1, 3, 5, 6, 7 are identical for every variant and live in static
+ * helpers (parse_header, write_v1_output). Step 2 (the entry scan loop)
+ * differs across variants only in how it derives `out_len` from the raw
+ * V2 length — each subclass writes its own decode() so the scan loop has
+ * no virtual dispatch.
  */
 struct BinaryPlainPageV2PreDecoder : public DataPagePreDecoder {

Review Comment:
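   It feels like splitting this class has made the code harder to follow, and correctness is harder to guarantee. It would be better to add a template parameter to this class that indicates whether the column is a CHAR type, i.e. whether padding needs to be stripped. That would be easier to read and easier to keep correct.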
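
   A rough sketch of what the template-parameter suggestion above could look like, not the actual Doris code. All names here (`PreDecoderSketch`, `V1PageSketch`, `StripCharPadding`, `out_len`) are hypothetical, the input is simplified to a list of already-split entries rather than the real V2 page layout, and it assumes CHAR values are right-padded with `'\0'` bytes. The point is only that the variant-specific part (deriving `out_len`) can sit behind `if constexpr`, so there is a single scan loop and still no virtual dispatch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified stand-in for the V1 output page: concatenated
// payload bytes plus one start offset per entry.
struct V1PageSketch {
    std::string data;
    std::vector<uint32_t> offsets;
};

// One class template instead of one subclass per variant. StripCharPadding
// is a compile-time flag, so the scan loop has no virtual dispatch.
template <bool StripCharPadding>
struct PreDecoderSketch {
    // Derive the output length from the raw entry.
    static size_t out_len(std::string_view raw) {
        if constexpr (StripCharPadding) {
            // Assumption: CHAR values are right-padded with '\0' bytes.
            size_t n = raw.size();
            while (n > 0 && raw[n - 1] == '\0') {
                --n;
            }
            return n;
        } else {
            return raw.size();
        }
    }

    // Shared scan + write loop: record each entry's start offset, then
    // append only the first out_len bytes of the entry.
    static V1PageSketch decode(const std::vector<std::string_view>& entries) {
        V1PageSketch page;
        for (std::string_view raw : entries) {
            page.offsets.push_back(static_cast<uint32_t>(page.data.size()));
            page.data.append(raw.data(), out_len(raw));
        }
        return page;
    }
};

using PlainDecoderSketch = PreDecoderSketch<false>;
using CharDecoderSketch = PreDecoderSketch<true>;
```

   With this shape the two variants differ only in the branch inside `out_len`, and the caller picks `CharDecoderSketch` or `PlainDecoderSketch` once per page based on the column type.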



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

