yjshen edited a comment on issue #1708: URL: https://github.com/apache/arrow-datafusion/issues/1708#issuecomment-1027779249
After checking the code and docs of the existing systems, the three systems' row layouts are:

**PostgreSQL:** var-length tuple
- null-bits first (byte aligned)
- store **all** attributes sequentially
- add **extra padding if needed** before each attribute
  - E.g. table A (bool, char, int32): no padding between bool and char since they are both 1-byte aligned, but 2 bytes of padding after char and before int32, since int32 is 4-byte aligned.
- store var-length attributes in place (length first, then content), as long as the value is not too large; oversized values are moved out of line, which PostgreSQL calls "TOAST". A 1-byte length header is used for varlena values up to 126 bytes.
- **Value access:** the most difficult of the three. It is O(n), since it needs to walk all previous attributes of a tuple to calculate padding/length before the start offset of an attribute can be deduced.

Check [Data Alignment in PostgreSQL](https://www.enterprisedb.com/postgres-tutorials/data-alignment-postgresql), [Column Storage Internals](https://momjian.us/main/blogs/pgblog/2017.html#March_15_2017), [CodeSample in Page16](https://momjian.us/main/writings/pgsql/inside_shmem.pdf) for more details.

**DuckDB:** fixed-length tuple
- null-bits first (byte aligned)
- store fixed-size attributes sequentially. For var-length attributes, store an 8-byte pointer (on x64)
- ~~**no padding between** attributes~~ no padding between data columns, but padding for each aggregate value.
- var-length attribute pointer
  - points into a store called the "row heap".
  - In the row heap, the var-length attributes/strings of one tuple are stored contiguously.
- **Value access:** an extra `vector<idx_t> offsets` is employed to achieve O(1) access for simple attributes, and O(1 + 1) access for var-length attributes (one extra step to follow the pointer).

Check [Source Code](https://github.com/duckdb/duckdb/blob/master/src/common/types/row_layout.cpp#L32-L66) and [a related blog post/external sorting section](https://duckdb.org/2021/08/27/external-sorting.html) for more details.

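The difference in value access between the PostgreSQL and DuckDB layouts above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: offsets are measured from the start of the attribute data (the null bits are ignored), only fixed-size attributes are modeled, and the names `align_up`, `padded_offset`, `packed_offsets`, and `TABLE_A` are made up for this example rather than taken from either code base.

```python
# Illustrative sketch only: sizes and alignments are simplified assumptions,
# not values read from the PostgreSQL or DuckDB source.

def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


# (name, size in bytes, alignment in bytes) for table A (bool, char, int32).
# The column names are hypothetical.
TABLE_A = [("flag", 1, 1), ("ch", 1, 1), ("num", 4, 4)]


def padded_offset(schema, index):
    """PostgreSQL-style lookup: walk every earlier attribute, adding
    padding and size, to find where attribute `index` starts (O(n))."""
    offset = 0
    for _, size, alignment in schema[:index]:
        offset = align_up(offset, alignment) + size
    return align_up(offset, schema[index][2])


def packed_offsets(schema):
    """DuckDB-style lookup table: offsets computed once per layout with
    no padding between columns, so each later access is one list index."""
    offsets, offset = [], 0
    for _, size, _ in schema:
        offsets.append(offset)
        offset += size
    return offsets
```

For table A, `padded_offset(TABLE_A, 2)` gives 4 (two padding bytes after the char), while `packed_offsets(TABLE_A)` gives `[0, 1, 2]`. With var-length attributes stored in place, the padded walk also has to read each earlier value's length, so the offsets cannot be precomputed once per schema in that layout.
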
**SparkSQL:** var-length tuple
- null-bits first (8-byte aligned)
- store each attribute sequentially, **8-byte aligned for each** attribute
- for a var-length attribute, pack (offset + length) into 8 bytes and store that in place; store the actual var-length data after all fixed fields (the var-length data itself is again 8-byte aligned).
- **Value access:** no extra structure needed. O(1) access for simple attributes, O(1 + 1) access for var-length attributes (read the packed offset and length, then the data).

Check [Source Code](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java#L46-L61) for more details.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org