paul-rogers commented on issue #2421: URL: https://github.com/apache/drill/issues/2421#issuecomment-1005326255
@luocooong, your picture about page layout and cache performance is helpful: it shows how some DBs lay out pages. (I did something similar way back when I wrote a DB.) However, for an in-memory format, we'd do the layout differently. There would be no header (that's a separate object). We'd ensure that all data within each row is self-contained: no pointers from the footer back into rows.

Impala uses a format which, I believe, it borrowed from earlier DBs. Here is a variation, listing the fields in order:

* Row size.
* Fixed-width columns (INT, LONG, etc.). These have a fixed offset from the row start.
* Offset and length of each variable-width field. These also have a fixed offset.
* Variable portion with the variable-width data (VARCHAR, etc.).

Note that the order in which the variable-width fields are written doesn't matter: the offset/length pairs do the right thing.

Just as Drill has a `BatchSchema` to say which vectors make up a batch, each block of rows would have a "row schema" to map from the logical schema (i.e. names and types) to offsets. Each fixed-width fetch is thus an addition (base + offset) plus a read/write. This is about the same as for vectors (vector address + row * size, then read). The row data can, of course, reside in direct memory as for vectors. The result is that the entire row fits into the cache as a unit with no overhead cruft.

In practice, each operator has an incoming and an outgoing row (filter, project, probe phase of a join, hash sender, ...). Still, the rows should be small enough that both incoming and outgoing rows fit in the cache. Of course, the JVM will also pull in byte code blocks, local variables, etc. So the code also has to be designed carefully to avoid cache thrashing. That's what Drill's aggressive inlining and byte code fixups are supposed to do. Then, the number of threads has to be managed so we don't get swapped out just after we get our cache nicely set up. This stuff is HARD. Testing is essential.
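To make the row layout described above concrete, here is a minimal sketch in Java. It is not Drill or Impala code. `RowSchema`, `RowWriter` and `RowReader` are hypothetical names, and the sketch assumes 4-byte `int` offsets and lengths, UTF-8 VARCHARs, and variable-data offsets measured from the row start. Rows live in a `java.nio.ByteBuffer`, so they can sit in direct memory (`ByteBuffer.allocateDirect`). Column names are resolved to offsets once through the row schema, so each fetch on the hot path is just `base + offset` followed by a read or write.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical "row schema": maps column names to fixed byte offsets within a row. */
final class RowSchema {
  static final int HEADER_BYTES = 4;                 // row size (int) at offset 0
  private final Map<String, Integer> fixedOffsets = new HashMap<>();
  private final Map<String, Integer> varSlotOffsets = new HashMap<>();
  final int varPortionStart;                         // first byte of the variable portion

  /** fixedWidths: column name -> width in bytes (4 for INT, 8 for LONG), in layout order. */
  RowSchema(LinkedHashMap<String, Integer> fixedWidths, String... varColumns) {
    int pos = HEADER_BYTES;
    for (Map.Entry<String, Integer> e : fixedWidths.entrySet()) {
      fixedOffsets.put(e.getKey(), pos);
      pos += e.getValue();
    }
    for (String name : varColumns) {
      varSlotOffsets.put(name, pos);
      pos += 8;                                      // 4-byte offset + 4-byte length
    }
    varPortionStart = pos;
  }

  int fixedOffset(String col) { return fixedOffsets.get(col); }
  int varSlotOffset(String col) { return varSlotOffsets.get(col); }
}

/** Writes one row starting at buffer position `base` (hypothetical helper). */
final class RowWriter {
  private final ByteBuffer buf;
  private final int base;
  private int varEnd;                                // next free byte, relative to base

  RowWriter(RowSchema schema, ByteBuffer buf, int base) {
    this.buf = buf;
    this.base = base;
    this.varEnd = schema.varPortionStart;
  }

  // Fixed-width write: one addition (base + offset), then the write.
  void setInt(int offset, int v) { buf.putInt(base + offset, v); }
  void setLong(int offset, long v) { buf.putLong(base + offset, v); }

  // Appends the bytes to the variable portion and records offset/length in the slot.
  void setVarChar(int slotOffset, String v) {
    byte[] bytes = v.getBytes(StandardCharsets.UTF_8);
    buf.putInt(base + slotOffset, varEnd);           // offset relative to the row start
    buf.putInt(base + slotOffset + 4, bytes.length);
    for (int i = 0; i < bytes.length; i++) buf.put(base + varEnd + i, bytes[i]);
    varEnd += bytes.length;
  }

  /** Records the total row size in the header and returns it. */
  int finish() {
    buf.putInt(base, varEnd);
    return varEnd;
  }
}

/** Reads fields from a row starting at buffer position `base` (hypothetical helper). */
final class RowReader {
  static int rowSize(ByteBuffer buf, int base) { return buf.getInt(base); }
  static int getInt(ByteBuffer buf, int base, int offset) { return buf.getInt(base + offset); }
  static long getLong(ByteBuffer buf, int base, int offset) { return buf.getLong(base + offset); }

  // Follows the offset/length pair, so write order of variable fields doesn't matter.
  static String getVarChar(ByteBuffer buf, int base, int slotOffset) {
    int off = buf.getInt(base + slotOffset);
    int len = buf.getInt(base + slotOffset + 4);
    byte[] bytes = new byte[len];
    for (int i = 0; i < len; i++) bytes[i] = buf.get(base + off + i);
    return new String(bytes, StandardCharsets.UTF_8);
  }
}
```

For example, with fixed columns `id` (INT) and `ts` (LONG) and variable columns `name` and `city`, the row size occupies bytes 0 to 3, `id` sits at offset 4, `ts` at 8, the two offset/length pairs at 16 and 24, and variable data starts at 32. Writing `city` before `name` still reads back correctly, because each reader goes through its own offset/length pair. A production version would also avoid the per-byte copy loops, which are kept here only for clarity.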
A somewhat-dated, but still helpful, source is [Martin Thompson's Mechanical Sympathy Blog](https://mechanical-sympathy.blogspot.com/).