paul-rogers commented on issue #2421:
URL: https://github.com/apache/drill/issues/2421#issuecomment-1005326255


   @luocooong, your picture about page layout and cache performance is helpful: 
it shows how some DBs lay out pages. (I did something similar way back when I 
wrote a DB.) However, for an in-memory format, we'd do the layout differently. 
There would be no header (that's a separate object). We'd ensure that all data 
within each row is self-contained: no pointers from the footer back into rows.
   
   Impala uses a format which, I believe, it borrowed from earlier DBs. Here is 
a variation, listing the various fields, in order:
   
   * Row size.
   * Fixed-width columns (INT, LONG, etc.). These have a fixed offset from the 
row start.
   * Offset and length of each variable-width field. These also have a fixed 
offset.
   * Variable portion holding the variable-width data (VARCHAR, etc.).
   
   Note that the order in which the variable-width fields are written does not 
matter: each offset/length pair locates its own data. Just as Drill has a 
`BatchSchema` to say what vectors make up a batch, each block of rows would 
have a "row schema" to map from the logical schema (i.e. names and types) to 
offsets. Each fixed-width fetch is thus an addition (row base + offset) plus a 
read or write. This costs about the same as for vectors (vector address + row * 
size, then read). The row data can, of course, reside in direct memory, as it 
does for vectors.
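   
   To make the layout above concrete, here is a minimal Java sketch. It 
assumes a hypothetical four-column schema (id INT, amount BIGINT, name VARCHAR, 
note VARCHAR). The class `RowLayoutSketch`, its constants, and its methods are 
invented for illustration and are not Drill or Impala APIs. The constants play 
the role of the "row schema", and variable-width offsets are stored relative to 
the row start so that each row stays self-contained.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative only: one possible encoding of the row format described above. */
final class RowLayoutSketch {
    // Hypothetical "row schema": byte offsets from the start of a row.
    static final int ROW_SIZE = 0;    // int: total row length in bytes
    static final int ID = 4;          // int, fixed-width column
    static final int AMOUNT = 8;      // long, fixed-width column
    static final int NAME_SLOT = 16;  // int offset + int length of "name"
    static final int NOTE_SLOT = 24;  // int offset + int length of "note"
    static final int VAR_START = 32;  // variable-width data starts here

    /** Writes one row at {@code base}; returns the base of the next row. */
    static int writeRow(ByteBuffer block, int base, int id, long amount,
                        String name, String note) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        byte[] noteBytes = note.getBytes(StandardCharsets.UTF_8);
        int rowSize = VAR_START + nameBytes.length + noteBytes.length;

        block.putInt(base + ROW_SIZE, rowSize);
        block.putInt(base + ID, id);          // base + offset, then write
        block.putLong(base + AMOUNT, amount);

        // "note" is deliberately written before "name": the order of the
        // variable portion does not matter because each slot records where
        // its data lives.
        int pos = VAR_START;
        pos = putVar(block, base, NOTE_SLOT, pos, noteBytes);
        putVar(block, base, NAME_SLOT, pos, nameBytes);
        return base + rowSize;
    }

    private static int putVar(ByteBuffer block, int base, int slot,
                              int pos, byte[] data) {
        block.putInt(base + slot, pos);             // offset relative to row start
        block.putInt(base + slot + 4, data.length); // length in bytes
        for (int i = 0; i < data.length; i++) {
            block.put(base + pos + i, data[i]);
        }
        return pos + data.length;
    }

    /** Reads a variable-width field through its offset/length pair. */
    static String readVar(ByteBuffer block, int base, int slot) {
        int offset = block.getInt(base + slot);
        int length = block.getInt(base + slot + 4);
        byte[] data = new byte[length];
        for (int i = 0; i < length; i++) {
            data[i] = block.get(base + offset + i);
        }
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer block = ByteBuffer.allocateDirect(256); // direct memory, as for vectors
        int second = writeRow(block, 0, 7, 42L, "drill", "row format");
        writeRow(block, second, 8, -1L, "", "x");

        // Fixed-width fetch: one addition plus one read.
        System.out.println(block.getInt(second + ID));
        System.out.println(block.getLong(AMOUNT));
        System.out.println(readVar(block, 0, NAME_SLOT));
    }
}
```

   A real implementation would likely also need null markers and alignment 
padding, which this sketch omits.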
   
   The result is that the entire row fits into the cache as a unit with no 
overhead cruft. In practice, each operator has an incoming and outgoing row 
(filter, project, probe phase of a join, hash sender, ...). Still, the rows 
should be small enough that both incoming and outgoing rows fit in the cache.
   
   Of course, the JVM will pull in byte code blocks, local variables, etc. So, 
the code has to also be designed carefully to avoid cache thrashing. That's 
what Drill's aggressive inlining and byte code fixups are supposed to do. Then, 
the number of threads has to be managed so we don't get swapped out just after 
we get our cache nicely set up.
   
   This stuff is HARD. Testing is essential. A somewhat-dated, but still 
helpful, source is [Martin Thompson's Mechanical Sympathy 
Blog](https://mechanical-sympathy.blogspot.com/).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


