Note that the in-memory representation of a row in the storage layer is bytes, but the in-memory representation of the row returned to the rest of the system is an array of Java objects.
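
To make the contrast concrete, here is a minimal sketch. Plain Java serialization stands in for Derby's actual on-page row format, and the class name is invented:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;

    public class RowShapes {
        public static void main(String[] args) throws IOException {
            // How the rest of the system sees a row: one Java object per column.
            Object[] row = { Integer.valueOf(42), "Smith", Double.valueOf(3.14) };

            // How the storage layer sees the same row: a run of bytes on a page.
            // (Plain serialization is a stand-in for Derby's own row format.)
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buf);
            out.writeObject(row);
            out.close();
            byte[] onPage = buf.toByteArray();

            System.out.println(row.length + " column objects <-> "
                    + onPage.length + " bytes on the page");
        }
    }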
There are two interesting cases: 1) queries which get their rows from pages in the cache, and 2) queries which don't get any cache hits.
For case 2, imagine a scan of a million rows with no cache hits. In the current architecture, for many of the datatypes, the scan can be done with a single reusable object allocation, versus 1 million * (number of columns) allocations if a fresh object were created for every value. I haven't run the test recently - what is the overhead of 10 million object allocations/deallocations (say, a million rows of ten columns) on a recent JVM?
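
For anyone who wants a ballpark answer, a crude timing sketch like the one below would do. This is not a proper benchmark (no warmup, and dead-code elimination is only partly guarded against via the sink variable), and AllocCost is an invented name, not Derby code:

    public class AllocCost {
        static final int ROWS = 1000000;
        static final int COLS = 10;

        public static void main(String[] args) {
            long sink = 0;

            // Fresh object per column value: ROWS * COLS = 10 million allocations.
            long t0 = System.nanoTime();
            for (int r = 0; r < ROWS; r++) {
                for (int c = 0; c < COLS; c++) {
                    Integer v = new Integer(r + c); // forces a real allocation
                    sink += v.intValue();
                }
            }
            long fresh = System.nanoTime() - t0;

            // One reusable holder allocated once, overwritten on every row.
            t0 = System.nanoTime();
            int[] reused = new int[COLS];
            for (int r = 0; r < ROWS; r++) {
                for (int c = 0; c < COLS; c++) {
                    reused[c] = r + c;
                    sink += reused[c];
                }
            }
            long shared = System.nanoTime() - t0;

            System.out.println("fresh objects: " + (fresh / 1000000) + " ms, "
                    + "reused holder: " + (shared / 1000000)
                    + " ms (sink=" + sink + ")");
        }
    }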
An immutable row architecture would probably perform better for case 1 than the current system. It is not a simple change, though: the current interfaces assume the caller into store "owns" the space of the row, so store can't cache the object versions of rows and hand them back, since the caller might change them.
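
Here is a toy illustration of that ownership problem; the method and class names are invented for the example, not Derby's actual store API:

    public class RowOwnership {
        // Current contract (invented signature): the caller owns dest, so
        // store must copy column objects into it on every fetch.
        static void fetchInto(Object[] dest) {
            dest[0] = Integer.valueOf(1);
            dest[1] = "a";
        }

        // Immutable alternative: columns can never change after construction,
        // so store could cache this object and hand it to many callers.
        static final class ImmutableRow {
            private final Object[] cols;
            ImmutableRow(Object[] cols) { this.cols = cols.clone(); }
            Object column(int i) { return cols[i]; }
        }

        static final ImmutableRow CACHED =
                new ImmutableRow(new Object[] { Integer.valueOf(1), "a" });

        static ImmutableRow fetchRow() { return CACHED; } // no copy needed

        public static void main(String[] args) {
            Object[] dest = new Object[2];
            fetchInto(dest);
            dest[0] = Integer.valueOf(99); // legal today: caller owns the space

            System.out.println(dest[0] + " vs " + fetchRow().column(0));
        }
    }

Because nothing can mutate an ImmutableRow after construction, store could keep object rows in the cache and safely return the same instance to every reader.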
An immutable row architecture would make even more sense for an in-memory version of Derby.
Jean Morissette wrote:
Hi developers,
If you could recreate Derby, what would be the most globally performant tuple memory representation (byte[], ByteBuffer, offset in a byte[]/ByteBuffer, Java object, ...) that you would choose?
I'm wondering if creating a Java object for each tuple and letting the GC do its work would be more performant than having a reusable ByteBuffer that contains many raw tuples. What do you think?
Thanks, -Jean
