Hi Stewart,

On Nov 19, 2009, at 5:50 AM, Stewart Smith wrote:

On Wed, Nov 18, 2009 at 07:51:04PM -0800, Brian Aker wrote:
I'm not sure that this will really work long term. Since we are rapidly
approaching the day when store_lock() will no longer exist,
engines really will need to be designed a bit differently to
get maximum benefit from Drizzle.

I'm pretty sure the way forward is to have separate Cursor/Handler and
StorageEngine classes for MySQL and Drizzle. Otherwise the #ifdefs are
just going to be nuts (which they probably are already).

Yup, supporting various versions of MySQL makes the #ifdef situation bad enough.

At some point I want to remove passing the buffer in write_row(), for
example. If you want a byte array you will need to loop over the Field
(eventually Value) objects and store the results (which... you are
probably already doing unless you are not doing blobs or not packing...).

We're going to have to be really careful not to screw up
performance here...

There's an on-the-wire record format, and then there's at least one
record format per engine (with column-based engines being very
different).

We'll have to have some way to ensure we only ever have one conversion
going on, and preferably only when we *have* to blow the row over the
wire.

I wouldn't have a problem changing the format of records on disk in PBXT. So, ideally the engine would receive a record in a form that it could copy straight to the disk, without conversion.

So it would be cool if the Field objects referenced data that was packed into a single buffer.

Then we would have the best of both worlds:

Engines that have a different record format to Drizzle can loop through the Field objects. And engines that do not require their own storage format can write out the bytes straight from the buffer.
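
As a rough sketch of the two paths described above (all names here, such as FieldView, PackedRow, storeViaFields and storeDirect, are hypothetical and are not Drizzle or PBXT APIs), the Field objects could simply be offset/length views into one packed buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a Field: a view into the shared row buffer.
struct FieldView {
    std::size_t offset;  // start of this field's bytes within the buffer
    std::size_t length;  // number of bytes the field occupies
};

// Hypothetical row: one contiguous packed buffer plus per-field views.
struct PackedRow {
    std::vector<std::uint8_t> buffer;
    std::vector<FieldView> fields;

    // Append one field's bytes and record where they landed.
    void append(const void* data, std::size_t len) {
        const std::uint8_t* p = static_cast<const std::uint8_t*>(data);
        fields.push_back({buffer.size(), len});
        buffer.insert(buffer.end(), p, p + len);
    }
};

// Path A: an engine with its own record format walks the fields.
// The "native format" here is made up: a 1-byte marker before each field.
std::vector<std::uint8_t> storeViaFields(const PackedRow& row) {
    std::vector<std::uint8_t> out;
    for (const FieldView& f : row.fields) {
        assert(f.offset + f.length <= row.buffer.size());
        out.push_back(0xF1);  // made-up per-field marker
        const std::uint8_t* p = row.buffer.data() + f.offset;
        out.insert(out.end(), p, p + f.length);
    }
    return out;
}

// Path B: an engine without its own format writes the buffer as is,
// with no per-field conversion.
const std::vector<std::uint8_t>& storeDirect(const PackedRow& row) {
    return row.buffer;
}
```

Both paths read the same bytes; only path A pays for a per-field pass.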

Basically, a format that is suitable for the disk is a packed (not compressed), variable length format.

PBXT currently uses 2 formats on disk:

1. Fixed length records - these use the unconverted MySQL internal record format.
2. Variable length records - use a packed form of the MySQL record data.

When a table is created PBXT decides whether to use format (1) or (2). (1) is used if using the variable length format will not save much space.
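
To illustrate the kind of decision involved (this is not PBXT's actual rule; the 25% threshold, the ColumnSize struct and the chooseFormat name are all assumptions made up for this sketch), one could compare the fixed-length row size against an estimated packed size at table creation:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class RecordFormat { Fixed, Variable };

// Hypothetical per-column size estimate available at CREATE TABLE time.
struct ColumnSize {
    std::size_t maxBytes;       // bytes reserved in the fixed-length layout
    std::size_t expectedBytes;  // guessed typical packed size, incl. any length prefix
};

// Pick the variable-length format only if the estimated saving is at
// least minSavings (a fraction of the fixed row size). The default of
// 0.25 is an arbitrary value chosen for illustration.
RecordFormat chooseFormat(const std::vector<ColumnSize>& cols,
                          double minSavings = 0.25) {
    assert(minSavings >= 0.0 && minSavings <= 1.0);
    std::size_t fixedTotal = 0;
    std::size_t packedTotal = 0;
    for (const ColumnSize& c : cols) {
        fixedTotal += c.maxBytes;
        packedTotal += c.expectedBytes;
    }
    if (fixedTotal == 0 || packedTotal >= fixedTotal) {
        return RecordFormat::Fixed;  // packing would save nothing
    }
    const double savings =
        1.0 - static_cast<double>(packedTotal) / static_cast<double>(fixedTotal);
    return savings >= minSavings ? RecordFormat::Variable : RecordFormat::Fixed;
}
```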

Besides being packed into one buffer, the format should also be such that, given a buffer containing the record and the table definition, it should be possible to find the starting point of the data in each field.
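
A minimal sketch of that property, assuming a made-up packed layout in which fixed-width columns are stored raw and variable-length columns carry a 2-byte little-endian length prefix (ColumnDef, FieldSpan and locateFields are hypothetical names, not part of MySQL, Drizzle or PBXT):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical table definition entry.
struct ColumnDef {
    bool variable;           // true: 2-byte little-endian length precedes the data
    std::size_t fixedWidth;  // used only when variable == false
};

// Where one field's data lives inside the record buffer.
struct FieldSpan {
    std::size_t start;
    std::size_t length;
};

// Walk the record once, using only the buffer and the table definition,
// and report where each field's data starts. Returns false if the
// buffer is too short for the definition.
bool locateFields(const std::vector<std::uint8_t>& rec,
                  const std::vector<ColumnDef>& table,
                  std::vector<FieldSpan>& spans) {
    spans.clear();
    std::size_t pos = 0;  // invariant: pos <= rec.size()
    for (const ColumnDef& col : table) {
        std::size_t len = col.fixedWidth;
        if (col.variable) {
            if (rec.size() - pos < 2) return false;
            len = static_cast<std::size_t>(rec[pos]) |
                  (static_cast<std::size_t>(rec[pos + 1]) << 8);
            pos += 2;  // data starts after the length prefix
        }
        if (rec.size() - pos < len) return false;
        spans.push_back({pos, len});
        pos += len;
    }
    return true;
}
```

Because no pointers are stored in the record, the same buffer can be copied to disk unchanged and parsed again later with the same function.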

It may make sense to leave BLOB data out of the record buffer, but maybe not as MySQL does it today, with a pointer to the BLOB data in the buffer. The reason is that this messes things up: a buffer with a pointer in it cannot be copied as is to disk.

So the Field object could reference a different block of memory for each BLOB. The engine can then just write the BLOBs in order, after the main record, if it stores the data consecutively on disk.
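
Sketched in code, under the assumption that each BLOB is written with a 4-byte little-endian length prefix (that prefix, and the names BlobRef, RowWithBlobs and serialize, are inventions for this example only):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference held by a BLOB Field: points at a separate
// block of memory instead of embedding a pointer in the record buffer.
struct BlobRef {
    const std::uint8_t* data;
    std::size_t length;
};

struct RowWithBlobs {
    std::vector<std::uint8_t> record;  // packed, pointer-free main record
    std::vector<BlobRef> blobs;        // one entry per BLOB column, in column order
};

// Produce the on-disk image: the main record copied as is, followed by
// each BLOB in order, each with a 4-byte little-endian length prefix.
std::vector<std::uint8_t> serialize(const RowWithBlobs& row) {
    std::vector<std::uint8_t> out(row.record);
    for (const BlobRef& b : row.blobs) {
        assert(b.data != nullptr || b.length == 0);
        const std::uint32_t n = static_cast<std::uint32_t>(b.length);
        for (int shift = 0; shift < 32; shift += 8) {
            out.push_back(static_cast<std::uint8_t>(n >> shift));
        }
        if (b.length > 0) {
            out.insert(out.end(), b.data, b.data + b.length);
        }
    }
    return out;
}
```

The main record never contains an address, so it stays valid when read back into a different process or after a restart.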

The ideas forming in my head now about it are a bit NdbRecord-ish.
Having a description of the record format, and perhaps having just
about all of the server go through it (the fun comes with engines
storing data types in a way that the server doesn't really
understand).

memcpy() is probably the most expensive thing we can possibly do
(apart from going to/from disk).

--
Stewart Smith

_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp



--
Paul McCullagh
PrimeBase Technologies
www.primebase.org
www.blobstreaming.org
pbxt.blogspot.com



