Re: [Drizzle-discuss] drizzle - pbxt merge status

Paul McCullagh Thu, 19 Nov 2009 01:37:14 -0800

Hi Stewart,

On Nov 19, 2009, at 5:50 AM, Stewart Smith wrote:

On Wed, Nov 18, 2009 at 07:51:04PM -0800, Brian Aker wrote:

I'm not sure that this will really work long term. Since we are rapidly
approaching the day when store_lock() will no longer exist,
engines really will need to be designed a bit differently to
get the maximum benefit from Drizzle.


I'm pretty sure the way forward is to have separate Cursor/Handler and
StorageEngine classes for MySQL and Drizzle. Otherwise the #ifdefs are
just going to be nuts (which they probably are already).

Yup, supporting various versions of MySQL makes the #ifdef situation
bad enough.

At some point I want to remove passing the buffer in write_row() for
example. If you want a byte array you will need to loop the Field
(eventually Value) objects and store the results (which... you are
probably already doing unless you are not doing blobs or not
packing...).


We're going to have to be really careful not to at all screw
performance here...

There's an on the wire record format, and then there's at least one
record format per engine (with column based engines being very
different).

we'll have to have some way to ensure we only ever have one conversion
going on, and preferably only when we *have* to blow the row over the
wire.

I wouldn't have a problem changing the format of records on disk in
PBXT. So, ideally the engine would receive a record in a form that it
could copy straight to the disk, without conversion.

So it would be cool if the Field objects referenced data that was
packed into a single buffer.


Then we would have the best of both worlds:

Engines that have a different record format to Drizzle can loop
through the Field objects. And engines that do not require their own
storage format can write out the bytes straight from the buffer.

Basically, a format that is suitable for the disk is a packed (not
compressed), variable length format.
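
To make the two paths concrete, here is a rough sketch. It is not Drizzle
or PBXT code: FieldRef, PackedRow, write_native() and write_converted() are
hypothetical names, and the 1-byte length prefix in the second path is just
an arbitrary stand-in for "some other engine format".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a server-side Field: it owns no data, it only
// references a slice of the row's single packed buffer.
struct FieldRef {
    size_t offset;
    size_t length;
};

struct PackedRow {
    std::vector<uint8_t>  buffer;  // all field data, packed back to back
    std::vector<FieldRef> fields;  // one slice per column
};

// Path 1: an engine whose on-disk format matches the buffer copies it whole.
void write_native(const PackedRow& row, std::vector<uint8_t>& disk) {
    disk.insert(disk.end(), row.buffer.begin(), row.buffer.end());
}

// Path 2: an engine with its own format loops over the fields. As an
// arbitrary example format, each value gets a 1-byte length prefix.
void write_converted(const PackedRow& row, std::vector<uint8_t>& disk) {
    for (const FieldRef& f : row.fields) {
        assert(f.length <= 255 && f.offset + f.length <= row.buffer.size());
        disk.push_back(static_cast<uint8_t>(f.length));
        disk.insert(disk.end(), row.buffer.begin() + f.offset,
                    row.buffer.begin() + f.offset + f.length);
    }
}
```

The first path is a single copy of the buffer; only the second path pays
for a per-field conversion.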


PBXT currently uses 2 formats on disk:

1. Fixed length records - these use the unconverted MySQL internal
record format.

2. Variable length records - use a packed form of the MySQL record data.

When a table is created PBXT decides whether to use format (1) or (2).
(1) is used if using the variable length format will not save much
space.

Besides being packed into one buffer, the format should also be such
that, given a buffer containing the record, and the table definition,
it should be possible to find the starting point of the data in each
field.
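
As an illustration of that requirement, a minimal sketch follows. The
layout is an assumption made up for the example (fixed columns stored raw,
variable columns stored as a 2-byte little-endian length plus data), and
ColumnDef, FieldPos and locate_fields() are hypothetical names, not
existing Drizzle or PBXT interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical column description; not a Drizzle or PBXT type.
struct ColumnDef {
    bool   fixed;  // true: the value occupies exactly `size` bytes
    size_t size;   // fixed width in bytes (ignored for variable columns)
};

// Where one field's data lives inside the packed record.
struct FieldPos {
    size_t offset;  // start of the field's data in the buffer
    size_t length;  // number of data bytes
};

// Walk a packed record and return the data position of every field.
// Assumed layout: fixed columns are stored raw; variable columns are a
// 2-byte little-endian length followed by that many data bytes.
// Returns an empty vector if the buffer is too short for the definition.
std::vector<FieldPos> locate_fields(const std::vector<ColumnDef>& table,
                                    const uint8_t* buf, size_t buf_len) {
    assert(buf != nullptr || buf_len == 0);
    std::vector<FieldPos> out;
    size_t pos = 0;  // invariant: pos <= buf_len
    for (const ColumnDef& col : table) {
        size_t len = col.size;
        if (!col.fixed) {
            if (buf_len - pos < 2) return {};
            len = buf[pos] | (static_cast<size_t>(buf[pos + 1]) << 8);
            pos += 2;
        }
        if (buf_len - pos < len) return {};
        out.push_back({pos, len});
        pos += len;
    }
    return out;
}
```

With a layout like this, nothing beyond the buffer and the table
definition is needed to find each field, at the cost of a linear walk
over the variable length columns.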

It may make sense to leave BLOB data out of the record buffer, but
maybe not as MySQL does it today, with a pointer to the BLOB data in
the buffer. The reason is that this messes things up: a buffer with
a pointer in it cannot be copied as is to disk.

So the Field object could reference a different block of memory for
each BLOB. The engine can then just write the BLOBs in order, after
the main record, if it stores the data consecutively on disk.
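
Roughly, and only as a sketch: BlobRef, RowView and append_row() below are
hypothetical names, and a std::vector stands in for the engine's data file.
A real engine would also need to record the BLOB lengths somewhere (for
example in the main record) to read them back, which is left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference to one BLOB held in its own block of memory.
struct BlobRef {
    const uint8_t* data;
    size_t         length;
};

// Hypothetical view of a row handed to an engine: one packed buffer for
// the ordinary columns, plus one separately allocated block per BLOB.
struct RowView {
    const uint8_t*       record;      // packed main record, no pointers inside
    size_t               record_len;
    std::vector<BlobRef> blobs;       // in column order
};

// Append the row to `out` the way an engine that stores data consecutively
// might: the main record copied as is, then each BLOB in order.
// Returns the number of bytes appended.
size_t append_row(const RowView& row, std::vector<uint8_t>& out) {
    assert(row.record != nullptr || row.record_len == 0);
    const size_t before = out.size();
    out.insert(out.end(), row.record, row.record + row.record_len);
    for (const BlobRef& b : row.blobs)
        out.insert(out.end(), b.data, b.data + b.length);
    return out.size() - before;
}
```

Because the main record contains no pointers, it goes to disk unchanged,
and the BLOB data is never copied into an intermediate buffer first.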

The ideas forming in my head now about it are a bit NdbRecord-ish.
Having a description of the record format, and perhaps having just
about all of the server go through it (the fun comes with engines
storing data types in a way that the server doesn't really
understand).

memcpy() is probably the most expensive thing we can possibly do
(apart from going to/from disk).

--
Stewart Smith

_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp




--
Paul McCullagh
PrimeBase Technologies
www.primebase.org
www.blobstreaming.org
pbxt.blogspot.com




