Re: [Drizzle-discuss] Improving the Engine API

Jay Pipes Tue, 15 Dec 2009 09:26:26 -0800

Brian Aker wrote:

Hi!


On Dec 14, 2009, at 4:34 PM, Jay Pipes wrote:

Right, but this is NOT what Stewart has proposed for the AlterTable statement.  
Stewart (and Stewart, correct me if I'm wrong) would like to send the *actions* 
that the master executed for the Alter Table.  I am opposed to this (see link 
above for the ML post with my reasons why)


Agreed, I would not want to send the actions. That would be completely 
unportable.

I was just commenting on the list of actions being accurate.

OK.

We have 3 types of statements:
DML update statements:    INSERT, UPDATE, DELETE
DML read-only statements: SELECT.
DDL statements:           CREATE TABLE, ALTER TABLE, etc.

This right here is the heart of the differences. STATEMENTs are not really what 
you want. You want to know the sort of action, aka read/write/reformat...

Actually, no, what we've been discussing is that the actions are exactly *not* 
what Paul/Toru need. They need to know the start and end of the statements...


By Statement, I mean the object Statement. The start and end of an execution? 
Sure, that makes sense, but you don't want people hardcoding in if/else for 
Statement objects.

Ah, OK, that makes much more sense :) Yes, we are talking about the 
Statement *messages* (i.e. GPB messages) to be sent to the engine which 
describe the SQL statement which should be executed.

I'm thinking that you and Stewart would benefit from a summary, 
including all the example code that Paul, Toru, and I have been working 
through, so I will put that together shortly. Basically, what Paul, 
Toru, and I have been putting together is a flexible API that would 
allow the engine to handle the SQL statement entirely if it wanted to, 
to let the kernel execute the SQL statement entirely, or a work-sharing 
API that allows the engine to participate at various times in the 
execution of the statement by the kernel...


Anyway, I'll write up the summary.

Cheers,

Jay

-jay

And assuming we have 2 sets of calls:
- beginTransaction, commitTransaction/rollbackTransaction
- startStatement, endStatement

We could say, all types of statements require a beginTransaction() and a 
startStatement() (and the corresponding endStatement() and 
commitTransaction/rollbackTransaction()).

But I don't think this is absolutely correct:

* DML update statements require both beginTransaction() and a startStatement().
* DML read-only statements only require a beginTransaction() call because a 
SELECT does not need a statement level transaction (because they cannot be 
rolled back).
* And DDL statements only require a startStatement() because it is up to the 
engine to decide if this can be done within a transaction or not.

For example if beginTransaction() is called before startStatement() then 
engines that do not handle DDL in transactions should return an error. In 
addition, if an engine does atomic DDL, then it can use the startStatement() to 
begin a transaction.
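
As a rough illustration of the rules above, here is a minimal C++ sketch. Only 
the call names (beginTransaction, startStatement, and so on) come from this 
thread. The StatementKind enum, the RequiredCalls struct, the class name, and 
the bool return value are all hypothetical and are not the Drizzle API.

```cpp
// Hypothetical classification of the three statement types listed above.
enum class StatementKind { DmlUpdate, DmlReadOnly, Ddl };

// Which calls the kernel would make for each kind under this proposal.
struct RequiredCalls {
  bool begin_transaction;
  bool start_statement;
};

inline RequiredCalls requiredCalls(StatementKind kind) {
  switch (kind) {
    case StatementKind::DmlUpdate:   return {true, true};   // both calls
    case StatementKind::DmlReadOnly: return {true, false};  // no statement-level transaction
    case StatementKind::Ddl:         return {false, true};  // engine decides about transactions
  }
  return {true, true};  // not reached; keeps compilers quiet
}

// Hypothetical engine that does NOT handle DDL inside a transaction.
class NonTransactionalDdlEngine {
public:
  void beginTransaction() { in_transaction_ = true; }
  void commitTransaction() { in_transaction_ = false; }
  void rollbackTransaction() { in_transaction_ = false; }

  // Returns false (an error) when DDL is started inside an open transaction.
  bool startStatement(StatementKind kind) {
    if (kind == StatementKind::Ddl && in_transaction_) return false;
    in_statement_ = true;
    return true;
  }
  void endStatement() { in_statement_ = false; }

private:
  bool in_transaction_ = false;
  bool in_statement_ = false;
};
```

An engine with atomic DDL would instead open its own transaction inside 
startStatement() for the Ddl case.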

With these calls the engine will have most of the information it needs.

There is some additional information which should be provided when a cursor is 
used:

For example, PBXT needs to know:

- which columns will be accessed (an optimization so that not all need to be 
loaded),
- whether rows retrieved will be updated or deleted,
- if the rows need to be locked (as in SELECT FOR UPDATE).
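
One way to picture the three items above is as a small hint structure handed 
to the engine when a cursor is opened. This is only a sketch: CursorHints, 
RowLock, and columnsToLoad are made-up names, not anything that exists in 
Drizzle or PBXT.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical lock intent for rows read through a cursor.
enum class RowLock { None, ForUpdate };

// Hypothetical bundle of the three pieces of information listed above.
struct CursorHints {
  std::vector<bool> columns_accessed;  // one flag per column of the table
  bool rows_will_be_modified = false;  // retrieved rows will be updated or deleted
  RowLock row_lock = RowLock::None;    // e.g. SELECT ... FOR UPDATE
};

// Number of columns the engine actually has to load for this cursor.
inline std::size_t columnsToLoad(const CursorHints& hints) {
  std::size_t count = 0;
  for (bool used : hints.columns_accessed) {
    if (used) ++count;
  }
  return count;
}
```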

Toru, what's your opinion?

-jay

And this is how the engine would handle "ADD INDEX", or "ENCRYPT TABLE":
startStatement("ENCRYPT TABLE", "t1") --> return: use custom method
doTableOperation("ENCRYPT TABLE", "t1")
endStatement()
The engine can write table operations to its transaction log, and in this way 
it could ensure that the entire ALTER TABLE statement is atomic.
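
The call sequence above could look roughly like this in C++. It is a sketch 
under assumptions: the StatementHandling enum, the SketchEngine class, and the 
runTableOperation driver are invented for illustration, the vector of strings 
only stands in for a real transaction log, and the choice to customize only 
"ENCRYPT TABLE" is arbitrary.

```cpp
#include <string>
#include <vector>

// Hypothetical answer an engine returns from startStatement().
enum class StatementHandling { KernelDefault, UseCustomMethod };

// Hypothetical engine that handles ENCRYPT TABLE itself and logs each step.
class SketchEngine {
public:
  StatementHandling startStatement(const std::string& op, const std::string& table) {
    log_.push_back("BEGIN " + op + " " + table);
    return op == "ENCRYPT TABLE" ? StatementHandling::UseCustomMethod
                                 : StatementHandling::KernelDefault;
  }
  void doTableOperation(const std::string& op, const std::string& table) {
    log_.push_back("OP " + op + " " + table);
  }
  void endStatement() { log_.push_back("END"); }

  const std::vector<std::string>& log() const { return log_; }

private:
  std::vector<std::string> log_;  // stand-in for the engine's transaction log
};

// Kernel-side driver: call the custom method only if the engine asked for it.
inline void runTableOperation(SketchEngine& engine, const std::string& op,
                              const std::string& table) {
  if (engine.startStatement(op, table) == StatementHandling::UseCustomMethod) {
    engine.doTableOperation(op, table);
  }
  engine.endStatement();
}
```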
On Dec 7, 2009, at 4:10 PM, Jay Pipes wrote:

Paul McCullagh wrote:

Hi Toru,
On Dec 7, 2009, at 3:31 AM, Toru Maesaka wrote:

Great to hear another use-case where knowing a statement type in
advance is useful :)

Yes, generally I need to know the following:
- If I have an update-type statement (i.e. whether the statement modifies rows).
- Whether I need a table lock (examples: ALTER TABLE, TRUNCATE, CHECK).

But, Paul, doesn't this depend on the engine itself?  I mean, some
engines can do (some types of) ALTER TABLE without taking a table lock.
So, is this request really for whether the kernel thinks a table-level
lock is necessary, or is it really just for a descriptor of the
statement type?

And, if it really does just boil down to the statement type, then how do
we deal with the reality that Brian speaks about -- that statement type
will be pluggable, and how do we deal with future statement types for
pluggable engines?

Is a reasonable solution to pass to engines a sort of "statement
traits"?  So, instead of passing ALTER_TABLE, CREATE_TABLE, UPDATE,
DELETE, etc, we instead pass a std::bitset<> (or uint64_t for C folks)
containing traits of the statement such as:

MODIFIES_DATA
MODIFIES_DEFINITION
etc, etc
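
To make the "statement traits" idea concrete, here is a small std::bitset 
sketch. MODIFIES_DATA and MODIFIES_DEFINITION are the trait names from this 
message. The bit positions, the NUM_TRAITS sentinel, and the two helper 
functions are assumptions made up for the example.

```cpp
#include <bitset>
#include <cstddef>

// Bit positions for the traits named above (the numbering is an assumption).
enum StatementTrait : std::size_t {
  MODIFIES_DATA = 0,
  MODIFIES_DEFINITION = 1,
  NUM_TRAITS  // sentinel: number of traits defined so far
};

using StatementTraits = std::bitset<NUM_TRAITS>;

// Hypothetical kernel-side helper that builds the traits for a statement.
inline StatementTraits makeTraits(bool modifies_data, bool modifies_definition) {
  StatementTraits traits;
  traits.set(MODIFIES_DATA, modifies_data);
  traits.set(MODIFIES_DEFINITION, modifies_definition);
  return traits;
}

// Hypothetical engine-side check: no switch over statement types needed.
inline bool needsWriteSetup(const StatementTraits& traits) {
  return traits.test(MODIFIES_DATA);
}
```

The engine only ever inspects traits, so a statement type added later by a 
plugin needs no engine change as long as it sets the right bits.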

And then to deal with transaction locking concerns, just add a method to Cursor:

void Cursor::setTransactionIsolationLevel(enum enum_tx_isolation);

Cheers!

Jay

- If we have a SELECT FOR UPDATE.

I was talking to Toru about this, and another possibility is that we have statements 
declare a needed "lock type" that any plugin could then query. I outlined the 
solution for Toru, but I don't know if he has written the patch yet :)

I've taken notes from our discussion the other day. I'm planning on
working on it when I finish testing through my current progress of
BlitzDB.

Great! :)

For now, I'm happy with Jay's advice of using
current_session().

Cheers,
Toru

On Sat, Dec 5, 2009 at 5:59 AM, Brian Aker <[email protected]> wrote:

Hi!

On Dec 4, 2009, at 3:12 AM, Paul McCullagh wrote:

If we have a startStatement() call, then it could be used in place of 
beginAlter(), assuming we can determine the statement type, and the tables 
involved.

The problem with relying on statement type is that at some point statement type 
will be pluggable... which means you would constantly need to update your 
engine for new statements.

Yuck!

I was talking to Toru about this, and another possibility is that we have statements 
declare a needed "lock type" that any plugin could then query. I outlined the 
solution for Toru, but I don't know if he has written the patch yet :)
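
A guess at what the "lock type" idea might look like, since the actual patch 
is not shown in this thread: each statement declares the lock it needs, and a 
plugin queries that instead of the statement type. Every name below (LockType, 
Statement, neededLockType, the concrete statement classes) is hypothetical.

```cpp
// Hypothetical lock levels a statement can declare.
enum class LockType { None, Row, Table };

// Hypothetical base class: every statement declares the lock it needs.
class Statement {
public:
  virtual ~Statement() = default;
  virtual LockType neededLockType() const = 0;
};

class SelectStatement : public Statement {
public:
  LockType neededLockType() const override { return LockType::None; }
};

class TruncateStatement : public Statement {
public:
  LockType neededLockType() const override { return LockType::Table; }
};

// A statement type added later by a plugin; engines need no update for it.
class EncryptTableStatement : public Statement {
public:
  LockType neededLockType() const override { return LockType::Table; }
};

// Engine-side query that never looks at the concrete statement type.
inline bool engineTakesTableLock(const Statement& statement) {
  return statement.neededLockType() == LockType::Table;
}
```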

Then, when a handle is returned to the pool it is deleted, instead of adding it 
back to the pool.

BTW very soon engines will own their Cursor objects and will be free to reuse 
them.

The locking thread waits until all handles are returned and deleted before it 
can proceed. The lock on the pool then prevents a new table handle from being 
created while the locking thread is busy.
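
The pool behaviour described above can be sketched as follows. This is a 
deliberately single-threaded illustration of the bookkeeping only. A real 
engine would guard the pool with a mutex and have the locking thread wait on a 
condition variable. TableHandle and HandlePool are invented names, not PBXT 
code.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct TableHandle {};  // hypothetical stand-in for an engine's table handle

class HandlePool {
public:
  // Returns nullptr while the pool is locked, so no new handle is created.
  std::unique_ptr<TableHandle> acquire() {
    if (locked_) return nullptr;
    ++outstanding_;
    if (!idle_.empty()) {
      std::unique_ptr<TableHandle> handle = std::move(idle_.back());
      idle_.pop_back();
      return handle;
    }
    return std::unique_ptr<TableHandle>(new TableHandle());
  }

  // While locked, a returned handle is deleted instead of being pooled again.
  void release(std::unique_ptr<TableHandle> handle) {
    --outstanding_;
    if (locked_) return;  // handle is destroyed when it goes out of scope
    idle_.push_back(std::move(handle));
  }

  // Lock the pool and delete every idle handle.
  void lock() {
    locked_ = true;
    idle_.clear();
  }
  void unlock() { locked_ = false; }

  // The locking thread may proceed once every handle is returned and deleted.
  bool canProceed() const { return locked_ && outstanding_ == 0 && idle_.empty(); }
  std::size_t idleCount() const { return idle_.size(); }

private:
  bool locked_ = false;
  std::size_t outstanding_ = 0;
  std::vector<std::unique_ptr<TableHandle>> idle_;
};
```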
Either way, it would be good if Drizzle closes all handlers/cursors before a 
table is deleted or renamed.

I would say that long term this will be optional, based on what the engine 
requires.

OK, this makes things a lot simpler! Indeed, if we don't need to support LOCK 
TABLE then external_lock() can be removed altogether.

Have you tried removing external_lock() right now to see if any issues pop up?

Cheers,
   -Brian

--
Paul McCullagh
PrimeBase Technologies
www.primebase.org
www.blobstreaming.org
pbxt.blogspot.com


_______________________________________________
Mailing list: https://launchpad.net/~drizzle-discuss
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~drizzle-discuss
More help   : https://help.launchpad.net/ListHelp
