Re: [sqlite] Low Level API for SQLite3

drh Tue, 09 May 2006 07:54:12 -0700

Ralf Junker <[EMAIL PROTECTED]> wrote:
> Is there more information available on saving and loading prepared statements 
> and omitting the parser from the library?
>


OVERVIEW

The SQLite Stored Statement Extension consists of twelve
source code files that are added to a standard public-domain SQLite
release.  The added files are proprietary and are used under license.
But the bulk of the code is from SQLite which continues in the public
domain.

The Stored Statement Extension (hereafter called SSE) provides the
programmer with two new APIs:

   int sqlite3_serialize(sqlite3_stmt*, void**, int*);
   int sqlite3_deserialize(sqlite3*, void*, int, sqlite3_stmt**);

The first routine takes an SQL statement that was generated by
sqlite3_prepare() and converts it into a form that can be stored
on disk or compiled into a program.  The second routine does the
reverse; it takes the serialization of a statement and converts it
back into a working SQL statement that can be used just like any
other statement created by sqlite3_prepare().

In addition to the new APIs, the SSE comes with instructions and
example Makefiles that allow the developer to compile SQLite so
that the parser and code generator are omitted.  Such a build will
typically be much smaller than a standard SQLite build - less than
half size of a standard SQLite build.  The sqlite3_prepare()
API will be unavailable in a build of SQLite that omits the parser,
so the sqlite3_deserialize() API must be used instead.

The point of the SSE is to allow manufacturers of low-power
embedded devices to make use of the SQLite backend without having
to host the SQL parser and code generator on the device.  The
parser and code generator are on a developers desktop where
they generate serialized SQL statements which are then loaded
into the embedded device and executed by the backend.  This
provides the developer with the full power of a transactional
SQL database engine but with a much smaller code footprint.

MEMORY AND DISK USAGE

The size of the SQLite library without the parser component will be
between 65 and 90KiB depending on what features are included in the
build.  (Example:  The built-in date/time functions, if included, add
about 8KiB to the library footprint.)  An additional 50 to 100KiB of
stack and heap space are require to run common queries.  More heap
space may be required for complex queries.  There is often a trade-off
between memory utilization and performance;  the database will
typically run faster if more heap memory is available.

The size of a serialized statement is typically about 5 times the size
of the original SQL text, though this ratio can vary significantly
depending on the database schema.  For example, the serialization of
"SELECT * FROM table1" will be much larger if table1 contains 100
columns than if it contains only 1 column.  In one typical application,
the size of 81 serialized statements ranged from 25 to 1000 bytes
with an average size of 275 bytes.  We expect other applications to
show similar results.  These figures assume that the  binary encoding
for serialized statements is selected.  When an ASCII text encoding is
used, the sizes grow by about 25%.

LIMITATIONS

There are limitations to this approach.  Compiled statements in
SQLite depend heavily on details of how specific tables are arranged
in the database file.  Those details change whenever the database
schema changes (due to a CREATE or DROP or ALTER statement) or when
the VACUUM command is run.  Thus, statements that have been created
using sqlite3_serialize() will only work on a database that has
not been modified by CREATE, DROP, ALTER TABLE, or VACUUM since
the statement was created.  If new tables or indices are added
to a database or if the database is vacuumed, then all statements
must be recompiled.

The sqlite3_serialize() routine must be used on a freshly compiled
statement.  sqlite3_serialize() will not work after a statement
has been passed to sqlite3_step().

The exact same version of SQLite must be used to serialize and
deserialize statements.  It will not work to serialize statements
from one version of SQLite and then attempt to deserialize and
run them using a different version.

The sqlite3_deserialize() routine does little to no error checking
of its input.  If the input is malformed, the result could be a
serious software malfunction such as a segmentation fault, an
infinite loop, or database corruption.  Developers should take care
to verify that input to sqlite3_deserialize() is exactly the same
as the output from a prior sqlite3_serialize().

ADDITIONAL FEATURES

The sqlite3_serialize() routine can generate either a binary or
an ASCII text serialization of a statement.  This choice must be
made at compile-time.   The default is to use a binary encoding.
If the -DSQLITE_SSE_ASCII option is added to the compiler command-
line then an ASCII encoding is used.  The advantage of an ASCII text
serialization is that it is easier to insert into a compiled program
The encoding is just a string that can be enclosed in "...".  The
disadvantage is that the ASCII text serialization is larger (by 
about 25%) than the equivalent binary serialization.

When SQLite with the SSE is compiled on a desktop (when it has a
parser) a special pragma is provided that allows the user to create
a table as follows:

   CREATE TABLE sqlite_statement(
      id INTEGER PRIMARY KEY,
      sql TEXT,
      serial BLOB
   );

A pragma is required to create this table because SQLite normally
reserves the sqlite_* table namespace for itself.  The special
pragma is as follows:

   PRAGMA create_sqlite_statement_table;

A new API is available that will automatically extract and deserialize
an SQL statement from the sqlite_statement table given its id number:

   int sqlite3_fetch_statement(sqlite3*, int id, sqlite3_stmt**);

The idea here is that the SQL statements needed by an application can
be inserted as plain text into the sqlite_statement table.   For
example:

   INSERT INTO sqlite_statement(sql) VALUES('SELECT * FROM table1');

After many such statements are inserted, they can all be serialized
as follows:

   UPDATE sqlite_statement SET serial = sqlite_serialize(sql,id);

Then the complete database can be moved from the development platform
over to the embedded device and the embedded device can use the
sqlite3_fetch_statement() API to extract the statements it needs to
execute.

The original design was to have the VACUUM command automatically run
the above UPDATE on the sqlite_statement table.  But that design was
changed when we realized that manually running the UPDATE provides
more flexibility.  For example, to reduce the size of the sqlite_statement
table, you might decide to delete the original SQL text.  To do so,
create a copy of the database that is to be transferred to the embedded
device (so that the original SQL is not deleted from your development
database) and use this command:

   UPDATE sqlite_statement SET serial=sqlite_serialize(sql), sql=NULL;

The software development process is simplified by using this approach
for storing serialized statements.  Because the serialized queries
are stored in the database itself, a database schema can be modified 
and then sent over to the embedded device and all of the queries will 
still work without having to modify and recompile the software on the 
embedded device.  Also, the administrative burden of tracking which
SQL statements correspond to which serializations is greatly reduced.
And, RAM usage might be reduced since serializations are stored on
disk until needed rather than being compiled into the program and
loading into RAM with the program.  (This last point assumes the
embedded device does not implement demand paging - a reasonable
assumption.)

It is believed that the sqlite_statement table will be very useful
to developers.  But its use is completely optional.  If omitted, there
is no library footprint penalty on the embedded side.

MEMORY ONLY DATABASES

For embedded devices that never lose power and do not require that
two or more processes have access to the same database, it is possible 
to store the complete database in memory.  If SQLite is compiled with
the -DSQLITE_OMIT_DISKIO command-line option, then the code used
for writing the database to disk (or flash) is omitted, thus reducing
the library footprint.  (Estimated savings: 9KiB depending on CPU
architecture.)  In addition, most database operations will be much
faster if information does not need to be moved in and out
of Flash memory or disk.  

When this option is used, a new API will be available that can be used
to initialize the in-memory database from a database image contained 
in either RAM or Flash or loaded over a socket.

   int sqlite3_raw_write(sqlite3*, int nByte, int iOffset, 
                         unsigned const char*);

After the database connection is opened using sqlite3_open() the content
of the in-memory database is initialized by one or more calls to the
sqlite3_raw_write() routine above.  Raw database bytes, such as might
be taken from the disk image of a database file prepared on a 
workstation, are added to the in-memory database image.  The second
parameter is the number of bytes of data to be added.  The third
parameter is the index of the first byte to be written.  The fourth
parameter is a pointer to the beginning of the data.  This routine
can be called multiple times if the original database image is not
contiguous. 

This routine makes a copy of the original database image in internal
data structures.  Memory to hold the copy is obtained from malloc().
After this routine has been used to initialize the internal data 
structures, the memory space used to hold the original database image
can be released and reused for other purposes, if desired.

The sqlite_raw_write() routine is efficient even if the nByte parameter
is small.  This allows the in-memory database to be initialized by
a data transfer over a network connection.  Each packet received over
the network can be moving into the database as it arrives at the
device, minimizing the need to buffer space.

AVAILABILITY

The SSE license is a one-time fee that allows unlimited redistribution
in compiled binary form when linked into the licensees embedded
application.  Full source code is provided.  The standard license
package includes 90 days of technical support.  Additional technical
support can be acquired separately.

Re: [sqlite] Low Level API for SQLite3

Reply via email to