On Tuesday, 2007-09-18, 09:23 +0200, Frank Schönheit - Sun Microsystems Germany wrote:
> Hi Marc,
>
> >> Unfortunately not. This would be the only change which *really* allows
> >> to address a number of performance issues with the embedded HSQLDB.
> >> Amongst others, closing data views or forms becomes unacceptably slow
> >> (IMO) if the .odb exceeds a certain (relatively small) size limit. Also,
> >> opening the connection becomes slower as the database and thus the .odb
> >> grows. The only change to overcome this would be the single-file
> >> backend, but there has been no progress at this.
> >
> > Will a single file make such a big difference? And why?
>
> Because with the ZIP file architecture, every commit/write to the ZIP
> file (the .odb) requires a complete rewrite of the complete package.
> Technically, this is "solved" (not really) by working on a copy of all
> the streams in the ZIP package, and only re-packaging them when the
> document as a whole is finally saved.
>
> That is the reason for some oddities: For instance, if a form is saved,
> then the changes you made to the form are saved to the copy of the
> form's stream. Only when you then save the database document is the
> copy re-merged into the .odb file.
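The full-rewrite behaviour described above can be sketched with Python's `zipfile` module (the `.odb` path and entry names here are made up for illustration): there is no in-place update for a ZIP entry, so "changing" one stream means copying every other entry into a brand-new archive.

```python
import io
import zipfile

def replace_entry(odb_path, name, new_data):
    """'Update' one stream in a ZIP-based package.

    Every other entry has to be copied into a fresh archive -
    the ZIP format offers no in-place write for a single entry.
    """
    tmp = io.BytesIO()
    with zipfile.ZipFile(odb_path) as src, \
         zipfile.ZipFile(tmp, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = new_data if item.filename == name else src.read(item)
            dst.writestr(item.filename, data)  # full copy, entry by entry
    with open(odb_path, "wb") as f:           # replace the whole file
        f.write(tmp.getvalue())
```

The cost of this function is proportional to the size of the whole package, not to the size of the change, which is exactly the problem with committing database pages through a ZIP container.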
Now I understand - and because ZIP is a stream-oriented archive format, it has to be this way.

> This approach was dictated by the fact that on the medium term, the .odb
> format should be standardized at OASIS as well, and this means doing as
> the other applications/formats do.

And since ODF dictates this approach, it implicitly excludes high performance on larger data volumes. An idea here could be to re-package the .odb with the database part as a single entry stored with compression level 0 - ZIP does support uncompressed ("stored") entries, but even then the entry could not grow in place without rewriting the archive. Using ODF for the database part is debatable in itself anyway: there is currently no other application that could reuse it - and reuse is the goal of ODF.

> Also, this would automatically solve the problem of data changes not
> surviving a crash: Currently, when you enter data in say the table data
> view, this is (by the HSQL engine) immediately (well, with a
> configurable delay) written into the underlying files. This is how every
> reasonable database engine behaves - it means if you pull the plug just
> after changing the data, it will most probably still be there the next
> time you look at it (again, not counting possible write caches of
> the operating system).

Very valuable feature ...

> With a single-file back-end (which, when I say it, always implies a file
> with random access to it - other back-end file formats are useless for a
> DB engine), this would change, too.
>
> > I could easily think of lowering the workload when serializing the
> > database to disc by having some sort of background task prepare the
> > physical save, e.g. by building up the DOM model of the data or the
> > like.
>
> I'd suppose this is overkill, and will get into performance problems a
> little bit later, but soon enough. Still, the problem, imposed by the
> ZIP format, is that for the change of a single byte in the file (say you
> changed a single letter in a table row), the complete file has to be
> re-packaged and re-written.
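For what it's worth, mixing compression methods per entry is possible; a sketch with Python's `zipfile` (file and entry names are invented for illustration) shows the database part stored raw next to deflated XML streams - and also why this alone doesn't help with commits:

```python
import zipfile

# ZIP allows a compression method per entry: the database part could be
# "stored" (no compression) while the XML streams stay deflated.
with zipfile.ZipFile("mixed.odb", "w") as z:
    z.writestr("content.xml", b"<office:document/>",
               compress_type=zipfile.ZIP_DEFLATED)
    z.writestr("database/data", b"\x00" * 1024,
               compress_type=zipfile.ZIP_STORED)  # raw, offset-addressable

with zipfile.ZipFile("mixed.odb") as z:
    info = z.getinfo("database/data")
    print(info.compress_type == zipfile.ZIP_STORED)  # True
```

Even with a stored entry, growing it shifts every entry behind it plus the central directory at the end of the archive, so a commit would still mean rewriting the file.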
> This bottleneck can IMO only be removed
> with a change of the file format, away from ZIP, towards a
> random-access format.

So having a random-access file backend would be the way to go. Do you have something in mind (although I think this has to be solved in the HSQL source)?

<shouting "Jehovah" mode>
Other databases that use binary files and have a JDBC driver may fit this requirement, too. Firebird would be a candidate.
</shouting "Jehovah" mode>

Marc

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
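A minimal sketch of what "random access" buys a database engine, assuming a hypothetical fixed page size (real engines like Firebird or HSQLDB's cached tables do something far more elaborate, but the principle is the same): overwriting one page touches only that page, no matter how large the file is.

```python
import os

PAGE_SIZE = 4096  # hypothetical fixed page size

def write_page(f, page_no, payload):
    """Overwrite one fixed-size page of an open binary file in place.

    Only PAGE_SIZE bytes are written, regardless of the file's total
    size - the crucial property a ZIP container cannot offer.
    """
    assert len(payload) == PAGE_SIZE
    f.seek(page_no * PAGE_SIZE)
    f.write(payload)
    f.flush()
    os.fsync(f.fileno())  # survive a pulled plug (modulo disk caches)
```

With such a backend, committing a single changed table row costs one page write plus a log record, instead of re-packaging the whole document.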
