Lobs in database with two files

Dario Fassi Tue, 21 Dec 2010 11:03:31 -0800

Hi,
This is a new thread split off from "File extension independent H2 format", since 
the discussion has moved on to another subject:


What do you think of using only two files: one for all normal columns and 
another specialized for long data types (LOBs)?


On 20/12/10 17:30, Thomas Mueller wrote:

>> >> would a single storage file slow down H2 engine from looking
>> >> for and getting or writing data since
> > No. The only problem (I know) is what I have already described.
I doubt that this is really the case. In a database with many LOB columns and 
rows, the size of this single file can easily grow to a disadvantageous level.
A single LOB field can be as large as several full tables, or can even exceed 
the size of the rest of the database.

Fragmentation at the file system (OS) level will have much more impact on large 
files, caching and read-ahead (at OS level) will be less effective, and in the 
end the I/O load will inevitably increase.

It's easy to measure how the performance of a database degrades as the data 
volume increases significantly. 
I mean, if a db without LOBs is 1 GB in size and with LOBs it goes over 10 GB, 
it would be very optimistic to think that the overall performance
will not change. As an example, imagine defragmenting or compacting a file of 
that size.
In a two-file scenario, we would have a main file of 1 GB with almost all data 
+ indexes, and a LOBs file of 9 GB with LOBs only (not as bad as one file per 
LOB, and not as big as everything in one file).


>> >> What think you of using only 2 files, one for all normal columns and 
>> >> other specialized for long data types(LOBS).
> > I thought about this, but I would try to avoid it if possible. What
> > would be the advantage?
I can think of everything stated above, and more:

1) The main file will hold almost all indexes and data (except LOBs). For LOB 
columns, the column values in the main file are references into the LOBs file.

2) The LOBs file can have a different fileStore (much simpler and more 
specific), organized in variable-length extents to take advantage of the 
sequential nature of its contents. 

Such a fileStore only needs an avail-list and an index that maps references to 
pointers; the reference is what is stored as the column value in the main file.
This is like the old xBase .DBT files, which use a simple and very effective 
format, or the .tar file format, which was designed for sequential access 
devices (or streaming, in Java parlance).
So a locator can be implemented easily (at file level) as the LOB reference 
pointer + locator offset.

If the contents of the extents need compressing, a stream-oriented method like 
deflate or gzip can be used without harming streaming.
Each extent can have a header with a tag-marker, length, checksum, etc., to 
make recovery of a broken file easier.
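
To make the extent header idea concrete, here is a minimal sketch in Java. It 
is only an illustration under my own assumptions, not H2 code: the class name 
LobExtent, the marker value, the field order and the use of CRC32 are all 
invented for the example.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical extent layout: [marker:int][length:int][crc32:long][payload]. */
final class LobExtent {
    static final int MAGIC = 0x4C4F4245;      // ASCII "LOBE", an invented tag-marker
    static final int HEADER_SIZE = 4 + 4 + 8; // marker + length + checksum

    /** Wraps a payload in a header so a damaged file can be rescanned extent by extent. */
    static byte[] encode(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + payload.length);
        buf.putInt(MAGIC).putInt(payload.length).putLong(crc.getValue()).put(payload);
        return buf.array();
    }

    /** Validates marker, length and checksum, then returns the payload. */
    static byte[] decode(byte[] extent) {
        ByteBuffer buf = ByteBuffer.wrap(extent);
        if (buf.remaining() < HEADER_SIZE || buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("not an extent header");
        }
        int length = buf.getInt();
        long expected = buf.getLong();
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalArgumentException("bad extent length " + length);
        }
        byte[] payload = new byte[length];
        buf.get(payload);
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        if (crc.getValue() != expected) {
            throw new IllegalStateException("checksum mismatch");
        }
        return payload;
    }
}
```

A recovery tool could scan the file for the marker, read the length, and accept 
the extent only if the checksum matches.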


>> >> that can facilitate locator's implementation too
> > How?
It is explained above, but once more:

If the LOB fileStore is organized as a sequence of variable-length extents with 
an index of pointers and a list of available (or deleted) extents, 
a locator can be implemented easily at file level as the LOB reference pointer 
+ the locator offset.
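
A rough sketch in Java of how the index, the avail-list and the locator could 
fit together. Everything here (the class LobIndex, its method names, the 
first-fit reuse of freed extents, keeping the index in memory) is an assumption 
made for illustration only; it is not how H2 stores LOBs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory index for a LOBs file made of variable-length extents. */
final class LobIndex {
    /** Where an extent starts in the LOBs file and how many bytes it holds. */
    static final class Extent {
        final long start;
        final long length;
        Extent(long start, long length) { this.start = start; this.length = length; }
    }

    private final Map<Long, Extent> live = new HashMap<>(); // LOB reference -> extent
    private final List<Extent> avail = new ArrayList<>();   // freed extents, reusable
    private long nextRef = 1;
    private long endOfFile = 0;

    /** Reserves an extent (first fit from the avail-list, else append); returns the reference. */
    long allocate(long length) {
        Extent chosen = null;
        for (int i = 0; i < avail.size(); i++) {
            Extent free = avail.get(i);
            if (free.length >= length) {
                avail.remove(i);
                chosen = new Extent(free.start, length);
                if (free.length > length) { // keep the unused tail available
                    avail.add(new Extent(free.start + length, free.length - length));
                }
                break;
            }
        }
        if (chosen == null) {
            chosen = new Extent(endOfFile, length);
            endOfFile += length;
        }
        long ref = nextRef++;
        live.put(ref, chosen);
        return ref;
    }

    /** Marks the extent of a deleted LOB as available again. */
    void free(long ref) {
        Extent e = live.remove(ref);
        if (e != null) {
            avail.add(e);
        }
    }

    /** The locator: file position = extent start (from the reference) + offset. */
    long position(long ref, long offset) {
        Extent e = live.get(ref);
        if (e == null) {
            throw new IllegalArgumentException("unknown LOB reference " + ref);
        }
        if (offset < 0 || offset >= e.length) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        return e.start + offset;
    }
}
```

The main file would store only the value returned by allocate() as the column 
value; reading from an offset inside a LOB is then a single seek to 
position(ref, offset) in the LOBs file.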

Streaming access to LOB contents would be simplified and would benefit too.

regards,
Dario.


