Heh... Or a possibly simpler explanation: is 400 bytes the minimum overhead of a row in h2, or related in an unfortunate way to a block size? I just found that the 2x problem goes away when I change it to 4000.
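In case anyone wants to reproduce the comparison, here's roughly what I mean, sketched in H2 SQL (the table names and the SECURE_RAND/SYSTEM_RANGE data generation are just one convenient way to build test data, not what I actually ran):

-- Same 10M of payload, partitioned into 400-byte rows vs 4000-byte rows.
CREATE TABLE IF NOT EXISTS scores_400  (id INT NOT NULL PRIMARY KEY, scores VARBINARY(400)  NOT NULL);
CREATE TABLE IF NOT EXISTS scores_4000 (id INT NOT NULL PRIMARY KEY, scores VARBINARY(4000) NOT NULL);

-- 25000 x 400 bytes = 10M, and 2500 x 4000 bytes = 10M.
INSERT INTO scores_400  SELECT x, SECURE_RAND(400)  FROM SYSTEM_RANGE(1, 25000);
INSERT INTO scores_4000 SELECT x, SECURE_RAND(4000) FROM SYSTEM_RANGE(1, 2500);

If there is a roughly fixed per-row cost, the 400-byte table should come out around 2x the payload on disk, while the 4000-byte table should end up only a little over 10M, which matches what I'm seeing.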
On Friday, June 20, 2014 9:20:04 PM UTC-7, Brian Craft wrote:
> Following up on the large db files, I ran a few tests, loading into this table:
>
> CREATE TABLE IF NOT EXISTS `scores` (`id` INT NOT NULL PRIMARY KEY, `scores` VARBINARY(400) NOT NULL)
>
> I varied the settings of LOG and UNDO_LOG, used one csvread or a sequence of INSERT statements, inside a transaction or without, and a variety of sizes.
>
> In all cases the h2 data file is over 2x the size of the input data. 10M of binary data partitioned and stored in the varbinary scores field will result in 20M and change on disk. This is a huge increase.
>
> Additionally, with LOG=1 (rather than LOG=0), another 100% increase in size will occur, though I only saw this with larger data sets (say 100M). Some examples:
>
> csvread of 1G with LOG=1 becomes 3.6G on disk, with or without a transaction (I'm guessing they are equivalent).
> csvread of 1G with LOG=0 becomes 2.5G on disk.
> csvread of 100M with LOG=0 becomes 250M on disk, etc.
> inserts of 100M with LOG=0 becomes 250M on disk.
>
> The 2x size increase seems like a hard limit, and I'm wondering if this could be an encoding problem. Does h2 store varbinary as binary, or is it converting it to hex, or something?
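For reference, my best guess at the kind of load described above, in H2 SQL (the file name, CSV column names, and the hex literals are placeholders, not the exact commands that were run):

SET LOG 0;
SET UNDO_LOG 0;

-- csvread variant: load the whole file in one statement
CREATE TABLE IF NOT EXISTS scores(
    id INT NOT NULL PRIMARY KEY,
    scores VARBINARY(400) NOT NULL
) AS SELECT * FROM CSVREAD('scores.csv');

-- INSERT variant, optionally inside an explicit transaction:
-- SET AUTOCOMMIT OFF;
-- INSERT INTO scores VALUES (1, X'...'), (2, X'...');
-- COMMIT;

Either way, the number to watch is the size of the database file on disk after the load, relative to the size of the input data.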
