Heh... Or a possibly simpler explanation: is 400 bytes the minimum overhead of a row in h2, or related in an unfortunate way to a block size? I just found that the 2x problem goes away when I change it to 4000.
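In case anyone wants to reproduce the comparison, here's roughly what I mean, sketched in H2 SQL (the table names and the SECURE_RAND/SYSTEM_RANGE data generation are just one convenient way to build test data, not what I actually ran):

-- Same 10M of payload, partitioned into 400-byte rows vs 4000-byte rows.
CREATE TABLE IF NOT EXISTS scores_400  (id INT NOT NULL PRIMARY KEY, scores VARBINARY(400)  NOT NULL);
CREATE TABLE IF NOT EXISTS scores_4000 (id INT NOT NULL PRIMARY KEY, scores VARBINARY(4000) NOT NULL);

-- 25000 x 400 bytes = 10M, and 2500 x 4000 bytes = 10M.
INSERT INTO scores_400  SELECT x, SECURE_RAND(400)  FROM SYSTEM_RANGE(1, 25000);
INSERT INTO scores_4000 SELECT x, SECURE_RAND(4000) FROM SYSTEM_RANGE(1, 2500);

If there is a roughly fixed per-row cost, the 400-byte table should come out around 2x the payload on disk, while the 4000-byte table should end up only a little over 10M, which matches what I'm seeing.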
On Friday, June 20, 2014 9:20:04 PM UTC-7, Brian Craft wrote:
> Following up on the large db files, I ran a few tests, loading into this table:
>
> CREATE TABLE IF NOT EXISTS `scores` (`id` INT NOT NULL PRIMARY KEY, `scores` VARBINARY(400) NOT NULL)
>
> I varied the settings of LOG and UNDO_LOG, used one csvread or a sequence of INSERT statements, inside a transaction or without, and a variety of sizes.
>
> In all cases the h2 data file is over 2x the size of the input data. 10M of binary data partitioned and stored in the varbinary scores field will result in 20M and change on disk. This is a huge increase.
>
> Additionally, with LOG=1 (rather than LOG=0), another 100% increase in size will occur, though I only saw this with larger data sets (say 100M). Some examples:
>
> csvread of 1G with LOG=1 becomes 3.6G on disk, with or without a transaction (I'm guessing they are equivalent).
> csvread of 1G with LOG=0 becomes 2.5G on disk.
> csvread of 100M with LOG=0 becomes 250M on disk, etc.
> inserts of 100M with LOG=0 becomes 250M on disk.
>
> The 2x size increase seems like a hard limit, and I'm wondering if this could be an encoding problem. Does h2 store varbinary as binary, or is it converting it to hex, or something?
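For reference, my best guess at the kind of load described above, in H2 SQL (the file name, CSV column names, and the hex literals are placeholders, not the exact commands that were run):

SET LOG 0;
SET UNDO_LOG 0;

-- csvread variant: load the whole file in one statement
CREATE TABLE IF NOT EXISTS scores(
    id INT NOT NULL PRIMARY KEY,
    scores VARBINARY(400) NOT NULL
) AS SELECT * FROM CSVREAD('scores.csv');

-- INSERT variant, optionally inside an explicit transaction:
-- SET AUTOCOMMIT OFF;
-- INSERT INTO scores VALUES (1, X'...'), (2, X'...');
-- COMMIT;

Either way, the number to watch is the size of the database file on disk after the load, relative to the size of the input data.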
