It shouldn't be using that much memory.
If you could create a standalone test case, that would be perfect.
If not, perhaps try examining it with a profiler and posting what it finds here.
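Even a class histogram from the JDK's jmap tool (something along the lines of jmap -histo <pid>, where <pid> is the JVM running H2) would already show which objects are taking up the heap.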
On Tue, 02 Jun 2015 at 21:37, John Smith <[email protected]> wrote:
> Thanks for your comments! I'm using a custom script which reads the first
> 20,000 lines of a csv file, then generates the h2 import command
> dynamically from that. So it's not a straight "create table X as select *
> from Y", but rather it includes proper column types for every column it
> detects.
>
> The string / varchar data could possibly be handled better; right now it
> defaults to varchar(5000). That is still better than the Integer.MAX_VALUE
> length I believe H2 uses by default, but not quite as small as it could be.
> We have widely variable strings all over the place, though, so it's hard to
> predict a max value for the string columns. Sometimes the first csv file will
> import fine, but then one of the remaining csv files (of hundreds) fails on
> the "INSERT" command (generated at the same time, with the same schema as the
> CREATE TABLE), so we prefer to keep the varchars a little bigger to be safe.
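>
> To give a rough idea, the generated statements end up looking something like
> this (the column names here are made up; the real names and types are
> inferred from the csv):
>
> CREATE TABLE MYDATA (ID INT, AMOUNT DOUBLE, NOTES VARCHAR(5000));
> INSERT INTO MYDATA (ID, AMOUNT, NOTES)
>   SELECT * FROM CSVREAD('mydata.csv', NULL, 'UTF8');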
>
> We do add the indexes after all the data is inserted, though, so at least
> that part looks good.
>
> But I guess I'm still confused as to why the ram usage is so much higher
> than the size of the same data on disk in a file-based database. Wouldn't a
> 4 gig .mv.db file use roughly the same amount of ram when the same data is
> loaded into a "mem" database? Even then, we might be able to work around the
> huge memory usage, as long as we can regain the space in ram when we drop a
> table and reload it from an updated csv. Right now the "mem" mode just
> continues to grow and grow until eventually h2 stops responding (cpu usage
> increases but the query never completes; I'm assuming it just runs out of
> available memory).
>
> Unfortunately the data is not allowed to leave the building. I might be able
> to create a small java program which generates a large database to compare
> "mem" mode vs "file" mode, but I'm not sure how helpful that would be.
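>
> If synthetic data is enough, even something along these lines might work as a
> test case (the table and row count are made up; the idea would be to run it
> once against a jdbc:h2:mem: URL and once against a file URL and compare the
> memory footprint):
>
> CREATE TABLE BIGTABLE AS
>   SELECT X AS ID, RAND() AS VAL, RANDOM_UUID() AS LABEL
>   FROM SYSTEM_RANGE(1, 10000000);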
>
>
> On Tuesday, June 2, 2015 at 2:19:58 AM UTC-4, Christian MICHON wrote:
>>
>> Are you using a command like "create table mytable as select * from
>> csvread('myfile.csv')"?
>>
>> If so, I have a few possible pieces of advice:
>> - create a schema first, with the right types expected for the attributes;
>> otherwise each attribute will be cast as text with a 2 GB max allocation
>> (see the example after this list).
>> - insert like this example:
>> INSERT INTO LANG_CHINESE (ISO, NAME) SELECT * FROM
>> CSVREAD('CHINESE.CSV', NULL, 'UTF8');
>> - do not create indexes first; create them after the csv import is
>> completed.
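>>
>> For example, a schema matching the INSERT above could be created first
>> (the column lengths here are only guesses; pick whatever fits your data):
>>
>> CREATE TABLE LANG_CHINESE (ISO VARCHAR(8), NAME VARCHAR(255));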
>>
>> Can your data be shared?
>>
>> On Friday, May 29, 2015 at 7:45:53 PM UTC+2, John Smith wrote:
>>>
>>> I have several csv files I'm loading up into an h2 instance (running the
>>> beta version of h2, ver 1.4.187).
>>>
>>> Total size on disk for the csv files is approx. 4 GB.
>>>
>>> After importing this into h2 and adding a few indexes, the size of my h2
>>> .mv.db database file is 3.6 gigs. With the same csv files loaded into an h2
>>> "mem" database, all in ram, Windows Server shows memory usage of 85 gigs.
>>> The memory database just explodes in size compared to the file database;
>>> everything else between the databases remains the same (tables + indexes etc.).
>>>
>>> Does that make sense, the huge difference in size there? Even accounting
>>> for some of that 85 gigs being used as a cache of some sort that might also
>>> be present with the file database (I forgot to test how much ram the h2
>>> process consumed when using the file database), why is there such a huge
>>> increase in ram usage? My csv files themselves are only 4 gigs, and I add
>>> one or two indexes on each table (there are two tables); I don't see how
>>> 4 GB of csv translates to 85 gigs once loaded in ram.
>>>
>>> The second thing is speed. I moved my file database to a ram / memory disk,
>>> then loaded up h2 and executed a query. It took 8.4 minutes. It was a
>>> simple query, no joins, but it used one ORDER BY on an unindexed column. The
>>> same query using the h2 mem database returned in 15 seconds; same thing, no
>>> index on the ORDER BY column. I get that I could add an index to cut down
>>> the query time, but that's not the point; I wanted to test how much
>>> difference in performance there is between the native mem mode and the h2
>>> file database stored on a ram disk.
>>>
>>> So to sum up:
>>> 1. Why does h2 use so much more memory in "mem" mode vs. the file size on
>>> disk, for the same amount of data?
>>> 2. Why would the mem database perform so much faster than a file database
>>> stored in ram?
>>> 3. If I'm required to use "mem" mode for the performance we need, is there
>>> any way I can "defrag" the database while it's in ram? After several drops,
>>> deletes, updates, etc., the mem-based database continues to grow, and I
>>> can't shrink it using "shutdown defrag" like I can with a file-based
>>> database.
>>>
>>> Thanks in advance for any thoughts on all this.
>>>
--
You received this message because you are subscribed to the Google Groups "H2
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/h2-database.
For more options, visit https://groups.google.com/d/optout.