Are you using a command like "CREATE TABLE mytable AS SELECT * FROM 
CSVREAD('myfile.csv')"?

If so, I have a few pieces of advice:
- Create the table schema first, with the right types for each attribute; 
otherwise every attribute is treated as text with a 2 GB maximum 
allocation (see the sketch after this list).
- Insert into the pre-created table, for example:
  INSERT INTO LANG_CHINESE (ISO, NAME) SELECT * FROM 
  CSVREAD('CHINESE.CSV', NULL, 'UTF8');
- Do not create indexes first; create them after the CSV import has 
completed.
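
Putting those three points together, here is a minimal sketch of the 
whole flow, reusing the example table above; the column types and the 
index name are guesses for illustration only:

  -- 1. Create the schema first, with explicit types, so the columns
  --    are not all treated as text (types here are assumptions):
  CREATE TABLE LANG_CHINESE (
      ISO  VARCHAR(10),
      NAME VARCHAR(200)
  );

  -- 2. Bulk-load; values are converted to the target column types:
  INSERT INTO LANG_CHINESE (ISO, NAME)
      SELECT * FROM CSVREAD('CHINESE.CSV', NULL, 'UTF8');

  -- 3. Create indexes only after the import has completed
  --    (index name is hypothetical):
  CREATE INDEX IDX_LANG_CHINESE_ISO ON LANG_CHINESE(ISO);

Creating the indexes last means they are built once over the loaded 
data instead of being maintained row by row during the bulk insert.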

Can your data be shared?

On Friday, May 29, 2015 at 7:45:53 PM UTC+2, John Smith wrote:
>
> I have several CSV files I'm loading into an H2 instance (running the 
> beta version of H2, 1.4.187).
>
> The total size on disk of the CSV files is approx. 4 GB.
>
> After importing this into H2 and adding a few indexes, the size of my H2 
> mv.db database file is 3.6 GB. With the same CSV files loaded into an H2 
> "mem" database, all in RAM, Windows Server shows usage of 85 GB. The 
> memory database just explodes in size compared to the file database; 
> everything else between the databases is the same (tables, indexes, etc.).
>
> Does that make sense, such a huge difference in size? Even accounting 
> for some of that 85 GB being used as a cache of some sort that might also 
> be present with the file database (I forgot to test how much RAM the H2 
> process consumed when using the file database), why is there such a huge 
> increase in RAM usage? My CSV files themselves are only 4 GB, and I add 
> one or two indexes on each table (there are two tables); I don't see how 
> 4 GB of CSV translates to 85 GB once loaded in RAM.
>
> The second thing is speed. I moved my file database to a RAM/memory 
> disk, then loaded up H2 and executed a query. It took 8.4 minutes. It 
> was a simple query, no joins, but it used one ORDER BY on an unindexed 
> column. The same query against the H2 mem database returned in 15 
> seconds; same thing, no index on the ORDER BY column. I get that I could 
> add an index to cut down the query time, but that's not the point; I 
> wanted to test how much difference in performance there is between the 
> native mem mode and an H2 file database stored on a RAM disk.
>
> So, to sum up: 
> 1: Why does H2 use so much more memory in "mem" mode than the file size 
> on disk, for the same amount of data? 
> 2: Why would the mem database perform so much quicker than a file 
> database stored in RAM? 
> 3: If I am required to use "mem" mode for the performance we need, is 
> there any way I can "defrag" the database while it's in RAM? After 
> several drops, deletes, updates, etc., the mem-based database continues 
> to grow, and I can't shrink it using SHUTDOWN DEFRAG like I can with a 
> file-based database.
>
> Thanks in advance for any thoughts on all this.
