Brian,

It's ok, it's good (in a way) to know that other people are having the
same problems.

Julian


On Oct 26, 2:11 pm, brianstarke <[EMAIL PROTECTED]> wrote:
> Sorry Julian, I didn't mean to steal your thunder :)  I think we might
> be experiencing separate issues and perhaps I should split this
> question out into another topic.
>
> On Oct 26, 4:44 pm, brianstarke <[EMAIL PROTECTED]> wrote:
>
> > Sorry, I should have been more clear... :)
>
> > The shards are irrelevant to this discussion I suppose, but just for a
> > brief background;
>
> > I have 64 H2 instances over 8 servers.  These servers are running
> > Hadoop, and the data that the Hadoop jobs extract is being
> > distributed across them evenly.  We tried many other options for
> > achieving this, but doing it this way was just too screaming fast
> > to pass up.
>
> > The memory has thus far not been an issue on any of these databases;
> > each gets 3G of memory and hasn't even come close to using more than
> > 1G while we're pounding data into them.  The servers have TBs of
> > storage on them, so that's not an issue either.
>
> > What tends to happen is that this setup is insanely fast when the
> > databases and tables are all empty (naturally), but as the tables get
> > beyond 500,000 rows per shard, the performance gets brutally slow.
> > The test I've been using to check this performance is pretty dead
> > simple: I basically read in one batch of our data, then use CSVWRITE
> > to dump the table into a CSV file.  After that I just keep using
> > INSERT INTO with CSVREAD to add more data.  The first batch insert
> > takes less than a second, but by the time I'm at 5 million records
> > (adding 1 million at a clip) the inserts take close to 30 minutes.
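
A minimal sketch of that test loop, using H2's CSVWRITE and CSVREAD functions with hypothetical table and file names (the real schema is more involved):

    -- dump the current contents of the table to a CSV file
    CALL CSVWRITE('/tmp/batch.csv', 'SELECT * FROM DATA');

    -- then keep appending that same batch back into the table
    INSERT INTO DATA SELECT * FROM CSVREAD('/tmp/batch.csv');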
>
> > Now the solution I've been working on to get around this is to keep
> > the tables small (already somewhat done via the sharding) and to dump
> > the tables out into CSVs every hour or so and push them into a MySQL
> > (or Postgres) instance outside the cluster, to be imported by a
> > database that is slower, yes, but can handle larger table sizes more
> > consistently.  MySQL would take 10 minutes to do what H2 does in 3
> > minutes when the tables are empty.  When the tables get over 5 million
> > records, though, MySQL still takes 10 minutes where H2 takes 30.  It
> > seems like this combo (although annoying) plays to the strengths of
> > each db and is just crazy enough to work.
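
One way the hourly hand-off could look (a sketch only: MySQL's LOAD DATA INFILE is just one option for the import side, and the table and file names are made up):

    -- on each H2 shard: dump the hot table to a CSV file,
    -- then clear it so the shard stays small (assumption)
    CALL CSVWRITE('/exports/shard01.csv', 'SELECT * FROM DATA');
    TRUNCATE TABLE DATA;

    -- on the MySQL side: bulk-load the exported file
    -- (CSVWRITE writes a header row, hence IGNORE 1 LINES)
    LOAD DATA INFILE '/exports/shard01.csv'
      INTO TABLE data_archive
      FIELDS TERMINATED BY ',' ENCLOSED BY '"'
      IGNORE 1 LINES;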
>
> > After using H2 for the bulk of this project, I've been spoiled by its
> > ease of use, elegance, and speed.  I'd love to use H2 the whole way
> > through and don't want to go back to MySQL or Postgres, so I'd love
> > to know if anyone has suggestions for dealing with massive data sets
> > in H2 without degrading the performance to the point of making it
> > useless.
>
> > Again, thanks for any help and thanks for reading.  I can provide more
> > technical data as needed, of course, but I'm trying to keep the
> > specifics out of the group discussion to protect my employer's
> > interests.
>
> > Rock on,
>
> > Brian
>
> > On Oct 26, 12:19 pm, "Thomas Mueller" <[EMAIL PROTECTED]>
> > wrote:
>
> > > Hi,
>
> > > > Once my db shards grow beyond 500,000 records
>
> > > Sorry, what is a 'db shard'?
>
> > > > Inserting 5 million records over 64 clean db shards happens in
> > > > 3 minutes.
>
> > > So you mean 64 databases?
>
> > > >  After those shards get over 500,000 records - the whole
> > > > insert takes 29 minutes.
>
> > > Maybe the system runs out of physical memory?
>
> > > Regards,
> > > Thomas