Making the serial functionality parallel may not solve the core problem.

My database file is 13 GB and I can copy it to another file on the same HDD in 
about 10 minutes, i.e. all the data can be read and rewritten on the same 
(constrained) device.
Assume that additional processing increases this by one order of magnitude: 
that would take 100 minutes.
Using a parallel strategy might reduce this.
But the actual time was 450 minutes (to drop one table).

Another way of saying this: 10 of the 450 minutes is the streaming of data on 
the disk.
The parallel approach might reduce that 10 minutes, but the remaining 440 
minutes would still take 440 minutes.
So the parallel approach might drop one table in 445 minutes.
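The reasoning above is Amdahl's law applied to disk streaming: even a perfect parallelization of the 10-minute IO portion leaves the other 440 minutes untouched. A quick sanity check of the arithmetic (the 2x IO speedup yielding 445 minutes is an assumption; the class and method names are mine):

```java
public class AmdahlBound {
    // best-case drop time when only the streaming portion parallelizes
    static double dropTimeMinutes(double total, double streaming, double ioSpeedup) {
        return (total - streaming) + streaming / ioSpeedup;
    }

    public static void main(String[] args) {
        // 450 min observed; 10 min of raw streaming (measured by the file copy)
        System.out.println(dropTimeMinutes(450, 10, 2));  // 445.0
        // even infinite IO parallelism cannot beat the 440-minute floor
        System.out.println(dropTimeMinutes(450, 10, Double.POSITIVE_INFINITY));  // 440.0
    }
}
```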

From: Noel Grandin [via H2 Database] 
Sent: Friday, July 06, 2012 9:25 AM
To: jarradk 
Subject: Re: Performance: DROP TABLE very slow on large tables

Ah, yes, now I remember.

So there are a variety of ways to tackle this.  Off the top of my head I have:

(1) Sacrifice some recovery resistance by not overwriting the page headers.

(2) Change the on-disk format to put the headers together, making them quicker 
to overwrite. But that would sacrifice performance for normal operations, 
because it would increase the number of IOs performed.

(3) Do a hybrid approach where we don't immediately overwrite the page headers, 
but use a lower priority background thread to gradually overwrite the page 
headers.

(4) Use asynchronous IO to speed up the overwriting of the page headers. At 
the moment we perform the overwrites sequentially, which is hideously slow on 
modern hardware, because we wait for each IO to complete before issuing the 
next one.
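Option (3) could be sketched roughly as below. All names here are hypothetical (nothing in H2 is called LazyHeaderScrubber), and the actual header overwrite is stubbed out as a counter; the point is only the shape: DROP TABLE enqueues and returns, and a minimum-priority daemon thread drains the queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyHeaderScrubber {
    private final BlockingQueue<Long> pending = new LinkedBlockingQueue<>();
    final AtomicInteger scrubbed = new AtomicInteger();

    public LazyHeaderScrubber() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    long pageOffset = pending.take();
                    // the real store would overwrite the page header at
                    // pageOffset here; this sketch only counts the work
                    scrubbed.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shut down quietly
            }
        }, "header-scrubber");
        worker.setPriority(Thread.MIN_PRIORITY); // stay out of the way of queries
        worker.setDaemon(true);
        worker.start();
    }

    // DROP TABLE enqueues the page and returns immediately
    public void scheduleScrub(long pageOffset) {
        pending.add(pageOffset);
    }
}
```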

The option that most interests me is option (4). We could extend the FileStore 
interface to have a method
    public void writeToMultiple(byte[] data, int[] offsets, int len)
which would use asynchronous IO to push a bunch of writes to the disk.

Unfortunately, async disk IO (java.nio.channels.AsynchronousFileChannel) is 
only available from Java 7.
So we have two options: emulate it using memory-mapped files and the 
FileChannel#force() method, or simply enable it only when the code is running 
under Java 7.
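On Java 7, the proposed writeToMultiple could be sketched with AsynchronousFileChannel roughly like this. Everything beyond the quoted signature is an assumption: the offsets are widened to long because the channel API takes long positions, and the return value (count of completed writes) is added for checkability.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

public class AsyncHeaderWriter {
    // Issue every header overwrite at once, then wait for all of them,
    // instead of write+wait, write+wait, ... as the sequential code does.
    public static int writeToMultiple(AsynchronousFileChannel ch,
                                      byte[] data, long[] offsets, int len)
            throws Exception {
        @SuppressWarnings("unchecked")
        Future<Integer>[] pending = new Future[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            // each write needs its own buffer position, so wrap per write
            pending[i] = ch.write(ByteBuffer.wrap(data, 0, len), offsets[i]);
        }
        int done = 0;
        for (Future<Integer> f : pending) {
            f.get(); // block until this IO completes
            done++;
        }
        return done;
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("pagestore", ".db");
        try (AsynchronousFileChannel ch = AsynchronousFileChannel.open(
                tmp, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            byte[] freeMarker = {0, 0, 0, 0};
            long[] offsets = {0, 2048, 4096}; // assumed 2 KB page boundaries
            System.out.println(writeToMultiple(ch, freeMarker, offsets, 4)); // 3
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Whether this wins in practice depends on the OS and disk scheduler getting to reorder the queued writes; on a single HDD with a deep queue that is usually a large improvement over strictly sequential seeks.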



On 2012-07-05 20:37, Steve McLeod wrote:

  I looked back at my work on this some months ago, and actually it was with 
TRUNCATE TABLE that I did my investigation. You can read the discussion about 
this here: 
  https://groups.google.com/forum/?fromgroups#!topic/h2-database/jUqGLVL1FeE

  I posted there the following:


  This line is the one consuming the time:


          file.readFully(test, 0, 16);


  which is org.h2.store.PageStore.java: line 451 in the current SVN trunk.
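That line reads a 16-byte header per page, one synchronous IO at a time. Assuming H2's default 2 KB page size (an assumption; the page size is configurable), walking a 13 GB file this way means millions of reads, which is where the hours go even though almost no data moves:

```java
public class HeaderReadCount {
    // number of per-page header reads needed to walk a file of the given size
    static long headerReads(long fileBytes, int pageSize) {
        return fileBytes / pageSize;
    }

    public static void main(String[] args) {
        long fileBytes = 13L * 1024 * 1024 * 1024; // the 13 GB database
        int pageSize = 2 * 1024;                   // assumed default page size
        long reads = headerReads(fileBytes, pageSize);
        System.out.println(reads);                           // 6815744
        // even at an optimistic 1 ms per random HDD read, that is hours of seeking:
        System.out.println(reads / 1000.0 / 60.0 + " min");  // ~113 min
    }
}
```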


  On Thursday, 5 July 2012 at 8:19 PM, Noel Grandin wrote:


    On 2012-07-05 12:25, Steve McLeod wrote:
      I've also experienced this slowness when dropping a large table. I 
      spent a considerable amount of time with the H2 source code trying to 
      find a way to speed things up, but alas it turns out not to be an easy 
      task with the current data store.
    Hmm, you're right, that code path is pretty deep and winding.

    It starts here:
    DropTable#executeDrop()
    which calls
    Database#removeSchemaObject(...)
    which calls
    DbObject#removeChildrenAndResources(Session)
    which means it's actually calling
    RegularTable#removeChildrenAndResources(Session)
    which calls
    Index#remove(Session)
    which means it's actually calling
    PageDataIndex#remove(Session)
    which calls
    PageDataIndex#removeAllRows()
    which calls
    PageData#freeRecursive()

    Can you run a profiler across the code and see where in this call stack 
    it is spending the bulk of its time?



-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To post to this group, send email to [hidden email].
To unsubscribe from this group, send email to [hidden email].
For more options, visit this group at 
http://groups.google.com/group/h2-database?hl=en.





