Re: Optimisation suggestion: Don't calculate selectivity for CLOB and BLOB columns

Steve McLeod Fri, 11 Nov 2011 05:14:03 -0800

Hi again,

I've done some further profiling, and H2 seems to calculate the hash
for CLOB's on every ANALYZE by delegating to java.lang.String's
hashCode() method. So the CLOB data does seem to be loaded into
memory, as far as I can tell.


The longer the data in the CLOB, the higher the benefit from not
calculating selectivity for CLOB's. For CLOB's with 2000 characters,
you achieve the 20x speed increase that I mentioned earlier. In my
tests, ANALYZE runs in roughly 15 ms instead of 300 ms.

Ironically, if the CLOB data is longer than 4096 bytes, then String's
hashCode() is not called, as the hash is then just a function of data
length, and everything runs much faster. Take a look at ValueLobDb's
hashCode() method to see why.

I hope this info is useful.


On Nov 11, 11:01 am, Steve McLeod <[email protected]> wrote:
> Hi Thomas,
>
> Ah, your patch is nicer than my patch! Nice...
>
> The auto-analyze performance problems I observe happen with a large
> database (~5GB), after a couple of hours of running, when the maximum
> available JVM heap  space is close to exhausted. This is indeed not an
> everyday case... :)
>
> Regards,
>
> Steve
> ---------------------------------------------------
> Steve McLeod
> Founder, Poker Copilothttp://www.pokercopilot.com
>
> On Nov 10, 6:34 am, Thomas Mueller <[email protected]>
> wrote:
>
>
>
>
>
>
>
> > Hi,
>
> > If I understand correctly, H2 doesn't allow indexes on CLOB or BLOB
>
> > > columns.
>
> > Yes.
>
> > > If I still understand correctly, column selectivity is used solely to
> > > calculate which index to use.
>
> > Yes.
>
> > The auto-analyze that H2 performs calculates the selectivity for CLOB
>
> > > and BLOB columns. This seems of no use.
>
> > That's true.
>
> > > It seems to me that it can
> > > also cause a large H2 database to pause for a significant amount of
> > > time, causing unnecessary memory churn.
>
> > That would be bad. The auto-analyze feature should only have very little
> > impact (I would say at most 5 percent in the normal case).
>
> > > I've tested out this theory and concluded that auto-analyze is 10 to
> > > 20 times faster when running on a simple table containing two INT
> > > column and one CLOB field.
>
> > That's weird. According to my test case, excluding the LOB columns has
> > little effect (about 15% for 2 columns, see my patch and test case below).
>
> > Please note that calculating the selectivity doesn't (shouldn't) load the
> > whole LOB column in memory, as it uses the hash value to calculate the
> > selectivity.
>
> > Is this optimisation worth implementing?
>
> > I guess yes, even if it's only about 15%.
>
> > Patch:
>
> > ### Eclipse Workspace Patch 1.0
> > #P h2
> > Index: src/main/org/h2/command/ddl/Analyze.java
> > ===================================================================
> > --- src/main/org/h2/command/ddl/Analyze.java (revision 3980)
> > +++ src/main/org/h2/command/ddl/Analyze.java (working copy)
> > @@ -16,6 +16,7 @@
> >  import org.h2.table.RegularTable;
> >  import org.h2.table.Table;
> >  import org.h2.util.StatementBuilder;
> > +import org.h2.value.Value;
>
> >  /**
> >   * This class represents the statement
> > @@ -82,7 +83,14 @@
> >          Column[] columns = table.getColumns();
> >          for (Column col : columns) {
> >              buff.appendExceptFirst(", ");
> > -            buff.append("SELECTIVITY(").append(col.getSQL()).append(')');
> > +            int type = col.getType();
> > +            if (type == Value.BLOB || type == Value.CLOB) {
> > +                // can not index LOB columns, so calculating
> > +                // the selectivity is not required
> > +                buff.append("100");
> > +            } else {
> > +
> >  buff.append("SELECTIVITY(").append(col.getSQL()).append(')');
> > +            }
> >          }
> >          buff.append(" FROM ").append(table.getSQL());
> >          if (sample > 0) {
>
> > Test case:
>
> > package db;
>
> > import java.sql.Connection;
> > import java.sql.DriverManager;
> > import java.sql.Statement;
> > import org.h2.tools.DeleteDbFiles;
> > import org.h2.util.Profiler;
>
> > public class TestSimpleDb {
>
> >     public static void main(String... args)
> >             throws Exception {
>
> >         Class.forName("org.h2.Driver");
> >         DeleteDbFiles.execute("~/temp", null, true);
>
> >         final String url = "jdbc:h2:~/temp/test";
> >         Connection conn;
> >         conn = DriverManager.getConnection(url);
> >         Statement stat = conn.createStatement();
> >         stat.execute(
> >                 "create table test(id int primary key, data clob) " +
> >                 "as select x, space(100000) " +
> >                 "from system_range(1, 100)");
> >         Profiler prof = new Profiler();
> >         prof.interval = 1;
>
> >         // 3101 old
> >         // 2688 LOB columns excluded
>
> > //        prof.startCollecting();
> >         long start = System.currentTimeMillis();
> >         for (int i = 0; i < 10000; i++) {
> >             stat.execute("analyze");
> >             long t = System.currentTimeMillis() - start;
> >             if (t > 1000) {
> >                 System.out.println("i: " + i);
> >                 break;
> >             }
> >         }
> > //        System.out.println(prof.getTop(5));
> >         conn.close();
>
> >     }
>
> > }
>
> > Regards,
> > Thomas

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/h2-database?hl=en.

Re: Optimisation suggestion: Don't calculate selectivity for CLOB and BLOB columns

Reply via email to