Re: [h2] Update query performance - looking for ways to speed up

Pani Fri, 17 Apr 2015 08:51:29 -0700

Thanks Thomas. Let me give some more background info.

We cannot set A_ID to correct value (as part of insert) because this is an 
ID column for the actual value column "A".
For example , column A could be "Customer Name" column. 
The A_ID column for this column would be "Customer_ID".


We get customer names from a input file.We load those customers into our 
staging table.
When we actually load customers into our system, we need to load them with 
Customer ID.
We update the Customer_ID column of the staging table using the lookup.

This applies to all other columns as well - We need to have ID column 
populated for each non-ID column in the staging table.
We have a Name,ID look up table to help us with getting ID for each non-ID 
customer attribute (or a small HashMap if Customers are few).

Like the above, we have almost 50 columns for which we need to update their 
IDs (various characteristics of the customer - like color/age/gender etc - 
all these have name-id lookup).
For a 100,000 customer records table, each column update is taking 2-3 
minutes as I was mentioning in the original post.

We noticed that if we reduce the width of the table (having less 
characteristics/columns and therefore less ID columns), the update 
statements are taking little less time.
This is true even when the number of rows in the table are same (100,000 in 
our case).

We would like to bring down the update time of each ID column to < 10 
seconds (from the current 2-3 minutes).
This way, the overall time to update 50 columns would be < 500 seconds.

Hope I provided more useful information this time. 
Please suggest us some ideas. 

Appreciate your help.

Thanks,
Pani



On Thursday, April 16, 2015 at 10:42:54 PM UTC-7, Thomas Mueller wrote:
>
> Hi,
>
> The problem seems to be that you use a slow algorithm, kind of like bubble 
> sort instead of quicksort. Your algorithm seems to take quadratic time 
> (twice the number of rows results in four times slower performance).
>
> Why don't you use just one update?
>
>     update table XYZ set a_id = userDefinedFunction(a)
>
> Or even better, set the a_id to the correct value when inserting.
>
> You might need to better describe what you do. What is a_id and what is a 
> (examples).
>
> Regards,
> Thomas
>
> On Friday, April 17, 2015, Pani <[email protected] <javascript:>> wrote:
>
>> We have a table with 50 columns – each of these columns would have 
>> another twin column.The table therefore would have 100 columns (table is 
>> 100 column wide).
>>
>> For example if we have a column “A”, we have its twin column as “A_ID”.
>> Column “A” would have a string value. Column “A_ID” would have a 
>> numeric_ID for that string value.
>> Like this, we have 50 regular columns and 50 ID columns for them.
>>
>> We have about 100,000 rows in this table. All the column “A”s  have their 
>> own individual index.
>> We know which value of column “A” must get what ID value. Our SQLs are 
>> like follows.
>>
>> Update table XYZ set A_ID = ? where A = ?
>>
>> If unique values in Column “A” are N, then the above query would run N 
>> times for column A (updating ID value for each unique value of column “A”).
>>
>> Our problem – Each update statement is taking 2-3 minutes to complete. 
>> With this, it takes about 100-150 minutes for 50 columns (to update their 
>> 50 twin _ID columns).
>> We noticed that if we reduce the width of the table (having less columns 
>> and therefore less ID columns), the update statements are taking little 
>> less time.
>>
>> We did add “ANALYZE” and we saw that index is being used in EXPLAIN.
>> We tried setting up this table in MEMORY (and creating HASH indexes for 
>> it) but that is only giving us 10 second drop (10 seconds less than 2 
>> minutes or 3 minutes). Other info = we are running H2 with these parameters 
>> - 
>> MAX_OPERATION_MEMORY=1610612736;CACHE_SIZE=268435456;TRACE_LEVEL_FILE=0;LOG=0;UNDO_LOG=0
>>
>> We would like to bring down the update time to < 10 seconds for each 
>> column.
>> This way, the overall time to update 50 columns would be < 500 seconds.
>> Wondering how we can achieve that. Please suggest us some ideas.
>>
>> Appreciate your thoughts.
>>
>> Thanks,
>> Pani
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "H2 Database" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/h2-database.
>> For more options, visit https://groups.google.com/d/optout.
>>
>

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/h2-database.
For more options, visit https://groups.google.com/d/optout.

Re: [h2] Update query performance - looking for ways to speed up

Reply via email to