[jira] Updated: (ODE-694) De-Normalizing Large Data

Karthick Sankarachary (JIRA) Mon, 09 Nov 2009 13:23:00 -0800

     [ 
https://issues.apache.org/jira/browse/ODE-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Karthick Sankarachary updated ODE-694:
--------------------------------------

    Attachment: migration-step-c-delete-large-data.sql
                migration-step-b-copy-large-data.sql
                migration-step-a-upgrade-schema.sql

The upgrade path, including the attached scripts, needs to be release-noted.

> De-Normalizing Large Data
> -------------------------
>
>                 Key: ODE-694
>                 URL: https://issues.apache.org/jira/browse/ODE-694
>             Project: ODE
>          Issue Type: Improvement
>          Components: Axis2 Integration, BPEL Runtime
>    Affects Versions: 1.3.3
>            Reporter: Karthick Sankarachary
>            Assignee: Karthick Sankarachary
>             Fix For: 1.3.4
>
>         Attachments: denormalizing-large-data.patch, 
> migration-step-a-upgrade-schema.sql, migration-step-b-copy-large-data.sql, 
> migration-step-c-delete-large-data.sql
>
>
> Currently, in the Hibernate implementation of the process data access object 
> (DAO) interface, all of the large (read: blob) values are stored not in the 
> tables where they belong, but rather in a detached table called LARGE_DATA. 
> Examples of the tables that depend on it include those that hold the state of 
> BPEL instances, BPEL events, SOAP messages, WSDL partner links, and XML 
> variables, among other things. Inevitably, the LARGE_DATA table ends up 
> becoming the bottleneck, because it forces us not only to execute a large 
> number of joins but also to hold that many more locks. As a result, the 
> (Hibernate) DAO layer takes longer to read/write/delete process data, and may 
> potentially deadlock on the LARGE_DATA table. 
> The obvious way out of this mess is to move the blob column from the 
> LARGE_DATA table to the table where it is currently referenced through a 
> foreign key. However, care must be taken to migrate the schema and data of 
> existing servers at the time of upgrade. The upgrade path is described below, 
> where the dependent table refers to the table that currently has a foreign 
> key reference into the parent (i.e., LARGE_DATA) table: 
> a) For each such foreign key in the dependent table, add the corresponding 
> blob column(s) in the dependent table.
> b) For each such foreign key in the dependent table, copy the blob value from 
> the corresponding row of the parent into the corresponding column of the 
> dependent that was added in step (a).
> c) Drop the foreign keys in the dependent table that refer to the LARGE_DATA 
> table, and the LARGE_DATA table itself. Finally, increment the version of the 
> ODE schema (to indicate that the schema has been changed).
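As a rough illustration of steps (a) through (c), here is a minimal sketch in Python using an in-memory SQLite database. It is not the content of the attached scripts. The table and column names (XML_DATA, LDATA_ID, DATA, BIN_DATA, ODE_SCHEMA_VERSION, VERSION) and the version numbers are assumptions made for the example. The foreign key is modelled as a plain integer column, so the sketch drops only the parent table; dropping the real constraint is dialect-specific.

```python
import sqlite3


def migrate(conn):
    """Apply the three upgrade steps to one hypothetical dependent table."""
    cur = conn.cursor()
    # Step (a): add a blob column to the dependent table.
    cur.execute("ALTER TABLE XML_DATA ADD COLUMN DATA BLOB")
    # Step (b): copy each blob from the parent row into the new column.
    cur.execute(
        "UPDATE XML_DATA SET DATA = "
        "(SELECT BIN_DATA FROM LARGE_DATA "
        " WHERE LARGE_DATA.ID = XML_DATA.LDATA_ID)"
    )
    # Step (c): drop the parent table and increment the schema version.
    # A real script would also drop the foreign key in the dependent table.
    cur.execute("DROP TABLE LARGE_DATA")
    cur.execute("UPDATE ODE_SCHEMA_VERSION SET VERSION = VERSION + 1")
    conn.commit()


def demo():
    """Build a tiny pre-upgrade schema (all names hypothetical) and migrate it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE LARGE_DATA (ID INTEGER PRIMARY KEY, BIN_DATA BLOB);
        CREATE TABLE XML_DATA (ID INTEGER PRIMARY KEY, NAME TEXT,
                               LDATA_ID INTEGER);
        CREATE TABLE ODE_SCHEMA_VERSION (VERSION INTEGER);
        INSERT INTO LARGE_DATA VALUES (10, x'3C612F3E');
        INSERT INTO XML_DATA VALUES (1, 'var1', 10);
        INSERT INTO ODE_SCHEMA_VERSION VALUES (5);
        """
    )
    migrate(conn)
    return conn
```

On a real server, each dependent table and each foreign key would need its own step (a) and step (b) statements, and the copy in step (b) is the long-running part.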
> Needless to say, we must be prepared for scenarios wherein the server was 
> upgraded but the schema wasn't (for whatever reason). We do so by checking 
> the ODE schema version at the time of server startup, and failing gracefully 
> if it doesn't match the expected value. Note that we consciously chose not to 
> automate the upgrade path as part of the migration handler, primarily due to 
> the long-running nature of the transaction.
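The startup check described above could look roughly like the following sketch. It is written in Python against SQLite purely for illustration and is not the code in the patch. The ODE_SCHEMA_VERSION table layout, the expected version value, and the exception name are assumptions.

```python
import sqlite3

# Hypothetical expected version; the real value is defined by the server build.
EXPECTED_SCHEMA_VERSION = 6


class SchemaVersionMismatch(RuntimeError):
    """Raised at startup when the database schema was not upgraded."""


def check_schema_version(conn, expected=EXPECTED_SCHEMA_VERSION):
    """Return the stored schema version, or raise if it is not the expected one."""
    row = conn.execute("SELECT VERSION FROM ODE_SCHEMA_VERSION").fetchone()
    actual = row[0] if row else None
    if actual != expected:
        # Fail early with a clear message instead of hitting missing columns later.
        raise SchemaVersionMismatch(
            f"schema version is {actual}, expected {expected}; "
            "run the migration scripts before starting the server"
        )
    return actual
```

A caller would run this check once, before any DAO is used, and abort startup on the exception.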
> As a result of this change, we observed a significant improvement in the 
> performance of the Hibernate-based process server (between 30% and 40%). 
> However, individual results may vary. 
> Note that the downside to moving the blob column into the dependent table is 
> that we may inadvertently end up reading the blob property as a side-effect 
> of an unrelated query on that table. As you may have guessed, that was the 
> motivation for introducing the LARGE_DATA table in the first place. 
> Fortunately, there are ways to mitigate such cases, which include (a) 
> using lazy fetching of the blob properties in the problematic dependent 
> table, or (b) re-introducing a large data table specifically for the 
> problematic dependent table, and using join fetching to work around the N+1 
> select problem. We plan on implementing such optimizations on a case-by-case 
> basis, if and when required.
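To make option (a) concrete, here is a small plain-SQL analogue of lazy blob fetching, again in Python with SQLite. It is not Hibernate's lazy property mechanism, only a sketch of the same idea: queries on the dependent table name their columns explicitly, and the blob is read only when first accessed. The XML_DATA table, its columns, and the XmlData wrapper are hypothetical.

```python
import sqlite3


class XmlData:
    """Row wrapper that defers reading the blob until it is first accessed."""

    def __init__(self, conn, row_id, name):
        self._conn = conn
        self.id = row_id
        self.name = name
        self._data = None
        self._loaded = False

    @property
    def data(self):
        # Fetch the blob on first access only, then cache it.
        if not self._loaded:
            row = self._conn.execute(
                "SELECT DATA FROM XML_DATA WHERE ID = ?", (self.id,)
            ).fetchone()
            self._data = row[0] if row else None
            self._loaded = True
        return self._data


def find_by_name(conn, name):
    """Look up rows by name without touching the blob column."""
    rows = conn.execute(
        "SELECT ID, NAME FROM XML_DATA WHERE NAME = ?", (name,)
    )
    return [XmlData(conn, r[0], r[1]) for r in rows]
```

The trade-off is one extra query per row whose blob is actually needed, which is why option (b) pairs a separate table with join fetching when many blobs are read at once.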

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
