[jira] Updated: (ODE-694) De-Normalizing Large Data

Karthick Sankarachary (JIRA) Mon, 09 Nov 2009 13:20:59 -0800

     [ 
https://issues.apache.org/jira/browse/ODE-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Karthick Sankarachary updated ODE-694:
--------------------------------------

    Description: 
Currently, in the hibernate implementation of the process data access object 
(DAO) interface, all of the large (read blob) values is stored not in the table 
where it belongs, but rather in a detached table called LARGE_DATA. Examples of 
such dependent tables include those that hold the state of BPEL instances, BPEL 
events, SOAP messages, WSDL partner links, and XML variables, among other 
things. Inevitably, the LARGE_DATA table ends up becoming the bottleneck, 
because it forces us to not only execute a large number of joins but also hold 
that many more locks. As a result, the (hibernate) DAO layer takes longer to 
read/write/delete process data, and may potentially deadlock on the LARGE_DATA 
table. 

The obvious way out of this mess is to move the blob column from the LARGE_DATA 
table to the table where it is currently referenced through a foreign key. 
However, care must be taken to migrate the schema and data of existing servers 
at the time of upgrade. The upgrade path is described below, where the 
dependent table refers to the table that currently has a foreign key reference 
into the parent (i.e., LARGE_DATA) table: 

a) For each such foreign key in the dependent table, add the corresponding blob 
column(s) in the dependent table.
b) For each such foreign key in the dependent table, copy the blob value from 
the corresponding row of the parent into the corresponding column of the 
dependent that was added in step (a).
c) Drop the foreign keys in the dependent table that refer to the LARGE_DATA 
table, and the LARGE_DATA table itself. Finally, increment the version of the 
ODE schema (to indicate that the schema has been changed).

Needless to say, we must be prepared for scenarios wherein the server was 
upgraded but the schema wasn't (for whatever reason). We do so by checking the 
ODE schema version at the time of server startup, and failing gracefully if it 
doesn't match the expected value. Note that we consciously chose not to 
automate the upgrade path as part of the migration handler, primarily due to 
the long-running nature of the transaction.

As a result of this change, we observed a significant improvement in the 
performance of the hibernate-based process server (between 30-40%). However, 
individual results may vary. 

Note that the downside to moving the blob column into the dependent table is 
that we may inadvertently end up reading the blob property as a side-effect of 
an unrelated query on that table. As you may have guessed, that was the 
motivation for introducing the LARGE_DATA table in the first place. 
Fortunately, there are ways to mitigate against such cases, which include (a) 
using lazy fetching of the blob properties in the problematic dependent table, 
or (b) re-introducing a large data table specifically for the problematic 
dependent table, and using join fetching to work around the N+1 select problem. 
We plan on implementing such optimizations on a case-by-case basis, if and when 
required.

  was:
Currently, in the hibernate implementation of the process data access object 
(DAO) interface, all of the large (read blob) values is stored not in the table 
where it belongs, but rather in a detached table called LARGE_DATA. Examples of 
such dependent tables include those that hold the state of BPEL instances, BPEL 
events, SOAP messages, WSDL partner links, and XML variables, among other 
things. Inevitably, the LARGE_DATA table ends up becoming the bottleneck, 
because it forces us to not only execute a large number of joins but also hold 
that many more locks. As a result, the (hibernate) DAO layer take longer to 
read/write/delete process data, and may even potentially deadlock on the 
LARGE_DATA table. 

The obvious way out of this mess is to move the blob column from the LARGE_DATA 
table to the table where it is currently referenced through a foreign key. 
However, care must be taken to migrate the schema and data of existing servers 
at the time of upgrade. The upgrade path is described below, where the 
dependent table refers to the table that currently has a foreign key reference 
into the parent (i.e., LARGE_DATA) table: 

a) For each such foreign key in the dependent table, add the corresponding blob 
column(s) in the dependent table.
b) For each such foreign key in the dependent table, copy the blob value from 
the corresponding row of the parent into the corresponding column of the 
dependent that was added in step (a).
c) Drop the foreign keys in the dependent table that refer to the LARGE_DATA 
table.
d) Drop the LARGE_DATA table.
e) Increment the version of the ODE schema (to indicate that the schema has 
been changed).

Needless to say, we must be prepared for scenarios wherein the server was 
upgraded but the schema wasn't (for whatever reason). We do so by checking the 
ODE schema version at the time of server startup, and failing gracefully if it 
doesn't match the expected value. Note that we consciously chose not to 
automate the upgrade path as part of the migration handler, primarily due to 
the long-running nature of the transaction.

As a result of this change, we observed a significant improvement in the 
performance of the hibernate-based process server (between 30-40%). However, 
individual results may vary. 

Note that the downside to moving the blob column into the dependent table is 
that we may inadvertently end up reading the blob property as a side-effect of 
an unrelated query on that table. As you may have guessed, that was the 
motivation for introducing the LARGE_DATA table in the first place. 
Fortunately, there are ways to mitigate against such cases, which include (a) 
using lazy fetching of the blob properties in the problematic dependent table, 
or (b) re-introducing a large data table specifically for the problematic 
dependent table, and using join fetching to work around the N+1 select problem. 
We plan on implementing such optimizations on a case-by-case basis, if and when 
required.


> De-Normalizing Large Data
> -------------------------
>
>                 Key: ODE-694
>                 URL: https://issues.apache.org/jira/browse/ODE-694
>             Project: ODE
>          Issue Type: Improvement
>          Components: Axis2 Integration, BPEL Runtime
>    Affects Versions: 1.3.3
>            Reporter: Karthick Sankarachary
>            Assignee: Karthick Sankarachary
>             Fix For: 1.3.4
>
>         Attachments: denormalizing-large-data.patch
>
>
> Currently, in the hibernate implementation of the process data access object 
> (DAO) interface, all of the large (read blob) values is stored not in the 
> table where it belongs, but rather in a detached table called LARGE_DATA. 
> Examples of such dependent tables include those that hold the state of BPEL 
> instances, BPEL events, SOAP messages, WSDL partner links, and XML variables, 
> among other things. Inevitably, the LARGE_DATA table ends up becoming the 
> bottleneck, because it forces us to not only execute a large number of joins 
> but also hold that many more locks. As a result, the (hibernate) DAO layer 
> takes longer to read/write/delete process data, and may potentially deadlock 
> on the LARGE_DATA table. 
> The obvious way out of this mess is to move the blob column from the 
> LARGE_DATA table to the table where it is currently referenced through a 
> foreign key. However, care must be taken to migrate the schema and data of 
> existing servers at the time of upgrade. The upgrade path is described below, 
> where the dependent table refers to the table that currently has a foreign 
> key reference into the parent (i.e., LARGE_DATA) table: 
> a) For each such foreign key in the dependent table, add the corresponding 
> blob column(s) in the dependent table.
> b) For each such foreign key in the dependent table, copy the blob value from 
> the corresponding row of the parent into the corresponding column of the 
> dependent that was added in step (a).
> c) Drop the foreign keys in the dependent table that refer to the LARGE_DATA 
> table, and the LARGE_DATA table itself. Finally, increment the version of the 
> ODE schema (to indicate that the schema has been changed).
> Needless to say, we must be prepared for scenarios wherein the server was 
> upgraded but the schema wasn't (for whatever reason). We do so by checking 
> the ODE schema version at the time of server startup, and failing gracefully 
> if it doesn't match the expected value. Note that we consciously chose not to 
> automate the upgrade path as part of the migration handler, primarily due to 
> the long-running nature of the transaction.
> As a result of this change, we observed a significant improvement in the 
> performance of the hibernate-based process server (between 30-40%). However, 
> individual results may vary. 
> Note that the downside to moving the blob column into the dependent table is 
> that we may inadvertently end up reading the blob property as a side-effect 
> of an unrelated query on that table. As you may have guessed, that was the 
> motivation for introducing the LARGE_DATA table in the first place. 
> Fortunately, there are ways to mitigate against such cases, which include (a) 
> using lazy fetching of the blob properties in the problematic dependent 
> table, or (b) re-introducing a large data table specifically for the 
> problematic dependent table, and using join fetching to work around the N+1 
> select problem. We plan on implementing such optimizations on a case-by-case 
> basis, if and when required.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (ODE-694) De-Normalizing Large Data

Reply via email to