[
https://issues.apache.org/jira/browse/OOZIE-3336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16709868#comment-16709868
]
Andras Salamon commented on OOZIE-3336:
---------------------------------------
According to OOZIE-3175, users occasionally delete records inconsistently and
thereby create orphaned actions. The database upgrade scripts should take care
of this (or at least inform the user about the problem) before we try to add
the new foreign key constraints.
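
To illustrate the kind of pre-upgrade check meant above, here is a minimal sketch in plain Java. It is not the actual upgrade script: the class, the {{ActionRow}} type, and the method name are hypothetical, and the table and column names in the SQL comment are assumptions that would need to be checked against the real schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class OrphanedActionCheck {

    /** Hypothetical stand-in for an action row: its own id and the job id it claims as parent. */
    public static final class ActionRow {
        public final String id;
        public final String parentJobId;

        public ActionRow(String id, String parentJobId) {
            this.id = id;
            this.parentJobId = parentJobId;
        }
    }

    /**
     * Returns the ids of actions whose parent job id is null or not present among the job ids.
     *
     * Against a real database the same check could be one query, for example
     * (table and column names are assumptions):
     *   SELECT a.id FROM WF_ACTIONS a LEFT JOIN WF_JOBS j ON a.wf_id = j.id WHERE j.id IS NULL
     */
    public static List<String> findOrphanedActionIds(Set<String> jobIds, List<ActionRow> actions) {
        List<String> orphans = new ArrayList<>();
        for (ActionRow action : actions) {
            if (action.parentJobId == null || !jobIds.contains(action.parentJobId)) {
                orphans.add(action.id);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        Set<String> jobIds = new HashSet<>(Arrays.asList("job-1"));
        List<ActionRow> actions = Arrays.asList(
                new ActionRow("job-1@start", "job-1"),   // parent exists
                new ActionRow("job-2@start", "job-2"));  // parent was deleted
        List<String> orphans = findOrphanedActionIds(jobIds, actions);
        if (!orphans.isEmpty()) {
            // Inform the user before any foreign key constraint is added.
            System.out.println("Orphaned actions found: " + orphans);
        }
    }
}
```

Whether the upgrade should delete such rows or only report them is a separate decision. The sketch only reports them.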
> [persistence] Refactor entity classes to feature PK, FK, and UQ constraints
> ---------------------------------------------------------------------------
>
> Key: OOZIE-3336
> URL: https://issues.apache.org/jira/browse/OOZIE-3336
> Project: Oozie
> Issue Type: Improvement
> Components: core
> Affects Versions: 5.0.0
> Reporter: Andras Piros
> Assignee: Julia Kinga Marton
> Priority: Major
> Fix For: 5.2.0
>
>
> When an Oozie database grows substantially in size, say, beyond a few
> hundred thousand {{WorkflowActionBean}} or {{CoordinatorActionBean}}
> instances, we face a couple of performance issues. Here is an analysis of why.
> The current Oozie JPA {{@Entity}} usage, and the resulting database DDL, suffers
> from a couple of drawbacks from a performance point of view:
> * {{@Id}} fields are {{String}}:
> ** leaving no space for database primary key indices to work effectively
> ** those values are calculated in case of {{WorkflowActionBean}},
> {{CoordinatorActionBean}}, and {{BundleActionBean}} instances
> * no foreign key constraint is set from {{WorkflowActionBean}} to
> {{WorkflowJobBean}}, from {{CoordinatorActionBean}} to
> {{CoordinatorJobBean}}, or from {{BundleActionBean}} to {{BundleJobBean}}
> instances:
> ** JPA queries that discover parent-child relationships have to be assessed
> by hand
> ** no database indices are created, and hence queries that contain a
> {{JOIN}} are slower
> * no use of unique constraints whatsoever
> * JPA queries are created by hand instead of relying on OpenJPA
> * JPA entities are filled by hand instead of relying on OpenJPA
> The following enhancements are necessary:
> # keeping the existing {{String compositeId}} fields, let's break down their
> contents into the following new fields:
> ## {{@Id long id}} - an auto-increment value that is unique across the Oozie
> database
> ## {{long currentSequence}} - the sequence number of the current run since
> last Oozie server restart. The first part of the {{compositeId}}
> ## {{Timestamp serverStartupTimestamp}} - the timestamp when the Oozie server
> was last started. The second part of the {{compositeId}}
> ## {{String serverName}} - the third part of the {{compositeId}}
> ## {{String name}} - the fourth and last part of the {{compositeId}}
> ## {{compositeId}} might be calculated when an entity is loaded / persisted,
> and then stored
> # FK constraints:
> ## {{@OneToMany}} fields where we have a list of child references inside
> parent
> ## {{@ManyToOne}} fields where we have a parent reference inside child
> ## pay attention to {{FetchType}}, most of the time {{LAZY}} will be needed
> ## the containment fields should not be {{@Transient}} anymore
> # UQ constraints:
> ## on {{currentSequence}} and {{serverStartupTimestamp}}
> ## on {{currentSequence}} and {{name}}
> # new JPQL queries:
> ## to cover changed parent-child relationships
> ## to make use of each disassembled part of {{originalId}} when doing e.g.
> filtering
> # let JPA fill entities instead of performing this by hand
> The following enhancements can be considered nice-to-have:
> * upgrade to an OpenJPA version that features JPA 2.1's composite indexing
> capability
> * see whether having an optimistic locking field using {{@Version}}, instead
> of ZooKeeper-based pessimistic locking, would improve High Availability
> characteristics
> * also refactor SLA-related entity classes
> It's necessary to have performance benchmarks with some database types like
> MySQL/MariaDB and PostgreSQL, before and after the changes, for the following
> use cases:
> * {{CoordinatorJobBean}} and {{WorkflowJobBean}} instances up to millions
> * {{CoordinatorActionBean}} and {{WorkflowActionBean}} instances up to tens
> of millions
> * performance for JPQLs that get a list of entities
> * performance of persisting a new entity
> * performance of querying lists of entities based on popular / possible
> filters like the ones used by {{VxJobsServlet}}
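
As a rough illustration of the {{compositeId}} breakdown proposed in the description above, the following plain-Java sketch splits an id into the four parts and reassembles it. It assumes the layout {{<sequence>-<startupTimestamp>-<serverName>-<name>}}, where the server name may itself contain hyphens and the sequence is zero-padded to seven digits. Both are assumptions, the class name {{CompositeIdParts}} is hypothetical, and the timestamp is kept as a raw string rather than a {{Timestamp}} to avoid guessing its format.

```java
public class CompositeIdParts {
    public final long currentSequence;
    public final String serverStartupTimestamp; // raw digits, deliberately left unparsed
    public final String serverName;
    public final String name;

    private CompositeIdParts(long currentSequence, String serverStartupTimestamp,
                             String serverName, String name) {
        this.currentSequence = currentSequence;
        this.serverStartupTimestamp = serverStartupTimestamp;
        this.serverName = serverName;
        this.name = name;
    }

    /** Splits at the first, second, and last hyphen, so the server name may contain hyphens. */
    public static CompositeIdParts parse(String compositeId) {
        int first = compositeId.indexOf('-');
        int second = compositeId.indexOf('-', first + 1);
        int last = compositeId.lastIndexOf('-');
        if (first < 0 || second < 0 || last <= second) {
            throw new IllegalArgumentException("Unexpected id layout: " + compositeId);
        }
        return new CompositeIdParts(
                Long.parseLong(compositeId.substring(0, first)),
                compositeId.substring(first + 1, second),
                compositeId.substring(second + 1, last),
                compositeId.substring(last + 1));
    }

    /** Recomputes the composite id from its parts (seven-digit padding is an assumption). */
    public String toCompositeId() {
        return String.format("%07d-%s-%s-%s",
                currentSequence, serverStartupTimestamp, serverName, name);
    }

    public static void main(String[] args) {
        CompositeIdParts parts = parse("0000012-181205093000123-oozie-oozi-W");
        System.out.println(parts.currentSequence + " | " + parts.serverStartupTimestamp
                + " | " + parts.serverName + " | " + parts.name);
    }
}
```

With the parts stored as separate columns, filters can compare them directly instead of pattern-matching on the string id, which is the motivation given for the new JPQL queries.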