[ 
https://issues.apache.org/jira/browse/OOZIE-2662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510138#comment-15510138
 ] 

Andras Piros commented on OOZIE-2662:
-------------------------------------

The [*current 
patch*|https://issues.apache.org/jira/secure/attachment/12829584/OOZIE-2662.002.wip.patch]
 addresses duplicate entries by skipping only the rows that violate the primary 
key constraint enforced by {{@Id}}. Please see 
{{TestDBLoadDump.testSecondImportDoesNotImportDuplicates()}} for details.

The problem is that rows violating any other kind of constraint get exactly the 
same treatment: those rows, too, are simply skipped during the import. Please see 
{{TestDBLoadDump.testImportSkipsRowsContainingInvalidData()}} for details.
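
For illustration, here is a minimal sketch of that skip-on-violation approach; the 
per-row transactions and the {{SkippingImporter.importRows()}} helper are only 
illustrative, not the actual patch code:

{code:java}
import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.RollbackException;

public class SkippingImporter {

    /**
     * Persists every row in its own transaction; a row whose commit fails with a
     * constraint violation is rolled back and skipped. Note that any violation
     * (duplicate @Id, invalid column data, ...) ends up in the same catch block.
     */
    public static int importRows(EntityManager em, List<?> rows) {
        int skipped = 0;
        for (Object row : rows) {
            try {
                em.getTransaction().begin();
                em.persist(row);
                em.getTransaction().commit();
            } catch (RollbackException e) {
                if (em.getTransaction().isActive()) {
                    em.getTransaction().rollback();
                }
                skipped++;
            }
        }
        return skipped;
    }
}
{code}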

Using OpenJPA we cannot distinguish between the different kinds of constraint 
violation (one caused by {{@Id}} versus one caused by {{@Length}}, for example): 
OpenJPA wraps both in the very same {{RollbackException}} through the very same 
mechanism.
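
As a concrete, purely hypothetical example of why the two cases look alike: with an 
entity such as the one below, inserting a row that reuses an existing {{id}} and 
inserting a row whose {{message}} exceeds the declared column length both fail only 
when the transaction commits, wrapped in the same {{RollbackException}} (the entity 
and the {{@Column(length = ...)}} constraint here just stand in for the Oozie beans 
and the {{@Length}} case mentioned above):

{code:java}
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;

// Hypothetical entity, not an actual Oozie bean.
@Entity
public class ImportedRow {

    @Id
    private String id;           // duplicate value -> primary key violation at commit

    @Column(length = 40)
    private String message;     // over-long value -> column constraint violation at commit

    protected ImportedRow() {
        // no-arg constructor required by JPA
    }

    public ImportedRow(String id, String message) {
        this.id = id;
        this.message = message;
    }
}
{code}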

So the question is, *[~rkanter]* and *[~jaydeepvishwakarma]*: should we keep this 
behavior, or should we stop skipping and instead halt the whole import process on 
both duplicate rows and rows violating other constraints?

> DB migration fails if DB is too big
> -----------------------------------
>
>                 Key: OOZIE-2662
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2662
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Peter Cseh
>            Assignee: Andras Piros
>         Attachments: OOZIE-2662.001.patch, OOZIE-2662.002.wip.patch
>
>
> The initial version of the DB import tool commits all the workflows, actions 
> etc. in one huge commit. If it does not fit into memory, an OOME is thrown.
> We should commit every 1k or 10k elements to prevent this.
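
A minimal sketch of the batched-commit idea from the description, assuming an 
illustrative batch size of 1,000 and hypothetical helper names (this is not the 
actual import tool code):

{code:java}
import java.util.List;

import javax.persistence.EntityManager;

public class BatchedImporter {

    private static final int BATCH_SIZE = 1000;

    /**
     * Commits every BATCH_SIZE rows instead of one huge commit, and clears the
     * persistence context after each batch so managed entities do not pile up
     * on the heap (the OOME the description talks about).
     */
    public static void importInBatches(EntityManager em, List<?> rows) {
        em.getTransaction().begin();
        int count = 0;
        for (Object row : rows) {
            em.persist(row);
            if (++count % BATCH_SIZE == 0) {
                em.getTransaction().commit();
                em.clear();
                em.getTransaction().begin();
            }
        }
        em.getTransaction().commit(); // commit the final, possibly partial batch
    }
}
{code}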


