[ 
https://issues.apache.org/jira/browse/OOZIE-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860479#comment-15860479
 ] 

Gopi Krishnan Nambiar commented on OOZIE-1525:
----------------------------------------------

I ran into this error recently and triaged to find that the cause for me was 
running multiple Oozie servers without HA support. I was running Oozie version 
4.0.1 which did not support HA. Do you think this could have been the reason 
for your issues too ?

> Oozie workflow does not update status sometimes and is stuck in RUNNING state.
> ------------------------------------------------------------------------------
>
>                 Key: OOZIE-1525
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1525
>             Project: Oozie
>          Issue Type: Bug
>    Affects Versions: 3.3.2
>            Reporter: Eugene Shevchuk
>         Attachments: OOZIE-90.patch
>
>
> Sometimes workflow is stuck in RUNNING state with the following info:
> {noformat}
> Job ID : 0000002-130822205833322-oozie-hdp-W
> ------------------------------------------------------------------------------------------------------------------------------------
> Workflow Name : wordcount-wf
> App Path      : ***
> Status        : RUNNING
> Run           : 0
> User          : ***
> Group         : -
> Created       : 2013-08-22 21:01 GMT
> Started       : 2013-08-22 21:01 GMT
> Last Modified : 2013-08-22 21:02 GMT
> Ended         : -
> CoordAction ID: -
> Actions
> ------------------------------------------------------------------------------------------------------------------------------------
> ID                                                                            
> Status    Ext ID                 Ext Status Err Code
> ------------------------------------------------------------------------------------------------------------------------------------
> 0000002-130822205833322-oozie-hdp-W@:start:                                   
> OK        -                      OK         -
> ------------------------------------------------------------------------------------------------------------------------------------
> 0000002-130822205833322-oozie-hdp-W@wc                                        
> OK        job_201308221727_0287  SUCCEEDED  -
> ------------------------------------------------------------------------------------------------------------------------------------
> {noformat}
> In log there are some problems with jpaservice:
> {noformat}
> 2013-08-22 23:00:59,170 ERROR SignalXCommand:536 - USER[***] GROUP[-] TOKEN[] 
> APP[wordcount-wf] JOB[0000002-130822205833322-oozie-hdp-W] 
> ACTION[0000002-130822205833322-oozie-hdp-W@wc] Exception, 
> <openjpa-2.1.0-r422266:1071316 fatal store error> 
> org.apache.openjpa.persistence.RollbackException: The transaction has been 
> rolled back.  See the nested exceptions for details on the errors that 
> occurred.
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
>       at 
> org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:585)
>       at org.apache.oozie.service.JPAService.execute(JPAService.java:217)
>       at 
> org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:310)
>       at 
> org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:64)
>       at org.apache.oozie.command.XCommand.call(XCommand.java:277)
>       at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
>       at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
>       at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>       at java.lang.Thread.run(Thread.java:722)
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal general error> 
> org.apache.openjpa.persistence.PersistenceException: The transaction has been 
> rolled back.  See the nested exceptions for details on the errors that 
> occurred.
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
>       at 
> org.apache.openjpa.kernel.BrokerImpl.newFlushException(BrokerImpl.java:2316)
>       at org.apache.openjpa.kernel.BrokerImpl.flush(BrokerImpl.java:2153)
>       at org.apache.openjpa.kernel.BrokerImpl.flushSafe(BrokerImpl.java:2051)
>       at 
> org.apache.openjpa.kernel.BrokerImpl.beforeCompletion(BrokerImpl.java:1969)
>       at 
> org.apache.openjpa.kernel.LocalManagedRuntime.commit(LocalManagedRuntime.java:81)
>       at org.apache.openjpa.kernel.BrokerImpl.commit(BrokerImpl.java:1493)
>       at 
> org.apache.openjpa.kernel.DelegatingBroker.commit(DelegatingBroker.java:925)
>       at 
> org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:561)
>       ... 10 more
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal store error> 
> org.apache.openjpa.persistence.EntityExistsException: Violation of PRIMARY 
> KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert duplicate key 
> in object 'dbo.WF_ACTIONS'. The duplicate key value is 
> (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO 
> WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message, 
> external_child_ids, external_id, external_status, name, retries, stats, 
> tracker_uri, transition, type, user_retry_count, user_retry_interval, 
> user_retry_max, bean_type, end_time, execution_path, last_check_time, 
> log_token, pending, pending_age, signal_value, sla_xml, start_time, status, 
> wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627, 
> state=23000]
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
>       at 
> org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4854)
>       at 
> org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4829)
>       at 
> org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:136)
>       at 
> org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:78)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:143)
>       at 
> org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.batchOrExecuteRow(BatchingPreparedStatementManagerImpl.java:101)
>       at 
> org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.flushAndUpdate(BatchingPreparedStatementManagerImpl.java:85)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushInternal(PreparedStatementManagerImpl.java:99)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flush(PreparedStatementManagerImpl.java:87)
>       at 
> org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:550)
>       at 
> org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:107)
>       at 
> org.apache.openjpa.jdbc.kernel.BatchingConstraintUpdateManager.flush(BatchingConstraintUpdateManager.java:59)
>       at 
> org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:103)
>       at 
> org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:76)
>       at 
> org.apache.openjpa.jdbc.kernel.JDBCStoreManager.flush(JDBCStoreManager.java:742)
>       at 
> org.apache.openjpa.kernel.DelegatingStoreManager.flush(DelegatingStoreManager.java:131)
>       ... 17 more
> Caused by: org.apache.openjpa.lib.jdbc.ReportingSQLException: Violation of 
> PRIMARY KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert 
> duplicate key in object 'dbo.WF_ACTIONS'. The duplicate key value is 
> (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO 
> WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message, 
> external_child_ids, external_id, external_status, name, retries, stats, 
> tracker_uri, transition, type, user_retry_count, user_retry_interval, 
> user_retry_max, bean_type, end_time, execution_path, last_check_time, 
> log_token, pending, pending_age, signal_value, sla_xml, start_time, status, 
> wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627, 
> state=23000]
>       at 
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:281)
>       at 
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:257)
>       at 
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.access$1000(LoggingConnectionDecorator.java:72)
>       at 
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator$LoggingConnection$LoggingPreparedStatement.executeUpdate(LoggingConnectionDecorator.java:1199)
>       at 
> org.apache.openjpa.lib.jdbc.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:291)
>       at 
> org.apache.openjpa.jdbc.kernel.JDBCStoreManager$CancelPreparedStatement.executeUpdate(JDBCStoreManager.java:1776)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.executeUpdate(PreparedStatementManagerImpl.java:267)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:118)
>       ... 28 more
> 2013-08-22 23:00:59,170  WARN CallableQueueService$CompositeCallable:542 - 
> USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] exception callable 
> [signal], E0607: Other error in operation [signal], The transaction has been 
> rolled back.  See the nested exceptions for details on the errors that 
> occurred.
> org.apache.oozie.command.CommandException: E0607: Other error in operation 
> [signal], The transaction has been rolled back.  See the nested exceptions 
> for details on the errors that occurred.
>       at org.apache.oozie.command.XCommand.call(XCommand.java:317)
>       at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
>       at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
>       at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>       at java.lang.Thread.run(Thread.java:722)
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal store error> 
> org.apache.openjpa.persistence.RollbackException: The transaction has been 
> rolled back.  See the nested exceptions for details on the errors that 
> occurred.
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
>       at 
> org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:585)
>       at org.apache.oozie.service.JPAService.execute(JPAService.java:217)
>       at 
> org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:310)
>       at 
> org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:64)
>       at org.apache.oozie.command.XCommand.call(XCommand.java:277)
>       ... 6 more
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal general error> 
> org.apache.openjpa.persistence.PersistenceException: The transaction has been 
> rolled back.  See the nested exceptions for details on the errors that 
> occurred.
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
>       at 
> org.apache.openjpa.kernel.BrokerImpl.newFlushException(BrokerImpl.java:2316)
>       at org.apache.openjpa.kernel.BrokerImpl.flush(BrokerImpl.java:2153)
>       at org.apache.openjpa.kernel.BrokerImpl.flushSafe(BrokerImpl.java:2051)
>       at 
> org.apache.openjpa.kernel.BrokerImpl.beforeCompletion(BrokerImpl.java:1969)
>       at 
> org.apache.openjpa.kernel.LocalManagedRuntime.commit(LocalManagedRuntime.java:81)
>       at org.apache.openjpa.kernel.BrokerImpl.commit(BrokerImpl.java:1493)
>       at 
> org.apache.openjpa.kernel.DelegatingBroker.commit(DelegatingBroker.java:925)
>       at 
> org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:561)
>       ... 10 more
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal store error> 
> org.apache.openjpa.persistence.EntityExistsException: Violation of PRIMARY 
> KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert duplicate key 
> in object 'dbo.WF_ACTIONS'. The duplicate key value is 
> (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO 
> WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message, 
> external_child_ids, external_id, external_status, name, retries, stats, 
> tracker_uri, transition, type, user_retry_count, user_retry_interval, 
> user_retry_max, bean_type, end_time, execution_path, last_check_time, 
> log_token, pending, pending_age, signal_value, sla_xml, start_time, status, 
> wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627, 
> state=23000]
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
>       at 
> org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4854)
>       at 
> org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4829)
>       at 
> org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:136)
>       at 
> org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:78)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:143)
>       at 
> org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.batchOrExecuteRow(BatchingPreparedStatementManagerImpl.java:101)
>       at 
> org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.flushAndUpdate(BatchingPreparedStatementManagerImpl.java:85)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushInternal(PreparedStatementManagerImpl.java:99)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flush(PreparedStatementManagerImpl.java:87)
>       at 
> org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:550)
>       at 
> org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:107)
>       at 
> org.apache.openjpa.jdbc.kernel.BatchingConstraintUpdateManager.flush(BatchingConstraintUpdateManager.java:59)
>       at 
> org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:103)
>       at 
> org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:76)
>       at 
> org.apache.openjpa.jdbc.kernel.JDBCStoreManager.flush(JDBCStoreManager.java:742)
>       at 
> org.apache.openjpa.kernel.DelegatingStoreManager.flush(DelegatingStoreManager.java:131)
>       ... 17 more
> Caused by: org.apache.openjpa.lib.jdbc.ReportingSQLException: Violation of 
> PRIMARY KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert 
> duplicate key in object 'dbo.WF_ACTIONS'. The duplicate key value is 
> (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO 
> WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message, 
> external_child_ids, external_id, external_status, name, retries, stats, 
> tracker_uri, transition, type, user_retry_count, user_retry_interval, 
> user_retry_max, bean_type, end_time, execution_path, last_check_time, 
> log_token, pending, pending_age, signal_value, sla_xml, start_time, status, 
> wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627, 
> state=23000]
>       at 
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:281)
>       at 
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:257)
>       at 
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.access$1000(LoggingConnectionDecorator.java:72)
>       at 
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator$LoggingConnection$LoggingPreparedStatement.executeUpdate(LoggingConnectionDecorator.java:1199)
>       at 
> org.apache.openjpa.lib.jdbc.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:291)
>       at 
> org.apache.openjpa.jdbc.kernel.JDBCStoreManager$CancelPreparedStatement.executeUpdate(JDBCStoreManager.java:1776)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.executeUpdate(PreparedStatementManagerImpl.java:267)
>       at 
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:118)
>       ... 28 more
> {noformat}
> Possible problem may be that when oozie is inserting a new object, the JPA 
> entity manager needs to hear back from the connection to know the commit 
> succeeded.
> Sometimes when commit indeed succeeded but connection was lost, the JPA 
> entityManager will keep retrying to persist the same object since it assumes 
> the commit failed, therefore you see the "primary key constraint violation 
> exception" in the stacktrace. The workflow status remains in running state 
> until all of its child actions turn "SUCCEEDED". Since one of the child 
> action fails to be created, the workflow will always remain in running state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to