[
https://issues.apache.org/jira/browse/OOZIE-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860479#comment-15860479
]
Gopi Krishnan Nambiar commented on OOZIE-1525:
----------------------------------------------
I ran into this error recently and triaged to find that the cause for me was
running multiple Oozie servers without HA support. I was running Oozie version
4.0.1 which did not support HA. Do you think this could have been the reason
for your issues too ?
> Oozie workflow does not update status sometimes and is stuck in RUNNING state.
> ------------------------------------------------------------------------------
>
> Key: OOZIE-1525
> URL: https://issues.apache.org/jira/browse/OOZIE-1525
> Project: Oozie
> Issue Type: Bug
> Affects Versions: 3.3.2
> Reporter: Eugene Shevchuk
> Attachments: OOZIE-90.patch
>
>
> Sometimes workflow is stuck in RUNNING state with the following info:
> {noformat}
> Job ID : 0000002-130822205833322-oozie-hdp-W
> ------------------------------------------------------------------------------------------------------------------------------------
> Workflow Name : wordcount-wf
> App Path : ***
> Status : RUNNING
> Run : 0
> User : ***
> Group : -
> Created : 2013-08-22 21:01 GMT
> Started : 2013-08-22 21:01 GMT
> Last Modified : 2013-08-22 21:02 GMT
> Ended : -
> CoordAction ID: -
> Actions
> ------------------------------------------------------------------------------------------------------------------------------------
> ID
> Status Ext ID Ext Status Err Code
> ------------------------------------------------------------------------------------------------------------------------------------
> 0000002-130822205833322-oozie-hdp-W@:start:
> OK - OK -
> ------------------------------------------------------------------------------------------------------------------------------------
> 0000002-130822205833322-oozie-hdp-W@wc
> OK job_201308221727_0287 SUCCEEDED -
> ------------------------------------------------------------------------------------------------------------------------------------
> {noformat}
> In log there are some problems with jpaservice:
> {noformat}
> 2013-08-22 23:00:59,170 ERROR SignalXCommand:536 - USER[***] GROUP[-] TOKEN[]
> APP[wordcount-wf] JOB[0000002-130822205833322-oozie-hdp-W]
> ACTION[0000002-130822205833322-oozie-hdp-W@wc] Exception,
> <openjpa-2.1.0-r422266:1071316 fatal store error>
> org.apache.openjpa.persistence.RollbackException: The transaction has been
> rolled back. See the nested exceptions for details on the errors that
> occurred.
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
> at
> org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:585)
> at org.apache.oozie.service.JPAService.execute(JPAService.java:217)
> at
> org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:310)
> at
> org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:64)
> at org.apache.oozie.command.XCommand.call(XCommand.java:277)
> at
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
> at
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
> at
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal general error>
> org.apache.openjpa.persistence.PersistenceException: The transaction has been
> rolled back. See the nested exceptions for details on the errors that
> occurred.
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
> at
> org.apache.openjpa.kernel.BrokerImpl.newFlushException(BrokerImpl.java:2316)
> at org.apache.openjpa.kernel.BrokerImpl.flush(BrokerImpl.java:2153)
> at org.apache.openjpa.kernel.BrokerImpl.flushSafe(BrokerImpl.java:2051)
> at
> org.apache.openjpa.kernel.BrokerImpl.beforeCompletion(BrokerImpl.java:1969)
> at
> org.apache.openjpa.kernel.LocalManagedRuntime.commit(LocalManagedRuntime.java:81)
> at org.apache.openjpa.kernel.BrokerImpl.commit(BrokerImpl.java:1493)
> at
> org.apache.openjpa.kernel.DelegatingBroker.commit(DelegatingBroker.java:925)
> at
> org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:561)
> ... 10 more
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal store error>
> org.apache.openjpa.persistence.EntityExistsException: Violation of PRIMARY
> KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert duplicate key
> in object 'dbo.WF_ACTIONS'. The duplicate key value is
> (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO
> WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message,
> external_child_ids, external_id, external_status, name, retries, stats,
> tracker_uri, transition, type, user_retry_count, user_retry_interval,
> user_retry_max, bean_type, end_time, execution_path, last_check_time,
> log_token, pending, pending_age, signal_value, sla_xml, start_time, status,
> wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627,
> state=23000]
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
> at
> org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4854)
> at
> org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4829)
> at
> org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:136)
> at
> org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:78)
> at
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:143)
> at
> org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.batchOrExecuteRow(BatchingPreparedStatementManagerImpl.java:101)
> at
> org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.flushAndUpdate(BatchingPreparedStatementManagerImpl.java:85)
> at
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushInternal(PreparedStatementManagerImpl.java:99)
> at
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flush(PreparedStatementManagerImpl.java:87)
> at
> org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:550)
> at
> org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:107)
> at
> org.apache.openjpa.jdbc.kernel.BatchingConstraintUpdateManager.flush(BatchingConstraintUpdateManager.java:59)
> at
> org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:103)
> at
> org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:76)
> at
> org.apache.openjpa.jdbc.kernel.JDBCStoreManager.flush(JDBCStoreManager.java:742)
> at
> org.apache.openjpa.kernel.DelegatingStoreManager.flush(DelegatingStoreManager.java:131)
> ... 17 more
> Caused by: org.apache.openjpa.lib.jdbc.ReportingSQLException: Violation of
> PRIMARY KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert
> duplicate key in object 'dbo.WF_ACTIONS'. The duplicate key value is
> (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO
> WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message,
> external_child_ids, external_id, external_status, name, retries, stats,
> tracker_uri, transition, type, user_retry_count, user_retry_interval,
> user_retry_max, bean_type, end_time, execution_path, last_check_time,
> log_token, pending, pending_age, signal_value, sla_xml, start_time, status,
> wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627,
> state=23000]
> at
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:281)
> at
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:257)
> at
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.access$1000(LoggingConnectionDecorator.java:72)
> at
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator$LoggingConnection$LoggingPreparedStatement.executeUpdate(LoggingConnectionDecorator.java:1199)
> at
> org.apache.openjpa.lib.jdbc.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:291)
> at
> org.apache.openjpa.jdbc.kernel.JDBCStoreManager$CancelPreparedStatement.executeUpdate(JDBCStoreManager.java:1776)
> at
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.executeUpdate(PreparedStatementManagerImpl.java:267)
> at
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:118)
> ... 28 more
> 2013-08-22 23:00:59,170 WARN CallableQueueService$CompositeCallable:542 -
> USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] exception callable
> [signal], E0607: Other error in operation [signal], The transaction has been
> rolled back. See the nested exceptions for details on the errors that
> occurred.
> org.apache.oozie.command.CommandException: E0607: Other error in operation
> [signal], The transaction has been rolled back. See the nested exceptions
> for details on the errors that occurred.
> at org.apache.oozie.command.XCommand.call(XCommand.java:317)
> at
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:326)
> at
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:255)
> at
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal store error>
> org.apache.openjpa.persistence.RollbackException: The transaction has been
> rolled back. See the nested exceptions for details on the errors that
> occurred.
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
> at
> org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:585)
> at org.apache.oozie.service.JPAService.execute(JPAService.java:217)
> at
> org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:310)
> at
> org.apache.oozie.command.wf.SignalXCommand.execute(SignalXCommand.java:64)
> at org.apache.oozie.command.XCommand.call(XCommand.java:277)
> ... 6 more
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal general error>
> org.apache.openjpa.persistence.PersistenceException: The transaction has been
> rolled back. See the nested exceptions for details on the errors that
> occurred.
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
> at
> org.apache.openjpa.kernel.BrokerImpl.newFlushException(BrokerImpl.java:2316)
> at org.apache.openjpa.kernel.BrokerImpl.flush(BrokerImpl.java:2153)
> at org.apache.openjpa.kernel.BrokerImpl.flushSafe(BrokerImpl.java:2051)
> at
> org.apache.openjpa.kernel.BrokerImpl.beforeCompletion(BrokerImpl.java:1969)
> at
> org.apache.openjpa.kernel.LocalManagedRuntime.commit(LocalManagedRuntime.java:81)
> at org.apache.openjpa.kernel.BrokerImpl.commit(BrokerImpl.java:1493)
> at
> org.apache.openjpa.kernel.DelegatingBroker.commit(DelegatingBroker.java:925)
> at
> org.apache.openjpa.persistence.EntityManagerImpl.commit(EntityManagerImpl.java:561)
> ... 10 more
> Caused by: <openjpa-2.1.0-r422266:1071316 fatal store error>
> org.apache.openjpa.persistence.EntityExistsException: Violation of PRIMARY
> KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert duplicate key
> in object 'dbo.WF_ACTIONS'. The duplicate key value is
> (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO
> WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message,
> external_child_ids, external_id, external_status, name, retries, stats,
> tracker_uri, transition, type, user_retry_count, user_retry_interval,
> user_retry_max, bean_type, end_time, execution_path, last_check_time,
> log_token, pending, pending_age, signal_value, sla_xml, start_time, status,
> wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627,
> state=23000]
> FailedObject: org.apache.oozie.WorkflowActionBean@1dadb1b6
> at
> org.apache.openjpa.jdbc.sql.DBDictionary.narrow(DBDictionary.java:4854)
> at
> org.apache.openjpa.jdbc.sql.DBDictionary.newStoreException(DBDictionary.java:4829)
> at
> org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:136)
> at
> org.apache.openjpa.jdbc.sql.SQLExceptions.getStore(SQLExceptions.java:78)
> at
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:143)
> at
> org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.batchOrExecuteRow(BatchingPreparedStatementManagerImpl.java:101)
> at
> org.apache.openjpa.jdbc.kernel.BatchingPreparedStatementManagerImpl.flushAndUpdate(BatchingPreparedStatementManagerImpl.java:85)
> at
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushInternal(PreparedStatementManagerImpl.java:99)
> at
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flush(PreparedStatementManagerImpl.java:87)
> at
> org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:550)
> at
> org.apache.openjpa.jdbc.kernel.ConstraintUpdateManager.flush(ConstraintUpdateManager.java:107)
> at
> org.apache.openjpa.jdbc.kernel.BatchingConstraintUpdateManager.flush(BatchingConstraintUpdateManager.java:59)
> at
> org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:103)
> at
> org.apache.openjpa.jdbc.kernel.AbstractUpdateManager.flush(AbstractUpdateManager.java:76)
> at
> org.apache.openjpa.jdbc.kernel.JDBCStoreManager.flush(JDBCStoreManager.java:742)
> at
> org.apache.openjpa.kernel.DelegatingStoreManager.flush(DelegatingStoreManager.java:131)
> ... 17 more
> Caused by: org.apache.openjpa.lib.jdbc.ReportingSQLException: Violation of
> PRIMARY KEY constraint 'PK__WF_ACTIO__3213E83FECE1688E'. Cannot insert
> duplicate key in object 'dbo.WF_ACTIONS'. The duplicate key value is
> (0000002-130822205833322-oozie-hdp-W@wc). {prepstmnt 1032423179 INSERT INTO
> WF_ACTIONS (id, conf, console_url, cred, data, error_code, error_message,
> external_child_ids, external_id, external_status, name, retries, stats,
> tracker_uri, transition, type, user_retry_count, user_retry_interval,
> user_retry_max, bean_type, end_time, execution_path, last_check_time,
> log_token, pending, pending_age, signal_value, sla_xml, start_time, status,
> wf_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [params=?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?,
> ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]} [code=2627,
> state=23000]
> at
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:281)
> at
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.wrap(LoggingConnectionDecorator.java:257)
> at
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator.access$1000(LoggingConnectionDecorator.java:72)
> at
> org.apache.openjpa.lib.jdbc.LoggingConnectionDecorator$LoggingConnection$LoggingPreparedStatement.executeUpdate(LoggingConnectionDecorator.java:1199)
> at
> org.apache.openjpa.lib.jdbc.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:291)
> at
> org.apache.openjpa.jdbc.kernel.JDBCStoreManager$CancelPreparedStatement.executeUpdate(JDBCStoreManager.java:1776)
> at
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.executeUpdate(PreparedStatementManagerImpl.java:267)
> at
> org.apache.openjpa.jdbc.kernel.PreparedStatementManagerImpl.flushAndUpdate(PreparedStatementManagerImpl.java:118)
> ... 28 more
> {noformat}
> Possible problem may be that when oozie is inserting a new object, the JPA
> entity manager needs to hear back from the connection to know the commit
> succeeded.
> Sometimes when commit indeed succeeded but connection was lost, the JPA
> entityManager will keep retrying to persist the same object since it assumes
> the commit failed, therefore you see the "primary key constraint violation
> exception" in the stacktrace. The workflow status remains in running state
> until all of its child actions turn "SUCCEEDED". Since one of the child
> action fails to be created, the workflow will always remain in running state.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)