[
https://issues.apache.org/jira/browse/ASTERIXDB-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609184#comment-15609184
]
Jianfeng Jia commented on ASTERIXDB-1708:
-----------------------------------------
I can add an interesting log from my one-node test cluster. I think the restart
(or recovery) is the culprit for this kind of problem.
{code}
Listening for transport dt_socket at address: 8001
Oct 26, 2016 10:48:24 AM org.apache.hyracks.control.nc.NCDriver main
SEVERE: Setting uncaught exception handler org.apache.hyracks.api.lifecycle.LifeCycleComponentManager@49070868
Oct 26, 2016 10:48:24 AM org.apache.hyracks.control.nc.NodeControllerService start
INFO: Starting NodeControllerService
Oct 26, 2016 10:48:24 AM org.apache.asterix.hyracks.bootstrap.NCApplicationEntryPoint start
INFO: Starting Asterix node controller: ur_ur
Oct 26, 2016 10:48:25 AM org.apache.asterix.transaction.management.service.logging.LogManager initializeLogAnchor
INFO: log file Id: 16, offset: 2069172273
Oct 26, 2016 10:48:25 AM org.apache.asterix.transaction.management.service.logging.LogManager initializeLogManager
INFO: LogManager starts logging in LSN: 36428910641
Oct 26, 2016 10:48:25 AM org.apache.asterix.hyracks.bootstrap.NCApplicationEntryPoint start
INFO: System is in a state: CORRUPTED
Oct 26, 2016 10:48:25 AM org.apache.asterix.transaction.management.service.recovery.RecoveryManager startRecovery
INFO: starting recovery ...
Oct 26, 2016 10:48:41 AM org.apache.asterix.transaction.management.service.recovery.RecoveryManager startRecoverysAnalysisPhase
INFO: Logs analysis phase completed.
Oct 26, 2016 10:48:41 AM org.apache.asterix.transaction.management.service.recovery.RecoveryManager startRecoverysAnalysisPhase
INFO: Analysis log count update/entityCommit/jobCommit/abort = 14960309/5079520/17/0
{code}
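A note on reading these numbers: the log file id, offset, and LSN printed above are consistent with LSN = file id * 2^31 + offset, and the exception quoted below fits the same pattern (file id 37 for LSN 80892085216). The 2 GiB log partition size is an assumption inferred from these values, not taken from the AsterixDB source or configuration, and the helper names are hypothetical. A minimal sketch of that arithmetic:

```python
# Assumption: each log file spans 2**31 bytes (2 GiB). This value is inferred
# from the numbers in the log output, not read from AsterixDB itself.
ASSUMED_LOG_PARTITION_SIZE = 2 ** 31


def split_lsn(lsn, partition_size=ASSUMED_LOG_PARTITION_SIZE):
    """Split an LSN into (log file id, offset within that file)."""
    return divmod(lsn, partition_size)


def make_lsn(file_id, offset, partition_size=ASSUMED_LOG_PARTITION_SIZE):
    """Rebuild an LSN from a log file id and an offset within that file."""
    return file_id * partition_size + offset


if __name__ == "__main__":
    # LSN reported by LogManager.initializeLogManager in the recovery log.
    print(split_lsn(36428910641))  # (16, 2069172273)
    # LSN requested in the ACIDException from the issue description.
    print(split_lsn(80892085216))  # (37, 1435190240)
```

If the assumption holds, a "Log file with id(N) was not found" error means the requested LSN points into a log file that is no longer on disk, which can be checked against the file ids present in the log directory.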
> Rollback failure at scale
> -------------------------
>
> Key: ASTERIXDB-1708
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1708
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Ian Maxon
> Assignee: Ian Maxon
>
> It seems that transaction rollback can fail at certain points. This happened
> with the same file ID on a cluster of 5 nodes, which is an interesting
> coincidence.
> org.apache.asterix.common.exceptions.ACIDException: java.io.IOException: Log file with id(37) was not found. Requested LSN: 80892085216
> at org.apache.asterix.transaction.management.service.logging.LogReader.getLogFile(LogReader.java:293)
> at org.apache.asterix.transaction.management.service.logging.LogReader.initializeScan(LogReader.java:76)
> at org.apache.asterix.transaction.management.service.recovery.RecoveryManager.rollbackTransaction(RecoveryManager.java:734)
> at org.apache.asterix.transaction.management.service.transaction.TransactionManager.abortTransaction(TransactionManager.java:64)
> at org.apache.asterix.transaction.management.service.transaction.TransactionManager.completedTransaction(TransactionManager.java:130)
> at org.apache.asterix.runtime.job.listener.JobEventListenerFactory$1.jobletFinish(JobEventListenerFactory.java:58)
> at org.apache.hyracks.control.nc.Joblet.performCleanup(Joblet.java:318)
> at org.apache.hyracks.control.nc.Joblet.cleanup(Joblet.java:310)
> at org.apache.hyracks.control.nc.work.CleanupJobletWork.run(CleanupJobletWork.java:67)
> at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127)
> Caused by: java.io.IOException: Log file with id(37) was not found. Requested LSN: 80892085216
> at org.apache.asterix.transaction.management.service.logging.LogManager.getLogFile(LogManager.java:544)
> at org.apache.asterix.transaction.management.service.logging.LogReader.getLogFile(LogReader.java:290)