Peeyush Gupta created ASTERIXDB-3557:
----------------------------------------
Summary: Failure in reading atomic txn log file results in crash loop
Key: ASTERIXDB-3557
URL: https://issues.apache.org/jira/browse/ASTERIXDB-3557
Project: Apache AsterixDB
Issue Type: Bug
Components: TX - Transactions
Reporter: Peeyush Gupta
When an atomic transaction log file cannot be deserialized during recovery, the
CC enters a crash loop. In those cases, we need to delete the invalid files and
continue processing.
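The proposed handling (delete the invalid files and continue) could look roughly like the sketch below. This is not the AsterixDB code: the class name, the {{deserialize}} and {{recover}} methods, and the "starts with '{'" validity check are all hypothetical stand-ins, and the on-disk format of the atomic transaction log files is assumed here only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in AsterixDB.
final class AtomicTxnLogRecoverySketch {

    // Stand-in deserializer: rejects empty or malformed content with an
    // exception instead of handing null bytes to new String(...).
    // ASSUMPTION: valid content is text starting with '{' (illustrative only).
    static String deserialize(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        if (bytes.length == 0) {
            throw new IOException("empty atomic txn log file: " + file);
        }
        String text = new String(bytes, StandardCharsets.UTF_8);
        if (!text.startsWith("{")) {
            throw new IOException("malformed atomic txn log file: " + file);
        }
        return text;
    }

    // Processes every regular file in dir. A file that fails to deserialize
    // is deleted and skipped, so one bad file does not abort recovery.
    static List<String> recover(Path dir) throws IOException {
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    candidates.add(p);
                }
            }
        }
        List<String> valid = new ArrayList<>();
        for (Path file : candidates) {
            try {
                valid.add(deserialize(file));
            } catch (IOException | RuntimeException e) {
                // Invalid file: remove it and continue with the next one.
                Files.deleteIfExists(file);
            }
        }
        return valid;
    }
}
```

With this shape, a repeated startup attempt no longer hits the same unreadable file, which is what would otherwise keep the crash loop going.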
Sample failures:
{noformat}
2025-01-28T11:18:30.840+00:00 ERRO CBAS.replication.NcLifecycleCoordinator [Executor-13:ClusterController] Node b420f4d7c136b5e56bda9374743cde5a failed to complete startup
org.apache.asterix.common.exceptions.ACIDException: java.lang.NullPointerException: Cannot read the array length because "bytes" is null
    at org.apache.asterix.transaction.management.service.transaction.TransactionManager.rollbackMetadataTransactionsWithoutWAL(TransactionManager.java:225) ~[asterix-transactions-1.0.3-2467.jar:1.0.3-2467]
    at org.apache.asterix.app.nc.task.LocalStorageCleanupTask.perform(LocalStorageCleanupTask.java:51) ~[asterix-app-1.0.3-2467.jar:1.0.3-2467]
    at org.apache.asterix.app.replication.message.RegistrationTasksResponseMessage.handle(RegistrationTasksResponseMessage.java:63) ~[asterix-app-1.0.3-2467.jar:1.0.3-2467]
    at org.apache.asterix.messaging.NCMessageBroker.lambda$receivedMessage$0(NCMessageBroker.java:108) ~[asterix-app-1.0.3-2467.jar:1.0.3-2467]
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: java.lang.NullPointerException
    at java.base/java.lang.String.<init>(String.java:1437) ~[?:?]
    at org.apache.asterix.transaction.management.service.transaction.TransactionManager.rollbackMetadataTransactionsWithoutWAL(TransactionManager.java:215) ~[asterix-transactions-1.0.3-2467.jar:1.0.3-2467]
    ... 8 more{noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)