[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935764#comment-14935764
 ] 

Murtadha Hubail commented on ASTERIXDB-1114:
--------------------------------------------

The issue occurred when a secondary index was registered before the primary 
index (dataset). This happens when a secondary index is created on a closed 
dataset. In that case, the newly created secondary index instance is passed a 
null DatasetInfo reference because the dataset has not been registered yet. 
When that index is flushed during a checkpoint on NC shutdown, the NPE occurs. 
This did not happen with a primary index because we were resetting its 
DatasetInfo reference during index registration.

The implemented fix guarantees that the DatasetInfo passed to any index is 
never null.
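
To illustrate the guarantee described above, here is a minimal sketch. All 
class and method names in it (RegistrationOrderSketch, DatasetInfoSketch, 
IndexSketch, LifecycleManagerSketch, getOrCreateDatasetInfo, beforeFlush) are 
hypothetical stand-ins, not the actual AsterixDB classes, and the 
create-on-first-use approach is an assumption about one way to get a non-null 
DatasetInfo regardless of registration order. It is not a description of the 
merged change.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified stand-ins; these are NOT the real AsterixDB classes.
final class RegistrationOrderSketch {

    static final class DatasetInfoSketch {
        final int datasetId;
        int activeOperations;

        DatasetInfoSketch(int datasetId) {
            this.datasetId = datasetId;
        }
    }

    static final class IndexSketch {
        final String name;
        final DatasetInfoSketch datasetInfo;

        IndexSketch(String name, DatasetInfoSketch datasetInfo) {
            this.name = name;
            // Fail at construction time instead of later, at flush time.
            this.datasetInfo = Objects.requireNonNull(datasetInfo, "datasetInfo");
        }

        // Stands in for the tracker call that dereferences DatasetInfo on flush.
        void beforeFlush() {
            datasetInfo.activeOperations++;
        }
    }

    static final class LifecycleManagerSketch {
        private final Map<Integer, DatasetInfoSketch> datasets = new HashMap<>();

        // Assumption: the DatasetInfo is created on first use, so it exists
        // whether the primary or a secondary index is registered first.
        DatasetInfoSketch getOrCreateDatasetInfo(int datasetId) {
            return datasets.computeIfAbsent(datasetId, DatasetInfoSketch::new);
        }

        IndexSketch register(int datasetId, String indexName) {
            return new IndexSketch(indexName, getOrCreateDatasetInfo(datasetId));
        }
    }
}
```

In this sketch, registering the secondary index first still yields a shared, 
non-null DatasetInfoSketch, so a later beforeFlush() call has nothing null to 
dereference.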

Current recovery test cases always access the primary index first, so the issue 
is not triggered. Since the issue happens on NC shutdown, there is no good way 
to catch it. I have written an ugly test case that reproduces the issue and 
verified that it fails before the fix was merged and passes afterwards. The 
test case verifies that the NC was shut down successfully by checking that no 
NCDriver processes are running. However, after thinking about it, this 
verification might produce many false positives, since an NCDriver process might 
still be running due to a failure in a previous test case (managix stop 
doesn't necessarily succeed), so I would rather not merge it.
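
For illustration only, a process check of the kind described above could look 
roughly like the following. This is not the unmerged test case; the class 
NcDriverProcessCheck and its methods are hypothetical, and matching on the 
string "NCDriver" in the command line is an assumption. It uses the standard 
ProcessHandle API (Java 9 or later).

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; not part of AsterixDB or its test framework.
final class NcDriverProcessCheck {

    // Pure helper: true if any command line contains the given marker.
    static boolean anyMatches(List<String> commandLines, String marker) {
        return commandLines.stream().anyMatch(c -> c.contains(marker));
    }

    // Snapshot of the command lines of all processes visible to this JVM.
    // Processes whose command line is not available are reported as "".
    static List<String> visibleCommandLines() {
        return ProcessHandle.allProcesses()
                .map(p -> p.info().commandLine().orElse(""))
                .collect(Collectors.toList());
    }

    // True if some visible process looks like an NCDriver (assumed marker).
    static boolean ncDriverStillRunning() {
        return anyMatches(visibleCommandLines(), "NCDriver");
    }
}
```

The weakness is visible in the sketch: the check cannot tell an NCDriver left 
over from an earlier failed test apart from one that failed to stop in the 
current test, so it would flag both.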

Since the current fix eliminates the possibility of getting a null DatasetInfo, 
I think this issue is fixed. When we have a more mature test framework, a test 
case could be added to force the system to flush an index without having to 
shut down the NC.

> managix stop [instance_name] throws NullPointerException during checkpoint
> --------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1114
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1114
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Young-Seok Kim
>            Assignee: Murtadha Hubail
>
> When an AsterixDB instance is stopped after inserting a record, the checkpoint 
> from the recovery manager throws a NullPointerException, which prevents 
> NCDriver from stopping. 
> The exception is shown below:
> INFO: Stopping NodeControllerService
> java.lang.NullPointerException
>     at org.apache.asterix.common.context.BaseOperationTracker.beforeOperation(BaseOperationTracker.java:45)
>     at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.enterComponents(LSMHarness.java:178)
>     at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.getAndEnterComponents(LSMHarness.java:113)
>     at org.apache.hyracks.storage.am.lsm.common.impls.LSMHarness.scheduleFlush(LSMHarness.java:376)
>     at org.apache.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.scheduleFlush(LSMTreeIndexAccessor.java:122)
>     at org.apache.asterix.common.context.DatasetLifecycleManager.flushAndWaitForIO(DatasetLifecycleManager.java:237)
>     at org.apache.asterix.common.context.DatasetLifecycleManager.flushDatasetOpenIndexes(DatasetLifecycleManager.java:518)
>     at org.apache.asterix.common.context.DatasetLifecycleManager.flushAllDatasets(DatasetLifecycleManager.java:439)
>     at org.apache.asterix.transaction.management.service.recovery.RecoveryManager.checkpoint(RecoveryManager.java:379)
>     at org.apache.asterix.hyracks.bootstrap.NCApplicationEntryPoint.stop(NCApplicationEntryPoint.java:138)
>     at org.apache.hyracks.control.nc.NodeControllerService.stop(NodeControllerService.java:343)
>     at org.apache.hyracks.control.nc.NCDriver$1.run(NCDriver.java:53)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)