Scan of metadata table failed because tablet closed
---------------------------------------------------

                 Key: ACCUMULO-331
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-331
             Project: Accumulo
          Issue Type: Bug
          Components: tserver
         Environment: Running random walk test on 10 node cluster
            Reporter: Keith Turner
            Assignee: Keith Turner
             Fix For: 1.4.0


The security random walk test failed to drop a table.

{noformat}
19 05:56:04,122 [randomwalk.Framework] ERROR: Error during random walk
java.lang.Exception: Error running node Security.xml
        at 
org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:253)
        at 
org.apache.accumulo.server.test.randomwalk.Framework.run(Framework.java:61)
        at 
org.apache.accumulo.server.test.randomwalk.Framework.main(Framework.java:114)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.accumulo.start.Main$1.run(Main.java:89)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.Exception: Error running node security.TableOp
        at 
org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:253)
        at 
org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:249)
        ... 8 more
Caused by: org.apache.accumulo.core.client.AccumuloException: Internal error 
processing waitForTableOperation
        at 
org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:293)
        at 
org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:261)
        at 
org.apache.accumulo.core.client.admin.TableOperationsImpl.delete(TableOperationsImpl.java:508)
        at 
org.apache.accumulo.server.test.randomwalk.security.DropTable.dropTable(DropTable.java:66)
        at 
org.apache.accumulo.server.test.randomwalk.security.TableOp.visit(TableOp.java:204)
        at 
org.apache.accumulo.server.test.randomwalk.Module.visit(Module.java:249)
        ... 9 more
Caused by: org.apache.thrift.TApplicationException: Internal error processing 
waitForTableOperation
        at 
org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
        at 
org.apache.accumulo.core.master.thrift.MasterClientService$Client.recv_waitForTableOperation(MasterClientService.java:684)
        at 
org.apache.accumulo.core.master.thrift.MasterClientService$Client.waitForTableOperation(MasterClientService.java:665)
        at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at cloudtrace.instrument.thrift.TraceWrap$2.invoke(TraceWrap.java:83)
        at $Proxy1.waitForTableOperation(Unknown Source)
        at 
org.apache.accumulo.core.client.admin.TableOperationsImpl.waitForTableOperation(TableOperationsImpl.java:233)
        at 
org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:275)
        ... 14 more
{noformat}

Looking in the master logs see the following indicating a scan failed while 
trying to delete the table.

{noformat}
19 05:56:04,098 [thrift.MasterClientService$Processor] ERROR: Internal error 
processing waitForTableOperation
java.lang.RuntimeException: 
org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server 
xxx.xxx.xxx.xxx:9997
        at 
org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.hasNext(TabletServerBatchReaderIterator.java:188)
        at 
org.apache.accumulo.server.master.state.MetaDataTableScanner.hasNext(MetaDataTableScanner.java:97)
        at 
org.apache.accumulo.server.master.tableOps.CleanUp.isReady(DeleteTable.java:94)
        at 
org.apache.accumulo.server.master.tableOps.CleanUp.isReady(DeleteTable.java:55)
        at 
org.apache.accumulo.server.master.tableOps.TraceRepo.isReady(TraceRepo.java:50)
        at 
org.apache.accumulo.server.fate.Fate$TransactionRunner.run(Fate.java:62)
        at 
org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.accumulo.core.client.impl.AccumuloServerException: Error 
on server xxx.xxx.xxx.xxx:9997
        at 
org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:579)
        at 
org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:335)
        at cloudtrace.instrument.TraceRunnable.run(TraceRunnable.java:47)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        ... 1 more
Caused by: org.apache.thrift.TApplicationException: Internal error processing 
startMultiScan
        at 
org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
        at 
org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:298)
        at 
org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:274)
        at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at cloudtrace.instrument.thrift.TraceWrap$2.invoke(TraceWrap.java:83)
        at $Proxy4.startMultiScan(Unknown Source)
        at 
org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:539)
        ... 5 more
{noformat}


Looking on the tablet server where the scan failed, see the following

{noformat}

19 05:56:04,062 [tabletserver.Tablet] DEBUG: initiateClose(saveState=true 
queueMinC=false disableWrites=false) !0;~;!0<
19 05:56:04,062 [tabletserver.Tablet] DEBUG: completeClose(saveState=true 
completeClose=true) !0;~;!0<
19 05:56:04,063 [problems.ProblemReports] DEBUG: Filing problem report !0 
FILE_READ /accumulo/tables/!0/table_info/F0002xng.rf

19 05:56:04,072 [tabletserver.Tablet] TABLET_HIST: !0;~;!0< closed
19 05:56:04,072 [tabletserver.TabletServer] DEBUG: Unassigning 
!0;~;!0<@(null,192.168.117.13:9997[134d7425fc5940d],null)
19 05:56:04,073 [tabletserver.TabletServer] WARN : lookup failed for tablet 
!0;~;!0<
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at 
org.apache.accumulo.core.file.rfile.RelativeKey.fastSkip(RelativeKey.java:249)
        at 
org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader._seek(RFile.java:677)
        at 
org.apache.accumulo.core.file.rfile.RFile$LocalityGroupReader.seek(RFile.java:614)
        at 
org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
        at 
org.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37)
        at 
org.apache.accumulo.core.iterators.system.ColumnFamilySkippingIterator.seek(ColumnFamilySkippingIterator.java:125)
        at 
org.apache.accumulo.core.file.rfile.RFile$Reader.seek(RFile.java:1020)
        at 
org.apache.accumulo.server.problems.ProblemReportingIterator.seek(ProblemReportingIterator.java:94)
        at 
org.apache.accumulo.core.iterators.system.MultiIterator.seek(MultiIterator.java:105)
        at 
org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
        at 
org.apache.accumulo.core.iterators.system.DeletingIterator.seek(DeletingIterator.java:67)
        at 
org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
        at org.apache.accumulo.core.iterators.Filter.seek(Filter.java:64)
        at 
org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
        at org.apache.accumulo.core.iterators.Filter.seek(Filter.java:64)
        at 
org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
        at 
org.apache.accumulo.core.iterators.user.VersioningIterator.seek(VersioningIterator.java:89)
        at 
org.apache.accumulo.core.iterators.user.WholeRowIterator.seek(WholeRowIterator.java:240)
        at 
org.apache.accumulo.core.iterators.WrappingIterator.seek(WrappingIterator.java:92)
        at 
org.apache.accumulo.core.iterators.SkippingIterator.seek(SkippingIterator.java:37)
        at 
org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.readNext(SourceSwitchingIterator.java:116)
        at 
org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:168)
        at 
org.apache.accumulo.server.tabletserver.Tablet.lookup(Tablet.java:1607)
        at 
org.apache.accumulo.server.tabletserver.Tablet.lookup(Tablet.java:1710)
        at 
org.apache.accumulo.server.tabletserver.TabletServer$ThriftClientHandler$LookupTask.run(TabletServer.java:988)
        at cloudtrace.instrument.TraceRunnable.run(TraceRunnable.java:47)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

19 05:56:04,120 [tabletserver.TabletServer] INFO : Loading tablet !0;~;!0<

19 05:56:04,128 [tabletserver.Tablet] TABLET_HIST: !0;~;!0< opened
{noformat}

The metadata tablet were being frequently loaded and unloaded as descried in 
ACCUMULO-329 when this happened.  

The file in question was major compacted away on another tablet server. Shortly 
after this.

{noformat}
19 05:57:54,187 [tabletserver.Tablet] DEBUG: Starting MajC !0;~;!0< 
[/table_info/A0002xm4.rf, /table_info/C0002xz3.rf, /table_info/C0002y0d.rf, 
/table_info/F0002xng.rf, /table_info/M0002xzv.rf, /table_info/M0002y6z.rf] --> 
/table_info/A0002y0e.rf_tmp

19 05:57:54,309 [tabletserver.Tablet] TABLET_HIST: !0;~;!0< MajC 
[/table_info/A0002xm4.rf, /table_info/C0002xz3.rf, /table_info/C0002y0d.rf, 
/table_info/F0002xng.rf, /table_info/M0002xzv.rf, /table_info/M0002y6z.rf] --> 
/table_info/A0002y0e.rf
{noformat}

{noformat}
19 06:10:29,546 [gc.SimpleGarbageCollector] DEBUG: Deleting 
/accumulo/tables/!0/table_info/F0002xng.rf
{noformat}


The file was garbage collected later.  The GC and Majc show that the file 
existed while the EOF happened.  So this is probably either a bug in RFile fast 
skip or a bug related to the tablet closing. Probably related to the close w/ 
the close temporal relationship.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to