[
https://issues.apache.org/jira/browse/ACCUMULO-1998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865878#comment-13865878
]
ASF subversion and git services commented on ACCUMULO-1998:
-----------------------------------------------------------
Commit 443cba7a7a3838b547880f1c49f2a9e0128692cd in branch refs/heads/master
from [~vines]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=443cba7 ]
ACCUMULO-1998
All encrypted walog events are now individually blocked on disk. This adds a
maxBlockSize parameter (mostly to guard against OOM from mismatched crypto
settings). Additionally, because of this behavior, as well as PKCS5 behavior, I
have turned off all padding in the default crypto configs; padding should not
be used, as it can cause data loss in walogs. I have hammered 5 instances on
and off every minute for 22 hours and counting with no related issues, so I
deem it fixed.
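A minimal sketch of why padding plus a buffering cipher stream is dangerous for walogs (this is not Accumulo code; it uses only the standard javax.crypto API, and the class name is made up for illustration): a CBC/PKCS5 CipherOutputStream holds partial blocks internally, so flush() after a short write persists nothing to the underlying stream, and the buffered bytes only reach disk when the stream is closed and padded. A tserver that dies between flush and close loses those log events.

```java
import java.io.ByteArrayOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class PaddingBufferDemo {
    // Returns {bytes on "disk" after flush, bytes on "disk" after close}.
    static int[] demo() throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE,
                new SecretKeySpec(new byte[16], "AES"),  // throwaway key
                new IvParameterSpec(new byte[16]));      // throwaway IV
        ByteArrayOutputStream disk = new ByteArrayOutputStream();
        CipherOutputStream out = new CipherOutputStream(disk, cipher);

        out.write(new byte[10]);      // a "log event" smaller than one AES block
        out.flush();                  // flushes the underlying stream only...
        int afterFlush = disk.size(); // ...the cipher still holds all 10 bytes

        out.close();                  // doFinal() pads and emits the final block
        int afterClose = disk.size();
        return new int[] { afterFlush, afterClose };
    }

    public static void main(String[] args) throws Exception {
        int[] r = demo();
        System.out.println("after flush: " + r[0] + " bytes, after close: " + r[1] + " bytes");
    }
}
```

Writing each event as its own complete, unpadded block (the approach described above) avoids this window, at the cost of needing a maxBlockSize bound on how large a single block may get.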
> Encrypted WALogs seem to be excessively buffering
> -------------------------------------------------
>
> Key: ACCUMULO-1998
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1998
> Project: Accumulo
> Issue Type: Bug
> Reporter: Michael Allen
> Assignee: John Vines
> Priority: Blocker
> Fix For: 1.6.0
>
> Attachments:
> 0001-ACCUMULO-1998-Working-around-the-cipher-s-buffer-by-.patch,
> 0001-ACCUMULO-1998-forcing-Buffered-crypto-stream-to-flus.patch,
> 0001-ACCUMULO-1998.patch,
> 0002-ACCUMULO-1998-forcing-Buffered-crypto-stream-to-flus.patch,
> 0002-ACCUMULO-1998.patch,
> 0003-ACCUMULO-1998-forcing-Buffered-crypto-stream-to-flus.patch,
> 0004-ACCUMULO-1998-forcing-Buffered-crypto-stream-to-flus.patch
>
>
> The reproduction steps around this are a little fuzzy, but basically we
> ran a moderate workload against a 1.6.0 server. Encryption happened to be
> turned on, but that doesn't seem to be germane to the problem. After doing a
> moderate amount of work, Accumulo refused to start up, spewing this error
> over and over into the log:
> {noformat}
> 2013-12-10 10:23:02,529 [tserver.TabletServer] WARN : exception while doing multi-scan
> java.lang.RuntimeException: java.io.IOException: Failed to open hdfs://10.10.1.115:9000/accumulo/tables/!0/table_info/A000042x.rf
>     at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler$LookupTask.run(TabletServer.java:1125)
>     at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>     at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>     at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Failed to open hdfs://10.10.1.115:9000/accumulo/tables/!0/table_info/A000042x.rf
>     at org.apache.accumulo.tserver.FileManager.reserveReaders(FileManager.java:333)
>     at org.apache.accumulo.tserver.FileManager.access$500(FileManager.java:58)
>     at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFiles(FileManager.java:478)
>     at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFileRefs(FileManager.java:466)
>     at org.apache.accumulo.tserver.FileManager$ScanFileManager.openFiles(FileManager.java:486)
>     at org.apache.accumulo.tserver.Tablet$ScanDataSource.createIterator(Tablet.java:2027)
>     at org.apache.accumulo.tserver.Tablet$ScanDataSource.iterator(Tablet.java:1989)
>     at org.apache.accumulo.core.iterators.system.SourceSwitchingIterator.seek(SourceSwitchingIterator.java:163)
>     at org.apache.accumulo.tserver.Tablet.lookup(Tablet.java:1565)
>     at org.apache.accumulo.tserver.Tablet.lookup(Tablet.java:1672)
>     at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler$LookupTask.run(TabletServer.java:1114)
>     ... 6 more
> Caused by: java.io.FileNotFoundException: File does not exist: /accumulo/tables/!0/table_info/A000042x.rf
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchLocatedBlocks(DFSClient.java:2006)
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1975)
>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1967)
>     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:735)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:165)
>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:436)
>     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:256)
>     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$000(CachableBlockFile.java:143)
>     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:212)
>     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:313)
>     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:367)
>     at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:143)
>     at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:825)
>     at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:79)
>     at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(FileOperations.java:119)
>     at org.apache.accumulo.tserver.FileManager.reserveReaders(FileManager.java:314)
>     ... 16 more
> {noformat}
> Here are some other pieces of context:
> HDFS contents:
> {noformat}
> ubuntu@ip-10-10-1-115:/data0/logs/accumulo$ hadoop fs -lsr /accumulo/tables/
> drwxr-xr-x   - accumulo hadoop     0 2013-12-10 00:32 /accumulo/tables/!0
> drwxr-xr-x   - accumulo hadoop     0 2013-12-10 01:06 /accumulo/tables/!0/default_tablet
> drwxr-xr-x   - accumulo hadoop     0 2013-12-10 10:49 /accumulo/tables/!0/table_info
> -rw-r--r--   5 accumulo hadoop  1698 2013-12-10 00:34 /accumulo/tables/!0/table_info/F0000000.rf
> -rw-r--r--   5 accumulo hadoop 43524 2013-12-10 01:53 /accumulo/tables/!0/table_info/F000062q.rf
> drwxr-xr-x   - accumulo hadoop     0 2013-12-10 00:32 /accumulo/tables/+r
> drwxr-xr-x   - accumulo hadoop     0 2013-12-10 10:45 /accumulo/tables/+r/root_tablet
> -rw-r--r--   5 accumulo hadoop  2070 2013-12-10 10:45 /accumulo/tables/+r/root_tablet/A0000738.rf
> drwxr-xr-x   - accumulo hadoop     0 2013-12-10 00:33 /accumulo/tables/1
> drwxr-xr-x   - accumulo hadoop     0 2013-12-10 00:33 /accumulo/tables/1/default_tablet
> {noformat}
> ZooKeeper entries:
> {noformat}
> [zk: localhost:2181(CONNECTED) 6] get /accumulo/371cfa3e-fe96-4a50-92e9-da7572589ffa/root_tablet/dir
> hdfs://10.10.1.115:9000/accumulo/tables/+r/root_tablet
> cZxid = 0x1b
> ctime = Tue Dec 10 00:32:56 EST 2013
> mZxid = 0x1b
> mtime = Tue Dec 10 00:32:56 EST 2013
> pZxid = 0x1b
> cversion = 0
> dataVersion = 0
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 54
> numChildren = 0
> {noformat}
> I'm going to preserve the state of this machine in HDFS for a while but not
> forever, so if there are other pieces of context people need, let me know.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)