[
https://issues.apache.org/jira/browse/ACCUMULO-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Keith Turner updated ACCUMULO-2766:
-----------------------------------
Attachment: ACCUMULO-2677-1.patch
This problem seems much worse than I thought. ACCUMULO-2677-1.patch is a
potential fix. I have been running experiments using
[mutslam|https://github.com/keith-turner/mutslam], a little utility I created to
test group commit performance. The patch makes a dramatic difference in
performance.
One of the experiments that mutslam runs is the following:
* create 128 threads
* each thread has a batch writer (w/ one thread)
* each thread writes 100 mutations, flushing its batchwriter after each
mutation
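The shape of this experiment can be sketched in plain Java. This is only an illustration, not mutslam's code: the {{Writer}} interface, {{countingWriter()}} and {{run()}} are hypothetical stand-ins, and a real run would use an Accumulo batch writer against a live instance instead of the no-op writer shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class FlushPerMutationSketch {

  /** Hypothetical stand-in for a batch writer; this is NOT the Accumulo API. */
  interface Writer {
    void addMutation(String row);
    void flush();
  }

  /** Counts flushes across all writers so the sketch can be checked without a cluster. */
  static final AtomicInteger FLUSHES = new AtomicInteger();

  /** Hypothetical no-op writer that only records how often flush() is called. */
  static Writer countingWriter() {
    return new Writer() {
      public void addMutation(String row) { /* a real writer would buffer the mutation */ }
      public void flush() { FLUSHES.incrementAndGet(); }
    };
  }

  /**
   * Starts numThreads threads. Each thread gets its own writer, writes
   * mutationsPerThread mutations, and flushes after every mutation.
   * Returns the elapsed wall-clock time in milliseconds.
   */
  static long run(int numThreads, int mutationsPerThread, Supplier<Writer> writerFactory) {
    CountDownLatch go = new CountDownLatch(1);
    List<Thread> threads = new ArrayList<>();
    for (int t = 0; t < numThreads; t++) {
      final int id = t;
      Thread th = new Thread(() -> {
        Writer w = writerFactory.get(); // one writer per thread
        try {
          go.await(); // release all threads at the same moment
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        for (int i = 0; i < mutationsPerThread; i++) {
          w.addMutation("t" + id + "_m" + i);
          w.flush(); // one flush per mutation, so every write waits on a sync
        }
      });
      th.start();
      threads.add(th);
    }
    long begin = System.nanoTime();
    go.countDown();
    for (Thread th : threads) {
      try {
        th.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return (System.nanoTime() - begin) / 1_000_000;
  }

  public static void main(String[] args) {
    long ms = run(128, 100, FlushPerMutationSketch::countingWriter);
    System.out.println(FLUSHES.get() + " flushes in " + ms + " ms");
  }
}
```

With 128 threads and 100 mutations each, the workload issues 12,800 flushes, which is why the cost of each sync dominates the run time.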
W/o the patch this test takes 500 to 580 seconds. With ACCUMULO-2677-1.patch
the test takes 23 to 25 seconds. While running the test I set
{{tserver.server.threads.minimum=128}} in {{accumulo-site.xml}}. I ran these
experiments on a single node w/ 16 cores (hyperthreaded) w/ hadoop-2.2.0.
I am still experimenting.
> Single walog operation may wait for multiple hsync calls
> --------------------------------------------------------
>
> Key: ACCUMULO-2766
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2766
> Project: Accumulo
> Issue Type: Bug
> Components: tserver
> Affects Versions: 1.5.0, 1.5.1, 1.6.0
> Reporter: Keith Turner
> Assignee: Keith Turner
> Labels: performance
> Fix For: 1.5.2, 1.6.1
>
> Attachments: ACCUMULO-2677-1.patch
>
>
> While looking into slow {{hsync}} calls, I noticed an oddity in the way
> Accumulo processes syncs. Specifically, given the way {{closeLock}} is used
> in {{DfsLogger}}, it seems like the following situation could occur.
>
> # thread B starts executing DfsLogger.LogSyncingTask.run()
> # thread 1 enters DfsLogger.logFileData()
> # thread 1 writes to walog
> # thread 1 locks _closeLock_
> # thread 1 adds sync work to workQueue
> # thread 1 unlocks _closeLock_
> # thread B takes sync work off of workQueue
> # thread B locks _closeLock_
> # thread B calls sync
> # thread 3 enters DfsLogger.logFileData()
> # thread 3 writes to walog
> # thread 3 blocks locking _closeLock_
> # thread 4 enters DfsLogger.logFileData()
> # thread 4 writes to walog
> # thread 4 blocks locking _closeLock_
> # thread B unlocks _closeLock_
> # thread 4 locks _closeLock_
> # thread 4 adds sync work to workQueue
> # thread B takes sync work off of workQueue
> # thread B blocks locking _closeLock_
> # thread 4 unlocks _closeLock_
> # thread B locks _closeLock_
> # thread B calls sync
> # thread B unlocks _closeLock_
> # thread 3 locks _closeLock_
> # thread 3 adds sync work to workQueue
> # thread 3 unlocks _closeLock_
>
> In this situation thread 3 unnecessarily has to wait for an extra {{hsync}}
> call. I am not sure whether this situation actually occurs, or how
> frequently. Looking at the code, it seems like it would be nice if sync
> work could be queued w/o synchronizing w/ in-progress sync calls.
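
One way to queue sync work without blocking on an in-progress sync is sketched below. This is a minimal illustration under stated assumptions, not the contents of ACCUMULO-2677-1.patch and not {{DfsLogger}} code: the class {{GroupSyncSketch}}, its {{awaitSync()}} method and the {{Runnable}} standing in for {{hsync}} are all hypothetical. Writers add a latch to a concurrent queue; a single syncer thread takes everything queued so far, performs one sync, and releases the whole batch. Close handling and sync failures are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class GroupSyncSketch {
  // Pending sync requests; adding to this queue never waits on a running sync.
  private final LinkedBlockingQueue<CountDownLatch> workQueue = new LinkedBlockingQueue<>();
  private final Runnable sync; // hypothetical stand-in for the hsync call
  final AtomicInteger syncCalls = new AtomicInteger();

  GroupSyncSketch(Runnable sync) {
    this.sync = sync;
    Thread syncer = new Thread(this::syncLoop, "syncer");
    syncer.setDaemon(true);
    syncer.start();
  }

  /** Called by a writer after it has appended to the log; returns once a sync covering that write is done. */
  void awaitSync() {
    CountDownLatch done = new CountDownLatch(1);
    workQueue.add(done); // no lock is shared with the sync call
    try {
      done.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  private void syncLoop() {
    List<CountDownLatch> batch = new ArrayList<>();
    try {
      while (true) {
        batch.clear();
        batch.add(workQueue.take()); // block until at least one request exists
        workQueue.drainTo(batch);    // also pick up everything queued in the meantime
        sync.run();                  // a single sync covers the whole batch
        syncCalls.incrementAndGet();
        for (CountDownLatch latch : batch) {
          latch.countDown();         // wake every writer in the batch
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

In this arrangement a request that arrives while a sync is running waits for at most that sync plus the next one, regardless of how many other writers arrive in between.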