[
https://issues.apache.org/jira/browse/ACCUMULO-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813997#comment-13813997
]
Keith Turner commented on ACCUMULO-1833:
----------------------------------------
bq. can't we make a time-sensitive cache (i.e. hold on to things for 5
seconds) that maintains the same semantics and still fixes this performance
issue?
That sounds good. Would need to ensure time going backwards is handled
properly (just drop the cache entry). A "small" timeout could be determined
empirically. If this approach looks promising, this should be marked a bug
(it fixes a performance problem and does not change behavior or API) and fixed
in earlier releases.
> MultiTableBatchWriterImpl.getBatchWriter() is not performant for multiple
> threads
> ---------------------------------------------------------------------------------
>
> Key: ACCUMULO-1833
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1833
> Project: Accumulo
> Issue Type: Improvement
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Chris McCubbin
> Attachments: ACCUMULO-1833-test.patch, ZooKeeperThreadUtilization.png
>
>
> This issue comes from profiling our application. We have a
> MultiTableBatchWriter created by normal means. I am attempting to write to it
> with multiple threads by doing things like the following:
> {code}
> batchWriter.getBatchWriter(table).addMutations(mutations);
> {code}
> In my test with 4 threads writing to one table, this call is quite
> inefficient and results in a large performance degradation over a single
> BatchWriter.
> I believe the culprit is the fact that the call is synchronized. Also there
> is the possibility that the zookeeper call to Tables.getTableState on every
> call is negatively affecting performance:
> {code}
> @Override
> public synchronized BatchWriter getBatchWriter(String tableName) throws
> AccumuloException, AccumuloSecurityException, TableNotFoundException {
> ArgumentChecker.notNull(tableName);
> String tableId = Tables.getNameToIdMap(instance).get(tableName);
> if (tableId == null)
> throw new TableNotFoundException(tableId, tableName, null);
>
> if (Tables.getTableState(instance, tableId) == TableState.OFFLINE)
> throw new TableOfflineException(instance, tableId);
>
> BatchWriter tbw = tableWriters.get(tableId);
> if (tbw == null) {
> tbw = new TableBatchWriter(tableId);
> tableWriters.put(tableId, tbw);
> }
> return tbw;
> }
> {code}
> I recommend moving the synchronized block to happen only if the batchwriter
> is not present, and also only checking if the table is online at that time:
> {code}
> @Override
> public BatchWriter getBatchWriter(String tableName) throws
> AccumuloException, AccumuloSecurityException, TableNotFoundException {
> ArgumentChecker.notNull(tableName);
> String tableId = Tables.getNameToIdMap(instance).get(tableName);
> if (tableId == null)
> throw new TableNotFoundException(tableId, tableName, null);
> BatchWriter tbw = tableWriters.get(tableId);
> if (tbw == null) {
> if (Tables.getTableState(instance, tableId) == TableState.OFFLINE)
> throw new TableOfflineException(instance, tableId);
> tbw = new TableBatchWriter(tableId);
> synchronized(tableWriters){
> //only create a new table writer if we haven't been beaten to it.
> if (tableWriters.get(tableId) == null)
> tableWriters.put(tableId, tbw);
> }
> }
> return tbw;
> }
> {code}
--
This message was sent by Atlassian JIRA
(v6.1#6144)