[
https://issues.apache.org/jira/browse/ACCUMULO-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13813222#comment-13813222
]
Keith Turner commented on ACCUMULO-1833:
----------------------------------------
bq. Another option would be to add a call to get a batch writer with just a
table ID.
This is still kinda cumbersome. If the user calls
{{getBatchWriterByTableId(tableOperations.tableIdMap().get(tableName))}}, then
we are back where we started. The javadoc for this method would need to
instruct the user to only call tableOperations.tableIdMap() once and keep the
map.
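
The caller-side pattern that javadoc would have to describe, fetching the name-to-ID map once and reusing it, might look roughly like the sketch below. {{TableIdSource}} and {{CachedTableIds}} are hypothetical names that stand in for {{tableOperations.tableIdMap()}}, and {{getBatchWriterByTableId}} is only the proposed method, not an existing API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for tableOperations.tableIdMap().
interface TableIdSource {
  Map<String,String> tableIdMap();
}

// Resolves table names to IDs from a snapshot taken once, not on every write.
class CachedTableIds {
  private final Map<String,String> nameToId;

  CachedTableIds(TableIdSource source) {
    // The single expensive lookup. The snapshot will not see tables that are
    // created, renamed, or deleted afterwards.
    this.nameToId = new HashMap<>(source.tableIdMap());
  }

  String idFor(String tableName) {
    String id = nameToId.get(tableName);
    if (id == null)
      throw new IllegalArgumentException("Unknown table: " + tableName);
    return id;
  }
}

class CachedTableIdsDemo {
  public static void main(String[] args) {
    AtomicInteger lookups = new AtomicInteger();
    // Fake source that counts how often the map is fetched.
    TableIdSource source = () -> {
      lookups.incrementAndGet();
      Map<String,String> m = new HashMap<>();
      m.put("events", "1a");
      return m;
    };

    CachedTableIds ids = new CachedTableIds(source);
    for (int i = 0; i < 1000; i++)
      ids.idFor("events");

    System.out.println(lookups.get()); // prints 1
  }
}
```

The catch is the one noted above: nothing stops a user from skipping the cache and calling {{tableIdMap()}} on every write.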
bq. However, I do feel if we're going to encourage users to cache the BWs
themselves, then we should remove the internal caching in the MTBW code
Taking a closer look at the code, the cache is not really doing much. What's
cached is a TableBatchWriter object, which is a very thin layer around a
TabletServerBatchWriter. MultiTableBatchWriterImpl only has one
TabletServerBatchWriter. All the cache is really doing is avoiding object
creation, but there is a ton of object creation in {{getBatchWriter()}} before
the cache is even consulted.
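
For the user-side caching discussed above, a minimal sketch could look like the following. It assumes Java 8 or later for {{computeIfAbsent}}, and {{WriterCache}} is a hypothetical helper, not part of Accumulo. The loader would wrap {{getBatchWriter(tableName)}}, whose checked exceptions would need to be handled or rethrown as unchecked inside the loader.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical user-side cache: one writer per table name.
// W stands in for BatchWriter so the sketch compiles without Accumulo.
class WriterCache<W> {
  private final ConcurrentMap<String,W> writers = new ConcurrentHashMap<>();
  private final Function<String,W> loader;

  WriterCache(Function<String,W> loader) {
    this.loader = loader;
  }

  W get(String tableName) {
    // Fast path: a plain read, no lock taken once the writer is cached.
    W w = writers.get(tableName);
    if (w != null)
      return w;
    // Slow path: the loader runs at most once per table name.
    return writers.computeIfAbsent(tableName, loader);
  }
}

class WriterCacheDemo {
  public static void main(String[] args) throws InterruptedException {
    AtomicInteger loads = new AtomicInteger();
    // Fake loader that counts how often the expensive lookup happens.
    WriterCache<Object> cache = new WriterCache<>(name -> {
      loads.incrementAndGet();
      return new Object();
    });

    Thread[] threads = new Thread[4];
    for (int i = 0; i < threads.length; i++) {
      threads[i] = new Thread(() -> {
        for (int j = 0; j < 1000; j++)
          cache.get("events");
      });
      threads[i].start();
    }
    for (Thread t : threads)
      t.join();

    System.out.println(loads.get()); // prints 1
  }
}
```

A cache like this never notices a table going offline or being deleted or renamed after the first lookup, so the caller gives up the per-call checks that {{getBatchWriter()}} performs.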
> MultiTableBatchWriterImpl.getBatchWriter() is not performant for multiple
> threads
> ---------------------------------------------------------------------------------
>
> Key: ACCUMULO-1833
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1833
> Project: Accumulo
> Issue Type: Improvement
> Affects Versions: 1.5.0, 1.6.0
> Reporter: Chris McCubbin
> Attachments: ACCUMULO-1833-test.patch, ZooKeeperThreadUtilization.png
>
>
> This issue comes from profiling our application. We have a
> MultiTableBatchWriter created by normal means. I am attempting to write to it
> with multiple threads by doing things like the following:
> {code}
> batchWriter.getBatchWriter(table).addMutations(mutations);
> {code}
> In my test with 4 threads writing to one table, this call is quite
> inefficient and results in a large performance degradation over a single
> BatchWriter.
> I believe the culprit is the fact that the call is synchronized. Also there
> is the possibility that the zookeeper call to Tables.getTableState on every
> call is negatively affecting performance:
> {code}
> @Override
> public synchronized BatchWriter getBatchWriter(String tableName)
>     throws AccumuloException, AccumuloSecurityException, TableNotFoundException {
>   ArgumentChecker.notNull(tableName);
>   String tableId = Tables.getNameToIdMap(instance).get(tableName);
>   if (tableId == null)
>     throw new TableNotFoundException(tableId, tableName, null);
>
>   if (Tables.getTableState(instance, tableId) == TableState.OFFLINE)
>     throw new TableOfflineException(instance, tableId);
>
>   BatchWriter tbw = tableWriters.get(tableId);
>   if (tbw == null) {
>     tbw = new TableBatchWriter(tableId);
>     tableWriters.put(tableId, tbw);
>   }
>   return tbw;
> }
> {code}
> I recommend moving the synchronized block to happen only if the batchwriter
> is not present, and also only checking if the table is online at that time:
> {code}
> @Override
> public BatchWriter getBatchWriter(String tableName)
>     throws AccumuloException, AccumuloSecurityException, TableNotFoundException {
>   ArgumentChecker.notNull(tableName);
>   String tableId = Tables.getNameToIdMap(instance).get(tableName);
>   if (tableId == null)
>     throw new TableNotFoundException(tableId, tableName, null);
>   BatchWriter tbw = tableWriters.get(tableId);
>   if (tbw == null) {
>     if (Tables.getTableState(instance, tableId) == TableState.OFFLINE)
>       throw new TableOfflineException(instance, tableId);
>     tbw = new TableBatchWriter(tableId);
>     synchronized (tableWriters) {
>       // only create a new table writer if we haven't been beaten to it.
>       if (tableWriters.get(tableId) == null)
>         tableWriters.put(tableId, tbw);
>     }
>   }
>   return tbw;
> }
> {code}