I have a table with about 4M rows in Accumulo, grouped into about 90K entities with distinct rowIds. Scanning the whole table from the Java API takes about 10 s, whether I scan with one large rowId range or with several ranges. I have also tried a BatchScanner where each rowId is its own Range. `scan -np` in the Accumulo shell gives a similar result. I have tried different configurations on very powerful machines at AWS, and the performance never gets better than about 10 s.

Is that reasonable? It seems very slow, but maybe that is simply what Accumulo can do. Does anyone have experience with queries like this?

Using:
- Accumulo 1.7.4
- Hadoop 2.8.1
- 4 vCPU, 16 GB RAM (AWS m5d.xlarge)
- 5 servers running Accumulo tservers and Hadoop datanodes, one namenode running the Accumulo master
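For reference, this is roughly how the BatchScanner query is set up, a minimal sketch against the Accumulo 1.7 client API. The instance name, ZooKeeper host, credentials, table name, and the `loadEntityRowIds()` helper are all placeholders, not values from my actual setup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class EntityScan {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details -- replace with real ones.
        ZooKeeperInstance inst = new ZooKeeperInstance("myInstance", "zk1:2181");
        Connector conn = inst.getConnector("user", new PasswordToken("pass"));

        // One exact Range per entity rowId, as described above (~90K ranges).
        List<Range> ranges = new ArrayList<>();
        for (String rowId : loadEntityRowIds()) {
            ranges.add(Range.exact(rowId));
        }

        // 10 query threads; matching this to the number of tservers
        // (or a small multiple of it) is a common starting point.
        BatchScanner scanner =
            conn.createBatchScanner("myTable", Authorizations.EMPTY, 10);
        try {
            scanner.setRanges(ranges);
            long count = 0;
            for (Entry<Key, Value> entry : scanner) {
                count++; // process entry.getKey() / entry.getValue() here
            }
            System.out.println("scanned " + count + " entries");
        } finally {
            scanner.close();
        }
    }

    // Placeholder for however the ~90K entity rowIds are obtained.
    private static List<String> loadEntityRowIds() {
        return new ArrayList<>();
    }
}
```

Note the BatchScanner returns entries in no particular order; the single-Scanner variant with one large Range behaves the same way performance-wise in my tests.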
[ Full content available at: https://github.com/apache/accumulo/issues/624 ]
