I have a table with about 4M rows in Accumulo, grouped into about 90K entities with distinct rowIds. Scanning the whole table from the Java API takes about 10 s, whether I scan with one large rowId range or with several ranges. I have also tried a BatchScanner where each rowId is its own Range. `scan -np` in the Accumulo shell gives a similar result. I have tried different configurations on very powerful machines at AWS, and the performance never gets better than about 10 s.

Is that reasonable? It seems very slow, but maybe that is simply what Accumulo can do. Does anyone have experience with queries like this?

Using:
- Accumulo 1.7.4
- Hadoop 2.8.1
- 4 vCPU, 16 GB RAM (AWS m5d.xlarge)
- 5 servers running Accumulo tservers and Hadoop datanodes, one namenode running the Accumulo master
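For reference, this is roughly how the BatchScanner query is set up, a minimal sketch against the Accumulo 1.7 client API. The instance name, ZooKeeper host, credentials, table name, and the `loadEntityRowIds()` helper are all placeholders, not values from my actual setup:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.BatchScanner;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class EntityScan {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details -- replace with real ones.
        ZooKeeperInstance inst = new ZooKeeperInstance("myInstance", "zk1:2181");
        Connector conn = inst.getConnector("user", new PasswordToken("pass"));

        // One exact Range per entity rowId, as described above (~90K ranges).
        List<Range> ranges = new ArrayList<>();
        for (String rowId : loadEntityRowIds()) {
            ranges.add(Range.exact(rowId));
        }

        // 10 query threads; matching this to the number of tservers
        // (or a small multiple of it) is a common starting point.
        BatchScanner scanner =
            conn.createBatchScanner("myTable", Authorizations.EMPTY, 10);
        try {
            scanner.setRanges(ranges);
            long count = 0;
            for (Entry<Key, Value> entry : scanner) {
                count++; // process entry.getKey() / entry.getValue() here
            }
            System.out.println("scanned " + count + " entries");
        } finally {
            scanner.close();
        }
    }

    // Placeholder for however the ~90K entity rowIds are obtained.
    private static List<String> loadEntityRowIds() {
        return new ArrayList<>();
    }
}
```

Note the BatchScanner returns entries in no particular order; the single-Scanner variant with one large Range behaves the same way performance-wise in my tests.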
[ Full content available at: https://github.com/apache/accumulo/issues/624 ]
