The question was to compute the number of rows, not the number of
entries. The metadata table does not track the number of rows.
David Medinets wrote:
It's not recommended to read the Metadata table? When I needed the 'real'
number, I ran a compaction. When I needed an estimate I just read the
table. I also upgraded our ingest process to track numbers as a second
phase to avoid the need for compaction to get 'real' numbers.
On Mon, Nov 9, 2015 at 10:52 AM, Josh Elser <[email protected]> wrote:
Note that CountingIterator is in the system iterator package
(FirstEntryInRowIterator also isn't in the user package for iterators, so
its stability is a little questionable too). I think David ran into this a
long time ago as well.
Stable versions of both of these would be good, IMO. It isn't like Z is
the first one to ask how to count the unique rows :)
William Slacum wrote:
Pranked... you can't use a CountingIterator, because it can't be init'd.
Can we get rid of that limitation?
On Mon, Nov 9, 2015 at 10:43 AM, William Slacum <[email protected]> wrote:
An iterator stack of FirstEntryInRowIterator + CountingIterator will
return the count of rows in each tablet, which can then be combined on
the client side.
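To make the "count per tablet, combine on the client" idea concrete, here is a minimal plain-Java sketch. It does not use the Accumulo API or configure either iterator; the class and method names are hypothetical, and each tablet is modeled as a list holding the row ID of every entry. It assumes no row appears in more than one tablet, so the partial counts can simply be added.

```java
import java.util.Arrays;
import java.util.List;

class PerTabletRowCount {
    /**
     * Stand-in for the server-side step: given the row ID of every entry
     * in one tablet, return how many distinct rows that tablet holds.
     */
    static long countRowsInTablet(List<String> rowIdOfEachEntry) {
        return rowIdOfEachEntry.stream().distinct().count();
    }

    /** Client-side step: sum the partial counts, one per tablet. */
    static long totalRows(List<List<String>> tablets) {
        return tablets.stream()
                .mapToLong(PerTabletRowCount::countRowsInTablet)
                .sum();
    }

    public static void main(String[] args) {
        // Hypothetical table with three tablets; no row spans two tablets.
        List<List<String>> tablets = Arrays.asList(
                Arrays.asList("a", "a", "b"),
                Arrays.asList("c", "d", "d", "d"),
                Arrays.asList("e"));
        System.out.println("total rows = " + totalRows(tablets));
    }
}
```

Only one number per tablet crosses the network in this model, which is the appeal of doing the count on the server side.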
On Mon, Nov 9, 2015 at 10:25 AM, Josh Elser <[email protected]> wrote:
Yeah, there's no explicit tracking of all rows in Accumulo, you're stuck
with enumerating them (or explicitly tracking them yourself at ingest
time).
The easiest approach you can take is probably using the
FirstEntryInRowIterator and counting each row on the client-side.
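As a rough illustration of why keeping only the first entry of each row is enough, here is a plain-Java model of the approach. It is a sketch under stated assumptions, not Accumulo code: the Entry type and the method name are hypothetical, and the input is assumed to be sorted by row, as a scan would return it. The "is this a new row?" check plays the part of the server-side iterator; the increment is the part the client would do.

```java
import java.util.Arrays;
import java.util.List;

class FirstEntryRowCount {
    /** A key reduced to the two fields that matter here (hypothetical type). */
    static final class Entry {
        final String row;
        final String column;

        Entry(String row, String column) {
            this.row = row;
            this.column = column;
        }
    }

    /**
     * Counts rows in a stream of entries sorted by row. Only the first
     * entry of each row is counted; later entries of the same row are skipped.
     */
    static long countRows(Iterable<Entry> sortedEntries) {
        long rows = 0;
        String previousRow = null;
        for (Entry e : sortedEntries) {
            if (!e.row.equals(previousRow)) {
                rows++;               // first entry of a new row
                previousRow = e.row;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Six entries spread over three rows.
        List<Entry> table = Arrays.asList(
                new Entry("row1", "a"), new Entry("row1", "b"),
                new Entry("row2", "a"),
                new Entry("row3", "a"), new Entry("row3", "b"), new Entry("row3", "c"));
        System.out.println("entries = " + table.size());
        System.out.println("rows    = " + countRows(table));
    }
}
```

The sample data has six entries but only three rows, which is the entries-versus-rows distinction this thread is about.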
You could do another summation in a second iterator, but this is a little
tricky to get correct. I tried to touch on this a little in a blog
post [1].
If this is a one-off question you want to answer, doing the summation on
the client side is unlikely to take much longer than a server-side
summation.
[1] https://blogs.apache.org/accumulo/entry/thinking_about_reads_over_accumulo
z11373 wrote:
I want to get the total number of rows in a table (it likely has more than
100M rows). I think that to get that information, Accumulo would have to
iterate over all rows :-( This may not be a typical Accumulo scenario.
Is there a more efficient way to get the total number of rows in a table?
When Accumulo is iterating over those items, does that mean it will pull
the data to the client? If yes, is there a way to ask it to return just the
number, since that's the only data I care about?
Thanks,
Z