Is reading the metadata table not recommended? When I needed the 'real' number, I ran a compaction. When I only needed an estimate, I just read the table. I also upgraded our ingest process to track the counts as a second phase, which avoids the need for a compaction to get 'real' numbers.
On Mon, Nov 9, 2015 at 10:52 AM, Josh Elser wrote:

> Note that CountingIterator is in the system iterator package
> (FirstEntryInRowIterator also isn't in the user package for iterators, so
> its stability is a little questionable too). I think David ran into this a
> long time ago as well.
>
> Stable versions of both of these would be good, IMO. It isn't like Z is
> the first one to ask how to count the unique rows :)
>
> William Slacum wrote:
>
>> Pranked... you can't use a CountingIterator, because it can't be init'd.
>> Can we get rid of that limitation?
>>
>> On Mon, Nov 9, 2015 at 10:43 AM, William Slacum wrote:
>>
>>> An iterator stack of FirstEntryInRowIterator + CountingIterator will
>>> return the count of rows in each tablet, which can then be combined on
>>> the client side.
>>>
>>> On Mon, Nov 9, 2015 at 10:25 AM, Josh Elser wrote:
>>>
>>>> Yeah, there's no explicit tracking of all rows in Accumulo, you're stuck
>>>> with enumerating them (or explicitly tracking them yourself at ingest
>>>> time).
>>>>
>>>> The easiest approach you can take is probably using the
>>>> FirstEntryInRowIterator and counting each row on the client-side.
>>>>
>>>> You could do another summation in a second iterator but this is a little
>>>> tricky to get correct. I tried to touch on this a little in a blog
>>>> post[1]. If this is a one-off question you want to answer, doing the
>>>> summation on the client side is likely not to take excessively longer
>>>> than a server-side summation.
>>>>
>>>> [1] https://blogs.apache.org/accumulo/entry/thinking_about_reads_over_accumulo
>>>>
>>>> z11373 wrote:
>>>>
>>>>> I want to get total rows of a table (likely has more than 100M rows), I
>>>>> think to get that information, Accumulo would have to iterate all
>>>>> rows :-( This may not be typical Accumulo scenario.
>>>>>
>>>>> Is there a more efficient way to get total number of rows in a table?
>>>>> When Accumulo iterating those items, does it mean it will pull the data
>>>>> to the client? If yes, is there a way to ask it to return just the
>>>>> number, since that's the only data I care.
>>>>>
>>>>> Thanks,
>>>>> Z
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-accumulo.1065345.n5.nabble.com/total-table-rows-tp15484.html
>>>>> Sent from the Developers mailing list archive at Nabble.com.
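
For anyone skimming the thread, here is a rough plain-Java sketch of the counting logic discussed above: keep only the first entry of each row, count per tablet, then sum the per-tablet counts on the client. It deliberately uses no Accumulo classes. The class name RowCountSketch, both method names, and the modelling of a tablet as a sorted list of row IDs (one element per key/value entry) are all made up for illustration; a real scan would attach FirstEntryInRowIterator to a Scanner or BatchScanner instead.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: models the counting logic, not the Accumulo API.
class RowCountSketch {

    // Analogue of the per-tablet step: entries arrive sorted by row, so a row
    // is counted once, at the first entry whose row ID differs from the
    // previous entry's row ID.
    static long countRowsInTablet(List<String> sortedRowIds) {
        long rows = 0;
        String previousRow = null;
        for (String row : sortedRowIds) {
            if (!row.equals(previousRow)) {
                rows++;
                previousRow = row;
            }
        }
        return rows;
    }

    // Analogue of the client-side step: add up the per-tablet counts.
    // Summing is safe because a row never spans more than one tablet.
    static long combine(List<Long> perTabletCounts) {
        long total = 0;
        for (long count : perTabletCounts) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two pretend tablets; each element is the row ID of one entry.
        List<String> tablet1 = Arrays.asList("a", "a", "b");
        List<String> tablet2 = Arrays.asList("c", "d", "d", "d");

        long total = combine(Arrays.asList(
                countRowsInTablet(tablet1),
                countRowsInTablet(tablet2)));
        System.out.println("total rows: " + total);
    }
}
```

Whether the per-tablet count runs server-side or the client just counts the first-entry keys it receives, the final sum still happens on the client, which matches the one-off use case described in the quoted replies.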
