Shutting up now. :)

On Mon, Nov 9, 2015 at 11:06 AM, Josh Elser <[email protected]> wrote:
> The question was to compute the number of rows, not the number of entries.
> The metadata table does not track the number of rows.
>
> David Medinets wrote:
>
>> It's not recommended to read the Metadata table? When I needed the 'real'
>> number, I ran a compaction. When I needed an estimate I just read the
>> table. I also upgraded our ingest process to track numbers as a second
>> phase to avoid the need for compaction to get 'real' numbers.
>>
>> On Mon, Nov 9, 2015 at 10:52 AM, Josh Elser <[email protected]> wrote:
>>
>>> Note that CountingIterator is in the system iterator package
>>> (FirstEntryInRowIterator also isn't in the user package for iterators,
>>> so its stability is a little questionable too). I think David ran into
>>> this a long time ago as well.
>>>
>>> Stable versions of both of these would be good, IMO. It isn't like Z is
>>> the first one to ask how to count the unique rows :)
>>>
>>> William Slacum wrote:
>>>
>>>> Pranked... you can't use a CountingIterator, because it can't be
>>>> init'd. Can we get rid of that limitation?
>>>>
>>>> On Mon, Nov 9, 2015 at 10:43 AM, William Slacum <[email protected]>
>>>> wrote:
>>>>
>>>>> An iterator stack of FirstEntryInRowIterator + CountingIterator will
>>>>> return the count of rows in each tablet, which can then be combined
>>>>> on the client side.
>>>>>
>>>>> On Mon, Nov 9, 2015 at 10:25 AM, Josh Elser <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Yeah, there's no explicit tracking of all rows in Accumulo, you're
>>>>>> stuck with enumerating them (or explicitly tracking them yourself
>>>>>> at ingest time).
>>>>>>
>>>>>> The easiest approach you can take is probably using the
>>>>>> FirstEntryInRowIterator and counting each row on the client-side.
>>>>>>
>>>>>> You could do another summation in a second iterator but this is a
>>>>>> little tricky to get correct. I tried to touch on this a little in
>>>>>> a blog post[1].
>>>>>>
>>>>>> If this is a one-off question you want to answer, doing the
>>>>>> summation on the client side is likely not to take excessively
>>>>>> longer than a server-side summation.
>>>>>>
>>>>>> [1]
>>>>>> https://blogs.apache.org/accumulo/entry/thinking_about_reads_over_accumulo
>>>>>>
>>>>>> z11373 wrote:
>>>>>>
>>>>>>> I want to get total rows of a table (likely has more than 100M
>>>>>>> rows), I think to get that information, Accumulo would have to
>>>>>>> iterate all rows :-( This may not be typical Accumulo scenario.
>>>>>>>
>>>>>>> Is there a more efficient way to get total number of rows in a
>>>>>>> table? When Accumulo iterating those items, does it mean it will
>>>>>>> pull the data to the client? If yes, is there a way to ask it to
>>>>>>> return just the number, since that's the only data I care.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Z
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://apache-accumulo.1065345.n5.nabble.com/total-table-rows-tp15484.html
>>>>>>> Sent from the Developers mailing list archive at Nabble.com.
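
For anyone skimming the thread, the "first entry per row, then sum on the client" idea discussed above can be modeled in a few lines of plain Python. This is only a toy sketch of the counting logic, not Accumulo code: the tuples, the `tablets` list, and the function names are all made up for illustration, and it ignores the iterator teardown and re-seek behavior that makes a real server-side summation tricky. It does show why the entry count and the row count differ.

```python
from itertools import groupby

def first_entry_in_row(entries):
    """Yield only the first entry of each row.

    Assumes `entries` is a list of (row, column, value) tuples already
    sorted by row, which is how a tablet presents its data.
    """
    for _row, group in groupby(entries, key=lambda entry: entry[0]):
        yield next(group)

def count_rows_in_tablet(entries):
    """Per-tablet step: one surviving entry per row, so count them."""
    return sum(1 for _ in first_entry_in_row(entries))

def count_rows(tablets):
    """Client-side step: add up the per-tablet row counts.

    Assumes a row never spans two tablets, so no row is counted twice.
    """
    return sum(count_rows_in_tablet(tablet) for tablet in tablets)

# Hypothetical data: two tablets, three rows, six entries in total.
tablets = [
    [("row1", "cf:a", "1"), ("row1", "cf:b", "2"), ("row2", "cf:a", "3")],
    [("row3", "cf:a", "4"), ("row3", "cf:b", "5"), ("row3", "cf:c", "6")],
]

total_entries = sum(len(tablet) for tablet in tablets)
total_rows = count_rows(tablets)
print(total_entries, total_rows)
```

With the sample data, the entry count is 6 while the row count is 3, which is the distinction Josh draws between what the metadata table tracks and what Z asked for.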
