Hi Stack,

Thanks for that pointer. I was not aware of sketches (one more concept to
learn :)). I will explore them and see if they help.

Hi Andrew,

Yes, this is needed for a Phoenix table, but there are two asks. The first
is from the customer side: they want to know the size of their actual rows,
which is the sum of the sizes of the latest version of every column (there
might be extra versions or delete markers, which customers are probably not
interested in since they never read that data). The second is from the
service owner's point of view: we want to know the size of the full row,
including all cells, for internal operations such as backups, migrations,
growth analysis, and stats. If we have something at the HBase level, then
coming up with a similar tool for Phoenix tables does not seem like a big
job (I might be wrong).
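
To make the two asks concrete, here is a minimal sketch of the two size
definitions. SimpleCell and the three helper methods are hypothetical
stand-ins made up for illustration, not HBase or Phoenix APIs. The delete
handling is deliberately simplified (a delete marker hides its column only
when it is the newest cell for that column), and the percentile is an exact
nearest-rank computation over sizes held in memory, not a quantile sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RowSizeSketch {
    /** Hypothetical stand-in for a cell; NOT org.apache.hadoop.hbase.Cell. */
    static final class SimpleCell {
        final String column;
        final long timestamp;
        final boolean deleteMarker;
        final int sizeBytes;

        SimpleCell(String column, long timestamp, boolean deleteMarker, int sizeBytes) {
            this.column = column;
            this.timestamp = timestamp;
            this.deleteMarker = deleteMarker;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Service-owner view: every cell counts, including old versions and delete markers. */
    public static long fullRowSize(List<SimpleCell> cells) {
        long total = 0;
        for (SimpleCell c : cells) {
            total += c.sizeBytes;
        }
        return total;
    }

    /**
     * Customer view: only the newest cell per column counts, and a column whose
     * newest cell is a delete marker contributes nothing (simplified delete semantics).
     */
    public static long latestVersionSize(List<SimpleCell> cells) {
        Map<String, SimpleCell> newest = new HashMap<>();
        for (SimpleCell c : cells) {
            // Keep whichever cell has the larger timestamp for this column.
            newest.merge(c.column, c, (a, b) -> b.timestamp > a.timestamp ? b : a);
        }
        long total = 0;
        for (SimpleCell c : newest.values()) {
            if (!c.deleteMarker) {
                total += c.sizeBytes;
            }
        }
        return total;
    }

    /** Exact nearest-rank percentile (p in 0..100) over a non-empty array of row sizes. */
    public static long percentile(long[] sizes, double p) {
        long[] sorted = sizes.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        List<SimpleCell> row = Arrays.asList(
            new SimpleCell("a", 1L, false, 10),  // older version of column a
            new SimpleCell("a", 2L, false, 20),  // latest version of column a
            new SimpleCell("b", 1L, false, 5),   // column b, later deleted
            new SimpleCell("b", 3L, true, 1));   // delete marker for column b
        System.out.println("full row size:       " + fullRowSize(row));
        System.out.println("latest-version size: " + latestVersionSize(row));
    }
}
```

For a real table the in-memory percentile would be replaced by something
mergeable across Regions, such as the quantile sketches Stack mentions, with
one sketch fed by each of the two size definitions.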


Thanks
Sukumar



On Thu, May 14, 2020 at 10:11 AM Andrew Purtell <[email protected]> wrote:

> > I keep thinking about inlining this stuff at flush/compaction time and
> > appending the sketch to an hfile. After the fact you could read the
> > sketches in the tail of the hfiles for some counts on a Region basis but it
> > wouldn't be row-based.
>
> There should be an issue for this if not one already (I've heard it
> mentioned before). It would be very nice to have. Wasn't the sketch stuff
> from Yahoo incubated? ... Yes: https://datasketches.apache.org/ ,
> https://incubator.apache.org/clutch/datasketches.html . There's something
> in the family to try, so to speak.
>
> The row vs cell distinction is an important one. If you are looking to add
> or use something provided by HBase, the view of the data will be cell
> based. That might be what you need, it might not be. Table level statistics
> (aggregated from region sketches as stack suggests) would roll up either
> cells or rows so could work if that's the granularity you need.
>
> If the ask is for row based statistics for Phoenix, this is a question
> better asked on dev@phoenix.
>
>
> On Thu, May 14, 2020 at 9:19 AM Stack <[email protected]> wrote:
>
> > On Wed, May 13, 2020 at 10:38 PM Sukumar Maddineni
> > <[email protected]> wrote:
> >
> > > Hello everyone,
> > >
> > > Is there any existing tool which we can use to understand the size of
> > > the rows in a table.  Like we want to know what is p90, max row size of
> > > rows in a given table to understand the usage pattern and see how much
> > > room we have before having large rows.
> > >
> > > I was thinking similar to RowCounter with reducer to consolidate the
> > > info.
> > >
> > I've had some success scanning rows on a per-Region basis dumping a
> > report per Region. I was passing the per row Results via something like
> > the below:
> >
> >    static void processRowResult(Result result, Sketches sketches) {
> >      // System.out.println(result.toString());
> >      long rowSize = 0;
> >      int columnCount = 0;
> >      for (Cell cell : result.rawCells()) {
> >        rowSize += estimatedSizeOfCell(cell);
> >        columnCount += 1;
> >      }
> >      sketches.rowSizeSketch.update(rowSize);
> >      sketches.columnCountSketch.update(columnCount);
> >    }
> >
> > ... where the sketches are variants of
> > com.yahoo.sketches.quantiles.*Sketch. The latter are nice in that the
> > sketches can be aggregated so you can after-the-fact make table sketches
> > by summing all of the Region sketches. I had a 100 quantiles so could do
> > 95% or 96%, etc. The bins to use for say data size take a bit of tuning
> > but can make a decent guess for first go round and see how you do.
> >
> > I keep thinking about inlining this stuff at flush/compaction time and
> > appending the sketch to an hfile. After the fact you could read the
> > sketches in the tail of the hfiles for some counts on a Region basis but
> > it wouldn't be row-based. For row-based, you'd have to read Rows (hfiles
> > are buckets of Cells, not rows).
> >
> > S
> >
> >
> >
> > >
> > > --
> > > Sukumar
> > >
> > > <https://smart.salesforce.com/sig/smaddineni//us_mb/default/link.html>
> > >
> >
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>    - A23, Crosstalk
>


-- 

<https://smart.salesforce.com/sig/smaddineni//us_mb/default/link.html>
