On Wed, May 13, 2020 at 10:38 PM Sukumar Maddineni
<[email protected]> wrote:

> Hello everyone,
>
> Is there any existing tool we can use to understand the size of the
> rows in a table? We want to know the p90 and max row size for
> a given table to understand the usage pattern and see how much room we have
> before we end up with large rows.
>
> I was thinking of something similar to RowCounter, with a reducer to
> consolidate the info.
>
>
I've had some success scanning rows on a per-Region basis and dumping a
report per Region. I passed each row's Result to something like the below:

   // Accumulate one row's estimated size and cell count into the sketches.
   // estimatedSizeOfCell(...) and the Sketches holder are helpers not shown here.
   static void processRowResult(Result result, Sketches sketches) {
     // System.out.println(result.toString());
     long rowSize = 0;
     int columnCount = 0;
     for (Cell cell : result.rawCells()) {
       rowSize += estimatedSizeOfCell(cell);
       columnCount += 1;
     }
     sketches.rowSizeSketch.update(rowSize);
     sketches.columnCountSketch.update(columnCount);
   }

... where the sketches are variants of
com.yahoo.sketches.quantiles.*Sketch. These are nice in that the
sketches can be aggregated, so after the fact you can make table sketches by
merging all of the Region sketches. I had 100 quantiles so could do 95%
or 96%, etc. The bins to use for, say, data size take a bit of tuning, but you
can make a decent guess for a first go-round and see how you do.
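
To make the aggregate-then-query idea concrete, here is a minimal sketch in
plain Java. It does not use the com.yahoo.sketches API; it swaps in a simple
fixed-bin histogram as a stand-in so the example is self-contained. The class
name RowSizeHistogram, its methods, and the bin bounds are all hypothetical
and only meant to show why per-Region summaries can be combined into a
table-level one.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for a quantile sketch: a fixed-bin histogram of row
 * sizes. Two histograms with the same bins merge by adding their counts.
 */
final class RowSizeHistogram {
  private final long[] upperBounds; // ascending, distinct, inclusive bin upper bounds (bytes)
  private final long[] counts;      // one count per bin, plus a final overflow bin
  private long total;

  RowSizeHistogram(long... upperBounds) {
    this.upperBounds = upperBounds.clone();
    this.counts = new long[upperBounds.length + 1];
  }

  /** Record one row's estimated size. */
  void update(long rowSize) {
    int idx = Arrays.binarySearch(upperBounds, rowSize);
    if (idx < 0) {
      idx = -idx - 1; // insertion point: first bound greater than rowSize
    }
    counts[idx]++;
    total++;
  }

  /** Fold another Region's histogram into this one. */
  void merge(RowSizeHistogram other) {
    if (!Arrays.equals(upperBounds, other.upperBounds)) {
      throw new IllegalArgumentException("bin bounds differ");
    }
    for (int i = 0; i < counts.length; i++) {
      counts[i] += other.counts[i];
    }
    total += other.total;
  }

  /**
   * Upper bound of the bin holding quantile q (0 < q <= 1), or
   * Long.MAX_VALUE when the quantile falls in the overflow bin.
   */
  long quantileUpperBound(double q) {
    if (total == 0) {
      throw new IllegalStateException("no rows recorded");
    }
    long rank = Math.max(1, (long) Math.ceil(q * total));
    long seen = 0;
    for (int i = 0; i < counts.length; i++) {
      seen += counts[i];
      if (seen >= rank) {
        return i < upperBounds.length ? upperBounds[i] : Long.MAX_VALUE;
      }
    }
    return Long.MAX_VALUE;
  }

  long count() {
    return total;
  }

  public static void main(String[] args) {
    // Made-up row sizes for two Regions, sharing the same (guessed) bins.
    RowSizeHistogram regionA = new RowSizeHistogram(1024, 4096, 16384, 65536);
    RowSizeHistogram regionB = new RowSizeHistogram(1024, 4096, 16384, 65536);
    for (long size : new long[] {100, 500, 2000, 3000}) {
      regionA.update(size);
    }
    for (long size : new long[] {800, 900, 1000, 5000, 20000, 70000}) {
      regionB.update(size);
    }
    regionA.merge(regionB); // regionA now stands for the whole table
    System.out.println("rows=" + regionA.count());
    System.out.println("p50 bin upper bound=" + regionA.quantileUpperBound(0.50));
    System.out.println("p95 bin upper bound=" + regionA.quantileUpperBound(0.95));
  }
}
```

The answers here are only as fine-grained as the bins, which is one reason
the bin choice takes some tuning. A real quantile sketch bounds the error
without fixed bins.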

I keep thinking about inlining this stuff at flush/compaction time and
appending the sketch to an hfile. After the fact you could read the
sketches in the tail of the hfiles for some counts on a Region basis, but
they wouldn't be row-based. For row-based stats, you'd have to read Rows
(hfiles are buckets of Cells, not rows).

S



>
> --
> Sukumar
>
> <https://smart.salesforce.com/sig/smaddineni//us_mb/default/link.html>
>
