On Wed, May 13, 2020 at 10:38 PM Sukumar Maddineni wrote:
> Hello everyone,
>
> Is there any existing tool we can use to understand the size of the
> rows in a table? For example, we want to know the p90 and max row size
> in a given table, to understand the usage pattern and see how much
> headroom we have before rows get too large.
>
> I was thinking of something similar to RowCounter, with a reducer to
> consolidate the info.
>
>
I've had some success scanning rows on a per-Region basis and dumping a
report per Region. I passed each row's Result to something like the below:
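    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.client.Result;

    // Called once per row returned by the per-Region scan.
    static void processRowResult(Result result, Sketches sketches) {
      long rowSize = 0;
      int columnCount = 0;
      // rawCells() returns every Cell in this row's Result.
      for (Cell cell : result.rawCells()) {
        rowSize += estimatedSizeOfCell(cell); // per-Cell size estimate (helper not shown)
        columnCount += 1; // counts Cells: equals columns only with one version per column
      }
      sketches.rowSizeSketch.update(rowSize);
      sketches.columnCountSketch.update(columnCount);
    }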
...where the sketches are variants of
com.yahoo.sketches.quantiles.*Sketch. These are nice in that the sketches
can be merged, so after the fact you can build table-level sketches by
combining all of the Region sketches. I had 100 quantiles, so I could read
off 95%, 96%, etc. The bins to use for, say, data size take a bit of
tuning, but you can make a decent guess on the first go-round and see how
you do.
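The DataSketches quantile sketches handle merging themselves. To illustrate why per-Region results can be rolled up after the fact, here is a minimal stand-in that is NOT the DataSketches API: a hypothetical `SizeHistogram` with fixed power-of-two bins. Histograms with identical bins merge by adding counts, and a percentile is answered with the upper edge of the bin where the cumulative count reaches the target rank. The power-of-two edges are only an assumed first guess at the bin tuning mentioned above.

```java
// Hypothetical fixed-bin histogram: a simplified stand-in for a quantiles
// sketch, not the com.yahoo.sketches API.
final class SizeHistogram {
  // Bin i (i >= 1) holds sizes in [2^(i-1), 2^i - 1]; bin 0 holds size 0.
  private final long[] counts = new long[64];
  private long total;

  void update(long size) {
    if (size < 0) throw new IllegalArgumentException("negative size");
    counts[64 - Long.numberOfLeadingZeros(size)]++;
    total++;
  }

  // Merging is just adding counts, so Region histograms roll up into a table one.
  void merge(SizeHistogram other) {
    for (int i = 0; i < counts.length; i++) {
      counts[i] += other.counts[i];
    }
    total += other.total;
  }

  long count() {
    return total;
  }

  // Inclusive upper edge of the bin holding the q-quantile, e.g. q = 0.9 for p90.
  long quantileUpperBound(double q) {
    if (q <= 0 || q > 1) throw new IllegalArgumentException("q must be in (0, 1]");
    if (total == 0) throw new IllegalStateException("empty histogram");
    long rank = (long) Math.ceil(q * total);
    long seen = 0;
    for (int i = 0; i < counts.length; i++) {
      seen += counts[i];
      if (seen >= rank) {
        return (1L << i) - 1; // for i = 63 this wraps to Long.MAX_VALUE
      }
    }
    return Long.MAX_VALUE;
  }
}
```

The cost is resolution: an answer is only good to within its bin (up to a factor of two here), whereas the quantile sketches bound the error in rank rather than in value, so they hold up better when the size distribution isn't known in advance.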
I keep thinking about inlining this at flush/compaction time and appending
the sketch to the hfile. After the fact, you could read the sketches from
the tails of the hfiles to get some counts on a per-Region basis, but they
wouldn't be row-based. For row-based numbers you'd have to read rows
(hfiles are buckets of Cells, not rows).
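To make the row-vs-Cell point concrete: Cells are stored sorted by key, row first, so a row size means summing the Cells that share a row key. The sketch below is a hypothetical illustration, not HBase code: `SimpleCell` stands in for `org.apache.hadoop.hbase.Cell`, and its `estimatedSizeOfCell` assumes the serialized KeyValue size without tags (the helper used above isn't shown and may count differently, e.g. heap size).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified cell; a real HBase Cell exposes these lengths via
// getRowLength(), getFamilyLength(), getQualifierLength() and getValueLength().
final class SimpleCell {
  final String row;
  final byte[] family, qualifier, value;

  SimpleCell(String row, String family, String qualifier, String value) {
    this.row = row;
    this.family = family.getBytes(StandardCharsets.UTF_8);
    this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
    this.value = value.getBytes(StandardCharsets.UTF_8);
  }
}

final class RowSizes {
  // Assumed KeyValue framing without tags: key length (4) + value length (4)
  // + row length (2) + family length (1) + timestamp (8) + type (1) = 20 bytes.
  static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

  // Hypothetical stand-in for the estimatedSizeOfCell helper used above.
  static long estimatedSizeOfCell(SimpleCell cell) {
    return FIXED_OVERHEAD + cell.row.getBytes(StandardCharsets.UTF_8).length
        + cell.family.length + cell.qualifier.length + cell.value.length;
  }

  // Collapses a row-sorted stream of cells into one estimated size per row.
  // Only correct if all of a row's cells are present and adjacent.
  static List<Long> rowSizes(List<SimpleCell> sortedCells) {
    List<Long> sizes = new ArrayList<>();
    String currentRow = null;
    long currentSize = 0;
    for (SimpleCell cell : sortedCells) {
      if (currentRow != null && !currentRow.equals(cell.row)) {
        sizes.add(currentSize); // row boundary: emit the finished row
        currentSize = 0;
      }
      currentRow = cell.row;
      currentSize += estimatedSizeOfCell(cell);
    }
    if (currentRow != null) {
      sizes.add(currentSize); // emit the last row
    }
    return sizes;
  }
}
```

Feeding it only part of a row (say, the first Cell of a row in one call and the rest in another, as if the row were split across two hfiles) reports the pieces as separate, smaller rows. A row's Cells can be spread over several hfiles (one Store per column family, plus separate files from separate flushes), which is why a sketch in an hfile's tail can't give row sizes; a Region scan merges those files back into whole rows.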
S
>
> --
> Sukumar