The GUIDE_POSTS_WIDTH and GUIDE_POSTS_ROW_COUNT should contain the number
of bytes and number of rows which were traversed since the last guidepost.
So given some start key and stop key from a scan and knowledge that a given
column family is used in a query, you should be able to run a query like
this:

SELECT SUM(GUIDE_POSTS_WIDTH) bytes_traversed,
    SUM(GUIDE_POSTS_ROW_COUNT) rows_traversed
FROM SYSTEM.STATS
WHERE COLUMN_FAMILY = :1
AND GUIDE_POST_KEY >= :2
AND GUIDE_POST_KEY < :3

where :1 is the column family, :2 is the start row of the scan, and :3 is
the stop row of the scan. The result of the query should tell you the
bytes_traversed and the rows_traversed with a granularity of the
phoenix.stats.guidepost.width config parameter.
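
To make the intended semantics of that query concrete, here is a minimal in-memory sketch in Java. It is not Phoenix code and does not touch SYSTEM.STATS: the class GuidePostRangeSum, the Stats holder and the sumRange method are hypothetical names used only for illustration. It assumes each guidepost carries the bytes and rows traversed since the previous guidepost, and that keys compare as unsigned bytes (Arrays.compareUnsigned needs Java 9 or later).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeMap;

public class GuidePostRangeSum {
    /** Bytes and rows recorded at one guidepost (both counted since the previous guidepost). */
    public static final class Stats {
        public final long width;
        public final long rowCount;

        public Stats(long width, long rowCount) {
            this.width = width;
            this.rowCount = rowCount;
        }
    }

    /** Unsigned lexicographic byte order, the order in which HBase sorts row keys. */
    public static final Comparator<byte[]> KEY_ORDER = Arrays::compareUnsigned;

    /** Sums the stats of every guidepost whose key lies in [startKey, stopKey). */
    public static Stats sumRange(TreeMap<byte[], Stats> guidePosts, byte[] startKey, byte[] stopKey) {
        long bytes = 0;
        long rows = 0;
        // Inclusive start, exclusive stop: mirrors "GUIDE_POST_KEY >= :2 AND GUIDE_POST_KEY < :3".
        for (Stats s : guidePosts.subMap(startKey, true, stopKey, false).values()) {
            bytes += s.width;
            rows += s.rowCount;
        }
        return new Stats(bytes, rows);
    }

    private static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up guideposts for a single column family; the values are illustrative only.
        TreeMap<byte[], Stats> guidePosts = new TreeMap<>(KEY_ORDER);
        guidePosts.put(key("b"), new Stats(100, 10));
        guidePosts.put(key("d"), new Stats(100, 10));
        guidePosts.put(key("f"), new Stats(100, 10));
        guidePosts.put(key("h"), new Stats(100, 10));

        Stats total = sumRange(guidePosts, key("c"), key("h"));
        System.out.println("bytes_traversed=" + total.width + " rows_traversed=" + total.rowCount);
    }
}
```

With the made-up data above, the range [c, h) covers the guideposts at d and f, so the totals are only as precise as the spacing between guideposts.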

We could even run this across all column families being traversed, based on
which ones are referenced and projected into the scan. Or we could use
the "empty column family" (using SchemaUtil.getEmptyColumnFamily() as
Maryann mentioned), which is the one that is typically projected. FWIW, the
logic of which guideposts are used by a query is here:
BaseResultIterators.getGuidePosts().

Make sense? Is that the way it's working? If not, let's file a JIRA please.

Thanks,
James

On Sat, Feb 13, 2016 at 10:15 AM, Maryann Xue <[email protected]> wrote:

> Thank you, Ankit! I see what you mean. But I think what I queried was the
> default CF. SchemaUtil.getEmptyColumnFamily(), is that correct? I'll try to
> see if I can reproduce this.
>
> On Sat, Feb 13, 2016 at 8:07 AM, Ankit Singhal <[email protected]>
> wrote:
>
> > Yes James, the query is using guidePosts as per the cf used in the filter.
> > But I think Maryann is expecting that rowcount and bytescount should be
> > available at each guidePost key level, which we currently don't store.
> > Currently, we can use metrics (like rowcount/bytecount) at cf level only,
> > right?
> >
> > On Sat, Feb 13, 2016 at 11:34 AM, James Taylor <[email protected]>
> > wrote:
> >
> > > We should have separate guideposts per cf, as the data distribution
> > > may be different. We use the default cf if it's being filtered on,
> > > but otherwise use a different cf.
> > >
> > > Is that how it works currently, Ankit?
> > >
> > > On Friday, February 12, 2016, Ankit Singhal <[email protected]>
> > > wrote:
> > >
> > > > but I think we need these metrics at cf only right as per this comment-
> > > >
> > > > https://issues.apache.org/jira/browse/PHOENIX-2143?focusedCommentId=15069779&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15069779
> > > >
> > > > that's why we serialize aggregated value of region at cf level in first
> > > > guide post only.
> > > >
> > > > Regards,
> > > > Ankit Singhal
> > > >
> > > > On Sat, Feb 13, 2016 at 9:07 AM, Maryann Xue <[email protected]> wrote:
> > > >
> > > > > Thanks a lot for the answer, James! The data size has well exceeded
> > > > > the guidepost width and the guideposts do exist but without
> > > > > corresponding "rowCount" or "byteCount" cell. I'll try doing a
> > > > > Phoenix query instead and confirm that it is a bug.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Maryann
> > > > >
> > > > > On Fri, Feb 12, 2016 at 10:21 PM, James Taylor <[email protected]> wrote:
> > > > >
> > > > > > Hi Maryann,
> > > > > > If the amount of data in a region is less than the guidepost
> > > > > > width, then it's possible you'd get no guideposts for that
> > > > > > region. Do you think that's the case? If not, it sounds like
> > > > > > there may be a bug.
> > > > > >
> > > > > > Assuming you're querying to get the stats information, I'd
> > > > > > recommend doing a Phoenix query directly. The code you're
> > > > > > emulating uses straight HBase APIs because it's called from the
> > > > > > server-side. It'd be a one-liner as a Phoenix query.
> > > > > >
> > > > > > Thanks,
> > > > > > James
> > > > > >
> > > > > > On Fri, Feb 12, 2016 at 11:23 AM, Maryann Xue <[email protected]> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > This was something I noticed when applying Phoenix table stats
> > > > > > > to the Calcite-Phoenix cost calculation: when executing the
> > > > > > > following code (a slightly modified version of the existing
> > > > > > > StatisticsUtil method) to scan the stats table for a specific
> > > > > > > column family and a specific start/stop key range, I got
> > > > > > > guidepost rows that did not contain the rowCount or byteCount
> > > > > > > cell, for all rows in the specified range. Apparently, I had
> > > > > > > set the corresponding columns in the Scan (as shown below).
> > > > > > > Meanwhile, another range of stats in the same table gave me the
> > > > > > > right result. I am wondering if this is expected behavior or a
> > > > > > > bug?
> > > > > > >
> > > > > > >     public static PTableStats readStatistics(HTableInterface
> > > > > statsHTable,
> > > > > > >
> > > > > > >             byte[] tableNameBytes, ImmutableBytesPtr cf, byte[]
> > > > > startKey,
> > > > > > > byte[] stopKey,
> > > > > > >
> > > > > > >             long clientTimeStamp)
> > > > > > >
> > > > > > >             throws IOException {
> > > > > > >
> > > > > > >         ImmutableBytesWritable ptr = new
> > ImmutableBytesWritable();
> > > > > > >
> > > > > > >         Scan s;
> > > > > > >
> > > > > > >         if (cf == null) {
> > > > > > >
> > > > > > >             s = MetaDataUtil.newTableRowsScan(tableNameBytes,
> > > > > > > MetaDataProtocol.MIN_TABLE_TIMESTAMP, clientTimeStamp);
> > > > > > >
> > > > > > >         } else {
> > > > > > >
> > > > > > >             s =
> > > > MetaDataUtil.newTableRowsScan(getAdjustedKey(startKey,
> > > > > > > tableNameBytes, cf, false),
> > > > > > >
> > > > > > >                     getAdjustedKey(stopKey, tableNameBytes, cf,
> > > > true),
> > > > > > > MetaDataProtocol.MIN_TABLE_TIMESTAMP,
> > > > > > >
> > > > > > >                     clientTimeStamp);
> > > > > > >
> > > > > > >         }
> > > > > > >
> > > > > > >         s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES,
> > > > > > > PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES);
> > > > > > >
> > > > > > >         s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES,
> > > > > > > PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT_BYTES);
> > > > > > >
> > > > > > >         s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES,
> > > > > > > QueryConstants.EMPTY_COLUMN_BYTES);
> > > > > > >
> > > > > > >         ResultScanner scanner = null;
> > > > > > >
> > > > > > >         long timeStamp = MetaDataProtocol.MIN_TABLE_TIMESTAMP;
> > > > > > >
> > > > > > >         TreeMap<byte[], GuidePostsInfoBuilder>
> > > > > guidePostsInfoWriterPerCf
> > > > > > =
> > > > > > > new TreeMap<byte[],
> > GuidePostsInfoBuilder>(Bytes.BYTES_COMPARATOR);
> > > > > > >
> > > > > > >         try {
> > > > > > >
> > > > > > >             scanner = statsHTable.getScanner(s);
> > > > > > >
> > > > > > >             Result result = null;
> > > > > > >
> > > > > > >             while ((result = scanner.next()) != null) {
> > > > > > >
> > > > > > >                 CellScanner cellScanner = result.cellScanner();
> > > > > > >
> > > > > > >                 long rowCount = 0;
> > > > > > >
> > > > > > >                 long byteCount = 0;
> > > > > > >
> > > > > > >                 byte[] cfName = null;
> > > > > > >
> > > > > > >                 int tableNameLength;
> > > > > > >
> > > > > > >                 int cfOffset;
> > > > > > >
> > > > > > >                 int cfLength;
> > > > > > >
> > > > > > >                 boolean valuesSet = false;
> > > > > > >
> > > > > > >                 // Only the two cells with quals
> > > > > > > GUIDE_POSTS_ROW_COUNT_BYTES and GUIDE_POSTS_BYTES would be
> > > retrieved
> > > > > > >
> > > > > > >                 while (cellScanner.advance()) {
> > > > > > >
> > > > > > >                     Cell current = cellScanner.current();
> > > > > > >
> > > > > > >                     if (!valuesSet) {
> > > > > > >
> > > > > > >                         tableNameLength =
> tableNameBytes.length +
> > > 1;
> > > > > > >
> > > > > > >                         cfOffset = current.getRowOffset() +
> > > > > > > tableNameLength;
> > > > > > >
> > > > > > >                         cfLength =
> > > > > > getVarCharLength(current.getRowArray(),
> > > > > > > cfOffset,
> > > > > > >
> > > > > > >                                 current.getRowLength() -
> > > > > > tableNameLength);
> > > > > > >
> > > > > > >                         ptr.set(current.getRowArray(),
> cfOffset,
> > > > > > cfLength);
> > > > > > >
> > > > > > >                         valuesSet = true;
> > > > > > >
> > > > > > >                     }
> > > > > > >
> > > > > > >                     cfName =
> > ByteUtil.copyKeyBytesIfNecessary(ptr);
> > > > > > >
> > > > > > >                     if
> (Bytes.equals(current.getQualifierArray(),
> > > > > current
> > > > > > > .getQualifierOffset(),
> > > > > > >
> > > > > > >                             current.getQualifierLength(),
> > > > > > > PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT_BYTES, 0,
> > > > > > >
> > > > > > >                             PhoenixDatabaseMetaData.
> > > > > > > GUIDE_POSTS_ROW_COUNT_BYTES.length)) {
> > > > > > >
> > > > > > >                         rowCount =
> > > > > PLong.INSTANCE.getCodec().decodeLong(
> > > > > > > current.getValueArray(),
> > > > > > >
> > > > > > >                                 current.getValueOffset(),
> > > > > > > SortOrder.getDefault());
> > > > > > >
> > > > > > >                     } else if
> > > > > (Bytes.equals(current.getQualifierArray(),
> > > > > > > current.getQualifierOffset(),
> > > > > > >
> > > > > > >                             current.getQualifierLength(),
> > > > > > > PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES, 0,
> > > > > > >
> > > > > > >
> > > > > > > PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES.
> > > > > > > length)) {
> > > > > > >
> > > > > > >                         byteCount =
> > > > > PLong.INSTANCE.getCodec().decodeLong(
> > > > > > > current.getValueArray(),
> > > > > > >
> > > > > > >                                 current.getValueOffset(),
> > > > > > > SortOrder.getDefault());
> > > > > > >
> > > > > > >                     }
> > > > > > >
> > > > > > >                     if (current.getTimestamp() > timeStamp) {
> > > > > > >
> > > > > > >                         timeStamp = current.getTimestamp();
> > > > > > >
> > > > > > >                     }
> > > > > > >
> > > > > > >                 }
> > > > > > >
> > > > > > >                 if (cfName != null) {
> > > > > > >
> > > > > > >                     byte[] newGPStartKey =
> > > > getGuidePostsInfoFromRowKey(
> > > > > > > tableNameBytes, cfName, result.getRow());
> > > > > > >
> > > > > > >                     GuidePostsInfoBuilder guidePostsInfoWriter
> =
> > > > > > > guidePostsInfoWriterPerCf.get(cfName);
> > > > > > >
> > > > > > >                     if (guidePostsInfoWriter == null) {
> > > > > > >
> > > > > > >                         guidePostsInfoWriter = new
> > > > > > GuidePostsInfoBuilder();
> > > > > > >
> > > > > > >                         guidePostsInfoWriterPerCf.put(cfName,
> > > > > > > guidePostsInfoWriter);
> > > > > > >
> > > > > > >                     }
> > > > > > >
> > > > > > >
> > >  guidePostsInfoWriter.addGuidePosts(newGPStartKey,
> > > > > > > byteCount, rowCount);
> > > > > > >
> > > > > > >                 }
> > > > > > >
> > > > > > >             }
> > > > > > >
> > > > > > >             if (!guidePostsInfoWriterPerCf.isEmpty()) { return
> > new
> > > > > > > PTableStatsImpl(
> > > > > > >
> > > > > > >
> >  getGuidePostsPerCf(guidePostsInfoWriterPerCf),
> > > > > > > timeStamp);
> > > > > > > }
> > > > > > >
> > > > > > >         } finally {
> > > > > > >
> > > > > > >             if (scanner != null) {
> > > > > > >
> > > > > > >                 scanner.close();
> > > > > > >
> > > > > > >             }
> > > > > > >
> > > > > > >         }
> > > > > > >
> > > > > > >         return PTableStats.EMPTY_STATS;
> > > > > > >     }
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
