I think we then need to store rowCount and byteCount at the guidePost level, so
I have created a JIRA (PHOENIX-2683) and uploaded a patch for the same.
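
As a rough illustration of what per-guidepost counts would make possible (this
is not the patch itself, only a sketch), the range estimate could be modelled
in plain Java as below. The class GuidePostRangeEstimate, the Counts holder and
the sample keys and numbers are all hypothetical and are not Phoenix APIs; the
sorted map simply stands in for the guidepost rows of one column family.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.TreeMap;

public class GuidePostRangeEstimate {
    /** Hypothetical holder: bytes and rows traversed since the previous guidepost. */
    static final class Counts {
        final long byteCount;
        final long rowCount;
        Counts(long byteCount, long rowCount) {
            this.byteCount = byteCount;
            this.rowCount = rowCount;
        }
    }

    /** Unsigned lexicographic byte[] ordering, the order in which HBase sorts row keys. */
    static final Comparator<byte[]> UNSIGNED_LEX = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    };

    /** Sums the counts of every guidepost key k with startKey <= k < stopKey. */
    static Counts estimate(TreeMap<byte[], Counts> guidePosts, byte[] startKey, byte[] stopKey) {
        long bytes = 0;
        long rows = 0;
        for (Counts c : guidePosts.subMap(startKey, true, stopKey, false).values()) {
            bytes += c.byteCount;
            rows += c.rowCount;
        }
        return new Counts(bytes, rows);
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up guideposts for a single column family.
        TreeMap<byte[], Counts> guidePosts = new TreeMap<>(UNSIGNED_LEX);
        guidePosts.put(key("b"), new Counts(100, 10));
        guidePosts.put(key("d"), new Counts(120, 12));
        guidePosts.put(key("f"), new Counts(90, 9));
        guidePosts.put(key("h"), new Counts(110, 11));

        // Only the guideposts "d" and "f" fall inside the scan range [c, g).
        Counts c = estimate(guidePosts, key("c"), key("g"));
        System.out.println(c.byteCount + " bytes, " + c.rowCount + " rows");
        // prints "210 bytes, 21 rows"
    }
}
```

The estimate is only as fine-grained as the guidepost width, which matches the
granularity James describes for the SQL query quoted below.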

On Sat, Feb 13, 2016 at 11:50 PM, James Taylor <[email protected]>
wrote:

> The GUIDE_POSTS_WIDTH and GUIDE_POSTS_ROW_COUNT should contain the number
> of bytes and number of rows which were traversed since the last guidepost.
> So given some start key and stop key from a scan and knowledge that a given
> column family is used in a query, you should be able to run a query like
> this:
>
> SELECT SUM(GUIDE_POSTS_WIDTH) bytes_traversed,
>     SUM(GUIDE_POSTS_ROW_COUNT) rows_traversed
> FROM SYSTEM.STATS
> WHERE COLUMN_FAMILY = :1
> AND GUIDE_POST_KEY >= :2
> AND GUIDE_POST_KEY < :3
>
> where :1 is the column family, :2 is the start row of the scan, and :3 is
> the stop row of the scan. The result of the query should tell you the
> bytes_traversed and the rows_traversed with a granularity of the
> phoenix.stats.guidepost.width config parameter.
>
> We could even run this across all column families being traversed, based on
> which ones are referenced and projected into the scan. Or we could use
> the "empty column family" (using SchemaUtil.getEmptyColumnFamily() as
> Maryann mentioned) which is the one that is typically projected. FWIW, the
> logic of which guideposts are used by a query is here:
> BaseResultIterators.getGuidePosts().
>
> Make sense? Is that the way it's working? If not, let's file a JIRA please.
>
> Thanks,
> James
>
> On Sat, Feb 13, 2016 at 10:15 AM, Maryann Xue <[email protected]>
> wrote:
>
> > Thank you, Ankit! I see what you mean. But I think what I queried was the
> > default CF. SchemaUtil.getEmptyColumnFamily(), is that correct? I'll try to
> > see if I can reproduce this.
> >
> > On Sat, Feb 13, 2016 at 8:07 AM, Ankit Singhal <[email protected]>
> > wrote:
> >
> > > Yes James, the query is using guidePosts as per the cf used in the filter.
> > > But I think Maryann is expecting that rowCount and byteCount should be
> > > available at each guidePost key level, which we currently don't store.
> > > Currently, we can use metrics (like rowCount/byteCount) at the cf level
> > > only, right?
> > >
> > > On Sat, Feb 13, 2016 at 11:34 AM, James Taylor <[email protected]>
> > > wrote:
> > >
> > > > We should have separate guideposts per cf, as the data distribution may be
> > > > different. We use the default cf if it's being filtered on, but otherwise
> > > > use a different cf.
> > > >
> > > > Is that how it works currently, Ankit?
> > > >
> > > > On Friday, February 12, 2016, Ankit Singhal <[email protected]>
> > > > wrote:
> > > >
> > > > > But I think we need these metrics at the cf level only, right, as per
> > > > > this comment:
> > > > >
> > > > > https://issues.apache.org/jira/browse/PHOENIX-2143?focusedCommentId=15069779&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15069779
> > > > >
> > > > > That's why we serialize the aggregated value of the region at the cf
> > > > > level in the first guidepost only.
> > > > >
> > > > > Regards,
> > > > > Ankit Singhal
> > > > >
> > > > > On Sat, Feb 13, 2016 at 9:07 AM, Maryann Xue <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Thanks a lot for the answer, James! The data size has well exceeded the
> > > > > > guidepost width and the guideposts do exist but without corresponding
> > > > > > "rowCount" or "byteCount" cell. I'll try doing a Phoenix query instead
> > > > > > and confirm that it is a bug.
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Maryann
> > > > > >
> > > > > > On Fri, Feb 12, 2016 at 10:21 PM, James Taylor <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Maryann,
> > > > > > > If the amount of data in a region is less than the guidepost width,
> > > > > > > then it's possible you'd get no guideposts for that region. Do you
> > > > > > > think that's the case? If not, it sounds like there may be a bug.
> > > > > > >
> > > > > > > Assuming you're querying to get the stats information, I'd recommend
> > > > > > > doing a Phoenix query directly. The code you're emulating uses straight
> > > > > > > HBase APIs because it's called from the server side. It'd be a
> > > > > > > one-liner as a Phoenix query.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > James
> > > > > > >
> > > > > > > On Fri, Feb 12, 2016 at 11:23 AM, Maryann Xue <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > This was something I noticed when applying Phoenix table stats to the
> > > > > > > > Calcite-Phoenix cost calculation: when executing the following code (a
> > > > > > > > slightly modified version of the existing StatisticsUtil method) to
> > > > > > > > scan the stats table for a specific column family and a specific
> > > > > > > > start/stop key range, I got guidepost rows that did not contain the
> > > > > > > > rowCount or byteCount cell, for all rows in the specified range.
> > > > > > > > Apparently, I had set the corresponding columns in the Scan (as shown
> > > > > > > > below). Meanwhile, another range of stats in the same table gave me the
> > > > > > > > right result. I am wondering if this is expected behavior or a bug?
> > > > > > > >
> > > > > > > >     public static PTableStats readStatistics(HTableInterface statsHTable,
> > > > > > > >             byte[] tableNameBytes, ImmutableBytesPtr cf, byte[] startKey, byte[] stopKey,
> > > > > > > >             long clientTimeStamp)
> > > > > > > >             throws IOException {
> > > > > > > >         ImmutableBytesWritable ptr = new ImmutableBytesWritable();
> > > > > > > >         Scan s;
> > > > > > > >         if (cf == null) {
> > > > > > > >             s = MetaDataUtil.newTableRowsScan(tableNameBytes,
> > > > > > > >                     MetaDataProtocol.MIN_TABLE_TIMESTAMP, clientTimeStamp);
> > > > > > > >         } else {
> > > > > > > >             s = MetaDataUtil.newTableRowsScan(getAdjustedKey(startKey, tableNameBytes, cf, false),
> > > > > > > >                     getAdjustedKey(stopKey, tableNameBytes, cf, true),
> > > > > > > >                     MetaDataProtocol.MIN_TABLE_TIMESTAMP, clientTimeStamp);
> > > > > > > >         }
> > > > > > > >         s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES,
> > > > > > > >                 PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES);
> > > > > > > >         s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES,
> > > > > > > >                 PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT_BYTES);
> > > > > > > >         s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES,
> > > > > > > >                 QueryConstants.EMPTY_COLUMN_BYTES);
> > > > > > > >         ResultScanner scanner = null;
> > > > > > > >         long timeStamp = MetaDataProtocol.MIN_TABLE_TIMESTAMP;
> > > > > > > >         TreeMap<byte[], GuidePostsInfoBuilder> guidePostsInfoWriterPerCf =
> > > > > > > >                 new TreeMap<byte[], GuidePostsInfoBuilder>(Bytes.BYTES_COMPARATOR);
> > > > > > > >         try {
> > > > > > > >             scanner = statsHTable.getScanner(s);
> > > > > > > >             Result result = null;
> > > > > > > >             while ((result = scanner.next()) != null) {
> > > > > > > >                 CellScanner cellScanner = result.cellScanner();
> > > > > > > >                 long rowCount = 0;
> > > > > > > >                 long byteCount = 0;
> > > > > > > >                 byte[] cfName = null;
> > > > > > > >                 int tableNameLength;
> > > > > > > >                 int cfOffset;
> > > > > > > >                 int cfLength;
> > > > > > > >                 boolean valuesSet = false;
> > > > > > > >                 // Only the two cells with quals GUIDE_POSTS_ROW_COUNT_BYTES and
> > > > > > > >                 // GUIDE_POSTS_BYTES would be retrieved
> > > > > > > >                 while (cellScanner.advance()) {
> > > > > > > >                     Cell current = cellScanner.current();
> > > > > > > >                     if (!valuesSet) {
> > > > > > > >                         tableNameLength = tableNameBytes.length + 1;
> > > > > > > >                         cfOffset = current.getRowOffset() + tableNameLength;
> > > > > > > >                         cfLength = getVarCharLength(current.getRowArray(), cfOffset,
> > > > > > > >                                 current.getRowLength() - tableNameLength);
> > > > > > > >                         ptr.set(current.getRowArray(), cfOffset, cfLength);
> > > > > > > >                         valuesSet = true;
> > > > > > > >                     }
> > > > > > > >                     cfName = ByteUtil.copyKeyBytesIfNecessary(ptr);
> > > > > > > >                     if (Bytes.equals(current.getQualifierArray(), current.getQualifierOffset(),
> > > > > > > >                             current.getQualifierLength(),
> > > > > > > >                             PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT_BYTES, 0,
> > > > > > > >                             PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT_BYTES.length)) {
> > > > > > > >                         rowCount = PLong.INSTANCE.getCodec().decodeLong(current.getValueArray(),
> > > > > > > >                                 current.getValueOffset(), SortOrder.getDefault());
> > > > > > > >                     } else if (Bytes.equals(current.getQualifierArray(), current.getQualifierOffset(),
> > > > > > > >                             current.getQualifierLength(),
> > > > > > > >                             PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES, 0,
> > > > > > > >                             PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES.length)) {
> > > > > > > >                         byteCount = PLong.INSTANCE.getCodec().decodeLong(current.getValueArray(),
> > > > > > > >                                 current.getValueOffset(), SortOrder.getDefault());
> > > > > > > >                     }
> > > > > > > >                     if (current.getTimestamp() > timeStamp) {
> > > > > > > >                         timeStamp = current.getTimestamp();
> > > > > > > >                     }
> > > > > > > >                 }
> > > > > > > >                 if (cfName != null) {
> > > > > > > >                     byte[] newGPStartKey = getGuidePostsInfoFromRowKey(
> > > > > > > >                             tableNameBytes, cfName, result.getRow());
> > > > > > > >                     GuidePostsInfoBuilder guidePostsInfoWriter =
> > > > > > > >                             guidePostsInfoWriterPerCf.get(cfName);
> > > > > > > >                     if (guidePostsInfoWriter == null) {
> > > > > > > >                         guidePostsInfoWriter = new GuidePostsInfoBuilder();
> > > > > > > >                         guidePostsInfoWriterPerCf.put(cfName, guidePostsInfoWriter);
> > > > > > > >                     }
> > > > > > > >                     guidePostsInfoWriter.addGuidePosts(newGPStartKey, byteCount, rowCount);
> > > > > > > >                 }
> > > > > > > >             }
> > > > > > > >             if (!guidePostsInfoWriterPerCf.isEmpty()) {
> > > > > > > >                 return new PTableStatsImpl(
> > > > > > > >                         getGuidePostsPerCf(guidePostsInfoWriterPerCf), timeStamp);
> > > > > > > >             }
> > > > > > > >         } finally {
> > > > > > > >             if (scanner != null) {
> > > > > > > >                 scanner.close();
> > > > > > > >             }
> > > > > > > >         }
> > > > > > > >         return PTableStats.EMPTY_STATS;
> > > > > > > >     }
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
