In that case, I think we need to store rowCount and byteCount at the guidepost level, so I have created a JIRA (PHOENIX-2683) and uploaded a patch for it.
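To illustrate why per-guidepost values matter for the range query quoted below, here is a minimal sketch in plain Java that does not use Phoenix or HBase at all. It assumes each guidepost key carries the bytes and rows traversed since the previous guidepost, and sums them over a start/stop key range in the same way as the SUM query on SYSTEM.STATS. The names GuidePostRangeSum, Stats, KEY_ORDER, newGuidePostMap, key and sumRange are hypothetical and exist only for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical, Phoenix-free model of per-guidepost stats; none of these names exist in Phoenix.
class GuidePostRangeSum {

    /** Bytes and rows traversed since the previous guidepost (assumed to be stored per guidepost). */
    static final class Stats {
        final long width;
        final long rowCount;

        Stats(long width, long rowCount) {
            this.width = width;
            this.rowCount = rowCount;
        }
    }

    /** Unsigned lexicographic byte order, the ordering HBase uses for row keys. */
    static final Comparator<byte[]> KEY_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    /** Creates an empty guidepost map sorted by KEY_ORDER. */
    static NavigableMap<byte[], Stats> newGuidePostMap() {
        return new TreeMap<>(KEY_ORDER);
    }

    /** Convenience helper that turns a string into a UTF-8 key. */
    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Returns {bytesTraversed, rowsTraversed} over guideposts with startKey <= key < stopKey. */
    static long[] sumRange(NavigableMap<byte[], Stats> guidePosts, byte[] startKey, byte[] stopKey) {
        long bytes = 0;
        long rows = 0;
        // subMap throws IllegalArgumentException if startKey sorts after stopKey.
        for (Stats s : guidePosts.subMap(startKey, true, stopKey, false).values()) {
            bytes += s.width;
            rows += s.rowCount;
        }
        return new long[] {bytes, rows};
    }

    public static void main(String[] args) {
        NavigableMap<byte[], Stats> gps = newGuidePostMap();
        gps.put(key("a"), new Stats(100, 10));
        gps.put(key("c"), new Stats(200, 20));
        gps.put(key("e"), new Stats(300, 30));
        gps.put(key("g"), new Stats(400, 40));

        long[] r = sumRange(gps, key("c"), key("g"));
        // Prints: bytes_traversed=500, rows_traversed=50
        System.out.println("bytes_traversed=" + r[0] + ", rows_traversed=" + r[1]);
    }
}
```

If only the first guidepost of a region held the aggregated values, as described further down the thread, a range that starts after that guidepost would sum to zero in this model, which matches the missing rowCount/byteCount cells Maryann observed.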
On Sat, Feb 13, 2016 at 11:50 PM, James Taylor <[email protected]> wrote:

> The GUIDE_POSTS_WIDTH and GUIDE_POSTS_ROW_COUNT should contain the number
> of bytes and number of rows which were traversed since the last guidepost.
> So given some start key and stop key from a scan and knowledge that a given
> column family is used in a query, you should be able to run a query like
> this:
>
> SELECT SUM(GUIDE_POSTS_WIDTH) bytes_traversed,
>        SUM(GUIDE_POSTS_ROW_COUNT) rows_traversed
> FROM SYSTEM.STATS
> WHERE COLUMN_FAMILY = :1
> AND GUIDE_POST_KEY >= :2
> AND GUIDE_POST_KEY < :3
>
> where :1 is the column family, :2 is the start row of the scan, and :3 is
> the stop row of the scan. The result of the query should tell you the
> bytes_traversed and the rows_traversed with a granularity of the
> phoenix.stats.guidepost.width config parameter.
>
> We could even run this across all column families being traversed based on
> which ones are referenced and projected into the scan. Or we could use
> the "empty column family" (using SchemaUtil.getEmptyColumnFamily() as
> Maryann mentioned) which is the one that is typically projected. FWIW, the
> logic of which guideposts are used by a query is here:
> BaseResultIterators.getGuidePosts().
>
> Make sense? Is that the way it's working? If not, let's file a JIRA please.
>
> Thanks,
> James
>
> On Sat, Feb 13, 2016 at 10:15 AM, Maryann Xue <[email protected]>
> wrote:
>
> > Thank you, Ankit! I see what you mean. But I think what I queried was the
> > default CF. SchemaUtil.getEmptyColumnFamily(), is that correct? I'll try
> > to see if I can reproduce this.
> >
> > On Sat, Feb 13, 2016 at 8:07 AM, Ankit Singhal <[email protected]>
> > wrote:
> >
> > > Yes James, Query is using guidePosts as per the cf used in filter.
> > > But I think Maryann is expecting that rowcount and bytescount should be
> > > available at each guidePost key level, which we currently don't store.
> > > currently, we can use metrics (like rowcount/bytecount) at cf level only
> > > right?
> > >
> > > On Sat, Feb 13, 2016 at 11:34 AM, James Taylor <[email protected]>
> > > wrote:
> > >
> > > > We should have separate guideposts per cf, as the data distribution
> > > > may be different. We use the default cf if it's being filtered on, but
> > > > otherwise use a different cf.
> > > >
> > > > Is that how it works currently, Ankit?
> > > >
> > > > On Friday, February 12, 2016, Ankit Singhal <[email protected]>
> > > > wrote:
> > > >
> > > > > but I think we need these metrics at cf only right as per this
> > > > > comment-
> > > > >
> > > > > https://issues.apache.org/jira/browse/PHOENIX-2143?focusedCommentId=15069779&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15069779
> > > > >
> > > > > that's why we serialize aggregated value of region at cf level in
> > > > > first guide post only.
> > > > >
> > > > > Regards,
> > > > > Ankit Singhal
> > > > >
> > > > > On Sat, Feb 13, 2016 at 9:07 AM, Maryann Xue <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Thanks a lot for the answer, James! The data size has well exceeded
> > > > > > the guidepost width and the guideposts do exist but without
> > > > > > corresponding "rowCount" or "byteCount" cell. I'll try doing a
> > > > > > Phoenix query instead and confirm that it is a bug.
> > > > > >
> > > > > > Thanks,
> > > > > > Maryann
> > > > > >
> > > > > > On Fri, Feb 12, 2016 at 10:21 PM, James Taylor <[email protected]>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Maryann,
> > > > > > > If the amount of data in a region is less than the guidepost width,
> > > > > > > then it's possible you'd get no guideposts for that region. Do you
> > > > > > > think that's the case? If not, it sounds like there may be a bug.
> > > > > > >
> > > > > > > Assuming you're querying to get the stats information, I'd
> > > > > > > recommend doing a Phoenix query directly. The code you're emulating
> > > > > > > uses straight HBase APIs because it's called from the server-side.
> > > > > > > It'd be a one liner as a Phoenix query.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > James
> > > > > > >
> > > > > > > On Fri, Feb 12, 2016 at 11:23 AM, Maryann Xue <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > This was something I noticed when applying Phoenix table stats
> > > > > > > > into Calcite-Phoenix cost calculation: When executing the
> > > > > > > > following code (a slightly modified version of the existing
> > > > > > > > StatisticsUtil method) to scan stats table for a specific
> > > > > > > > column-family and a specific start/stop key range, I got
> > > > > > > > guidepost rows that did not contain the rowCount or byteCount
> > > > > > > > cell, for all rows in the specified range. Apparently, I had set
> > > > > > > > the corresponding columns in the Scan (as shown below).
> > > > > > > > Meanwhile, another range of stats in the same table gave me the
> > > > > > > > right result. I am wondering if this is an expected behavior or
> > > > > > > > it is a bug?
> > > > > > > >
> > > > > > > >     public static PTableStats readStatistics(HTableInterface statsHTable,
> > > > > > > >             byte[] tableNameBytes, ImmutableBytesPtr cf, byte[] startKey, byte[] stopKey,
> > > > > > > >             long clientTimeStamp)
> > > > > > > >             throws IOException {
> > > > > > > >         ImmutableBytesWritable ptr = new ImmutableBytesWritable();
> > > > > > > >         Scan s;
> > > > > > > >         if (cf == null) {
> > > > > > > >             s = MetaDataUtil.newTableRowsScan(tableNameBytes,
> > > > > > > >                     MetaDataProtocol.MIN_TABLE_TIMESTAMP, clientTimeStamp);
> > > > > > > >         } else {
> > > > > > > >             s = MetaDataUtil.newTableRowsScan(getAdjustedKey(startKey, tableNameBytes, cf, false),
> > > > > > > >                     getAdjustedKey(stopKey, tableNameBytes, cf, true),
> > > > > > > >                     MetaDataProtocol.MIN_TABLE_TIMESTAMP,
> > > > > > > >                     clientTimeStamp);
> > > > > > > >         }
> > > > > > > >         s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES,
> > > > > > > >                 PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES);
> > > > > > > >         s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES,
> > > > > > > >                 PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT_BYTES);
> > > > > > > >         s.addColumn(QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES,
> > > > > > > >                 QueryConstants.EMPTY_COLUMN_BYTES);
> > > > > > > >         ResultScanner scanner = null;
> > > > > > > >         long timeStamp = MetaDataProtocol.MIN_TABLE_TIMESTAMP;
> > > > > > > >         TreeMap<byte[], GuidePostsInfoBuilder> guidePostsInfoWriterPerCf =
> > > > > > > >                 new TreeMap<byte[], GuidePostsInfoBuilder>(Bytes.BYTES_COMPARATOR);
> > > > > > > >         try {
> > > > > > > >             scanner = statsHTable.getScanner(s);
> > > > > > > >             Result result = null;
> > > > > > > >             while ((result = scanner.next()) != null) {
> > > > > > > >                 CellScanner cellScanner = result.cellScanner();
> > > > > > > >                 long rowCount = 0;
> > > > > > > >                 long byteCount = 0;
> > > > > > > >                 byte[] cfName = null;
> > > > > > > >                 int tableNameLength;
> > > > > > > >                 int cfOffset;
> > > > > > > >                 int cfLength;
> > > > > > > >                 boolean valuesSet = false;
> > > > > > > >                 // Only the two cells with quals GUIDE_POSTS_ROW_COUNT_BYTES and GUIDE_POSTS_BYTES would be retrieved
> > > > > > > >                 while (cellScanner.advance()) {
> > > > > > > >                     Cell current = cellScanner.current();
> > > > > > > >                     if (!valuesSet) {
> > > > > > > >                         tableNameLength = tableNameBytes.length + 1;
> > > > > > > >                         cfOffset = current.getRowOffset() + tableNameLength;
> > > > > > > >                         cfLength = getVarCharLength(current.getRowArray(), cfOffset,
> > > > > > > >                                 current.getRowLength() - tableNameLength);
> > > > > > > >                         ptr.set(current.getRowArray(), cfOffset, cfLength);
> > > > > > > >                         valuesSet = true;
> > > > > > > >                     }
> > > > > > > >                     cfName = ByteUtil.copyKeyBytesIfNecessary(ptr);
> > > > > > > >                     if (Bytes.equals(current.getQualifierArray(), current.getQualifierOffset(),
> > > > > > > >                             current.getQualifierLength(), PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT_BYTES, 0,
> > > > > > > >                             PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT_BYTES.length)) {
> > > > > > > >                         rowCount = PLong.INSTANCE.getCodec().decodeLong(current.getValueArray(),
> > > > > > > >                                 current.getValueOffset(), SortOrder.getDefault());
> > > > > > > >                     } else if (Bytes.equals(current.getQualifierArray(), current.getQualifierOffset(),
> > > > > > > >                             current.getQualifierLength(), PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES, 0,
> > > > > > > >                             PhoenixDatabaseMetaData.GUIDE_POSTS_WIDTH_BYTES.length)) {
> > > > > > > >                         byteCount = PLong.INSTANCE.getCodec().decodeLong(current.getValueArray(),
> > > > > > > >                                 current.getValueOffset(), SortOrder.getDefault());
> > > > > > > >                     }
> > > > > > > >                     if (current.getTimestamp() > timeStamp) {
> > > > > > > >                         timeStamp = current.getTimestamp();
> > > > > > > >                     }
> > > > > > > >                 }
> > > > > > > >                 if (cfName != null) {
> > > > > > > >                     byte[] newGPStartKey = getGuidePostsInfoFromRowKey(tableNameBytes, cfName, result.getRow());
> > > > > > > >                     GuidePostsInfoBuilder guidePostsInfoWriter = guidePostsInfoWriterPerCf.get(cfName);
> > > > > > > >                     if (guidePostsInfoWriter == null) {
> > > > > > > >                         guidePostsInfoWriter = new GuidePostsInfoBuilder();
> > > > > > > >                         guidePostsInfoWriterPerCf.put(cfName, guidePostsInfoWriter);
> > > > > > > >                     }
> > > > > > > >                     guidePostsInfoWriter.addGuidePosts(newGPStartKey, byteCount, rowCount);
> > > > > > > >                 }
> > > > > > > >             }
> > > > > > > >             if (!guidePostsInfoWriterPerCf.isEmpty()) {
> > > > > > > >                 return new PTableStatsImpl(getGuidePostsPerCf(guidePostsInfoWriterPerCf), timeStamp);
> > > > > > > >             }
> > > > > > > >         } finally {
> > > > > > > >             if (scanner != null) {
> > > > > > > >                 scanner.close();
> > > > > > > >             }
> > > > > > > >         }
> > > > > > > >         return PTableStats.EMPTY_STATS;
> > > > > > > >     }
