[ https://issues.apache.org/jira/browse/PHOENIX-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240674#comment-14240674 ]
James Taylor commented on PHOENIX-1453: --------------------------------------- Thanks for the patch, [~ramkrishna]. Here's some feedback: - You need a type specification here too (see existing examples): {code} + // We have to handle new columns here + // TODO : what should be time that should be specified here? + addColumnsIfNotExists(metaConnection, + SYSTEM_CATALOG_SCHEMA + ".\"" + SYSTEM_STATS_TABLE, MetaDataProtocol.MIN_SYSTEM_TABLE_TIMESTAMP, + PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT); {code}' - For the proto change, I think we should define a PTableStats that contains everything stats related. Within that, we may or may not need a further breakdown, but likely not. {code} - GuidePostsInfo info = new GuidePostsInfo(pTableStatsProto.getGuidePostsByteCount(), value); + for (int j = 0; j < pTableStatsProto.getStatsCount(); j++) { + PTableIndividualStats stats = pTableStatsProto.getStats(j); + details.add(new GuidePostDetail(stats.getByteCount(), stats.getGps().toByteArray(), stats.getRowCount())); + } + GuidePostsInfo info = new GuidePostsInfo(pTableStatsProto.getGuidePostsByteCount(), value + ,pTableStatsProto.getRowCount(), details); {code} - The code in GuidePostInfo isn't correct. The combine method needs to create an array for byteSize and rowCount. This is where we're combining info from two different regions. Don't do a merge sort (as is being done now), but instead we'll either concatenate the guidePost keys before the existing ones or after them. The byteSize and rowCount need to be repeated here, again either before or after depending on the guidePost keys. So we'll go from a single rowCount to one per guidepost key here. Same with byteCount. For example, let's say that this.getGuidePosts() has keys ('a','b','c') with rowCount = 10 and byteCount =50 and oldInfo.getGuidePosts() has keys ('j','k','l') with rowCount = 8 and byteCount = 40. Then, when you combine them, you'd end up with ('a','b','c','j','k','l'), rowCount of (10,10,10,8,8,8), and byteCount of (50, 50, 50, 40, 40, 40). You should declare rowCount and byteCount as a long[]. They'll start with a single element and then get multiple elements when combine is called. This way, we're not losing information. An alternative would be to maintain the per region structure and have methods that hide this on the client (as we operate on it in a flattened way on the client). - Not sure I get the addIndividualGuidePostDetails method. Would be good if we didn't need that distinction. > Collect row counts per region in stats table > -------------------------------------------- > > Key: PHOENIX-1453 > URL: https://issues.apache.org/jira/browse/PHOENIX-1453 > Project: Phoenix > Issue Type: Sub-task > Reporter: James Taylor > Assignee: ramkrishna.s.vasudevan > Attachments: Phoenix-1453.patch, Phoenix-1453_1.patch, > Phoenix-1453_2.patch, Phoenix-1453_3.patch, Phoenix-1453_7.patch, > Phoenix-1453_8.patch > > > We currently collect guideposts per equal chunk, but we should also capture > row counts. Should we have a parallel array with the guideposts that count > rows per guidepost, or is it enough to have a per region count? -- This message was sent by Atlassian JIRA (v6.3.4#6332)