[jira] [Commented] (PHOENIX-1453) Collect row counts per region in stats table

James Taylor (JIRA) Tue, 09 Dec 2014 21:28:14 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240674#comment-14240674
 ]


James Taylor commented on PHOENIX-1453:
---------------------------------------

Thanks for the patch, [~ramkrishna]. Here's some feedback:
- You need a type specification here too (see existing examples):
{code}
+                                // We have to handle new columns here
+                                // TODO : what should be time that should be 
specified here?
+                                addColumnsIfNotExists(metaConnection, 
+                                        SYSTEM_CATALOG_SCHEMA + ".\"" + 
SYSTEM_STATS_TABLE, MetaDataProtocol.MIN_SYSTEM_TABLE_TIMESTAMP, 
+                                        
PhoenixDatabaseMetaData.GUIDE_POSTS_ROW_COUNT);
{code}'
- For the proto change, I think we should define a PTableStats that contains 
everything stats related. Within that, we may or may not need a further 
breakdown, but likely not.
{code}
-            GuidePostsInfo info = new 
GuidePostsInfo(pTableStatsProto.getGuidePostsByteCount(), value);
+            for (int j = 0; j < pTableStatsProto.getStatsCount(); j++) {
+                PTableIndividualStats stats = pTableStatsProto.getStats(j);
+                details.add(new GuidePostDetail(stats.getByteCount(), 
stats.getGps().toByteArray(), stats.getRowCount()));
+            }
+            GuidePostsInfo info = new 
GuidePostsInfo(pTableStatsProto.getGuidePostsByteCount(), value
+                    ,pTableStatsProto.getRowCount(), details);
{code}
- The code in GuidePostInfo isn't correct. The combine method needs to create 
an array for byteSize and rowCount. This is where we're combining info from two 
different regions. Don't do a merge sort (as is being done now), but instead 
we'll either concatenate the guidePost keys before the existing ones or after 
them. The byteSize and rowCount need to be repeated here, again either before 
or after depending on the guidePost keys. So we'll go from a single rowCount to 
one per guidepost key here. Same with byteCount. For example, let's say that 
this.getGuidePosts() has keys ('a','b','c') with rowCount = 10 and byteCount 
=50 and oldInfo.getGuidePosts() has keys ('j','k','l') with rowCount = 8 and 
byteCount = 40. Then, when you combine them, you'd end up with 
('a','b','c','j','k','l'), rowCount of (10,10,10,8,8,8), and byteCount of (50, 
50, 50, 40, 40, 40). You should declare rowCount and byteCount as a long[]. 
They'll start with a single element and then get multiple elements when combine 
is called. This way, we're not losing information. An alternative would be to 
maintain the per region structure and have methods that hide this on the client 
(as we operate on it in a flattened way on the client).
- Not sure I get the addIndividualGuidePostDetails method. Would be good if we 
didn't need that distinction.

> Collect row counts per region in stats table
> --------------------------------------------
>
>                 Key: PHOENIX-1453
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1453
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>         Attachments: Phoenix-1453.patch, Phoenix-1453_1.patch, 
> Phoenix-1453_2.patch, Phoenix-1453_3.patch, Phoenix-1453_7.patch, 
> Phoenix-1453_8.patch
>
>
> We currently collect guideposts per equal chunk, but we should also capture 
> row counts. Should we have a parallel array with the guideposts that count 
> rows per guidepost, or is it enough to have a per region count?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (PHOENIX-1453) Collect row counts per region in stats table

Reply via email to