[ https://issues.apache.org/jira/browse/PHOENIX-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167646#comment-15167646 ]

Nick Dimiduk commented on PHOENIX-2702:
---------------------------------------

This may be misleading information. Comparing the output of Lars's query above 
with the output of {{hdfs dfs -du -h -s}}, there's quite a large discrepancy: 
the query reports roughly 3-4x what HDFS reports, at least assuming the stats 
table stores bytes in the {{guide_posts_width}} column. My tables are created 
with Phoenix defaults, meaning no compression and FAST_DIFF data block encoding. 
If the stats were out of date, I would expect the numbers to diverge, but in the 
other direction. Ideas?
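A minimal sketch of the comparison described above. It assumes the SYSTEM.STATS rows have been fetched as (physical_name, guide_posts_row_count, guide_posts_width) tuples, that {{guide_posts_width}} is in bytes, and that the HDFS figure is a plain byte count (e.g. from {{hdfs dfs -du -s}} without {{-h}}). The function names and example numbers are hypothetical; the sketch only mirrors the SUM/GROUP BY arithmetic of Lars's query and the resulting ratio.

```python
from collections import defaultdict

def estimate_sizes(stats_rows):
    """Sum guidepost row counts and widths per physical table, mirroring
    SUM(guide_posts_row_count), SUM(guide_posts_width) ... GROUP BY physical_name."""
    totals = defaultdict(lambda: [0, 0])
    for physical_name, row_count, width in stats_rows:
        totals[physical_name][0] += row_count
        totals[physical_name][1] += width
    return {name: (rows, size) for name, (rows, size) in totals.items()}

def stats_to_hdfs_ratio(est_size_bytes, hdfs_bytes):
    """Ratio of the stats-based size estimate to the size reported by HDFS.
    A value of roughly 3-4 corresponds to the discrepancy described above."""
    if hdfs_bytes <= 0:
        raise ValueError("hdfs_bytes must be positive")
    return est_size_bytes / hdfs_bytes

# Hypothetical illustration values, not measurements.
rows = [("T1", 100, 3000), ("T1", 50, 1000), ("T2", 10, 200)]
sizes = estimate_sizes(rows)
print(sizes)                                      # {'T1': (150, 4000), 'T2': (10, 200)}
print(stats_to_hdfs_ratio(sizes["T1"][1], 1000))  # 4.0
```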

> Show estimate rows and bytes touched in explain plan.
> -----------------------------------------------------
>
>                 Key: PHOENIX-2702
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2702
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 4.7.0
>
>         Attachments: PHOENIX-2702.txt
>
>
> We can already estimate the size of a table (both rows and uncompressed 
> bytes) with a query like this:
> {code}
> SELECT physical_name AS table_name, SUM(guide_posts_row_count) AS est_rows, 
> SUM(guide_posts_width) AS est_size from SYSTEM.STATS GROUP BY physical_name;
> {code}
> During the planning phase we have more information, though, so we can report 
> the numbers for the specific query during an EXPLAIN. We have that info there 
> anyway, since we have already filtered the guideposts with the key info 
> provided in the query.
> I might whip up a quick patch for this.
> (Could also go further and add est_count and est_size UDFs for this, but those 
> would be a bit harder to hook up in the right places, I think, and their 
> meaning would be ambiguous.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
