[
https://issues.apache.org/jira/browse/PHOENIX-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167646#comment-15167646
]
Nick Dimiduk commented on PHOENIX-2702:
---------------------------------------
This information may be misleading. Comparing the output of Lars's query above
with the output of {{hdfs dfs -du -h -s}}, there's quite a large discrepancy:
the query reports roughly 3-4x what HDFS reports, at least assuming the stats
table stores bytes in the {{guide_posts_width}} column. My tables are created
with Phoenix defaults, meaning no compression and the FAST_DIFF codec. If stats
were out of date, I would expect the numbers to diverge, but in the other
direction.
Ideas?
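For a like-for-like comparison, the estimate can be narrowed to a single table.
This is only a sketch: MY_TABLE is a hypothetical table name, and it assumes
{{guide_posts_width}} holds bytes, which is exactly what is in question here.

```sql
-- MY_TABLE is a hypothetical placeholder; substitute the physical table name.
-- Assumes guide_posts_width is stored in bytes (unconfirmed).
SELECT physical_name AS table_name,
       SUM(guide_posts_row_count) AS est_rows,
       SUM(guide_posts_width) AS est_size_bytes
FROM SYSTEM.STATS
WHERE physical_name = 'MY_TABLE'
GROUP BY physical_name;
```

The resulting est_size_bytes can then be set against {{hdfs dfs -du -s}} run on
that table's directory (dropping {{-h}} gives a raw byte count to compare with).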
> Show estimate rows and bytes touched in explain plan.
> -----------------------------------------------------
>
> Key: PHOENIX-2702
> URL: https://issues.apache.org/jira/browse/PHOENIX-2702
> Project: Phoenix
> Issue Type: Bug
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Fix For: 4.7.0
>
> Attachments: PHOENIX-2702.txt
>
>
> We can already estimate the size of a table (both rows and uncompressed
> bytes) with a query like this:
> {code}
> SELECT physical_name AS table_name, SUM(guide_posts_row_count) AS est_rows,
> SUM(guide_posts_width) AS est_size from SYSTEM.STATS GROUP BY physical_name;
> {code}
> During the planning phase we have more information, though, so we can report
> the actual numbers for a query during an explain, since we have that info
> there anyway (we have already filtered the guideposts with the key info
> provided in the query).
> I might whip up a quick patch for this.
> (Could also go further and add an est_count, est_size UDF for this, but that
> would be a bit harder to get hooked up at the right places, I think, and the
> meaning would be ambiguous)