[
https://issues.apache.org/jira/browse/HAWQ-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051827#comment-15051827
]
ASF GitHub Bot commented on HAWQ-191:
-------------------------------------
Github user hornn commented on the pull request:
https://github.com/apache/incubator-hawq/pull/174#issuecomment-163783470
Removed code looks good.
+0.5 (waiting for the warning message code)
> Remove Analyzer from PXF
> ------------------------
>
> Key: HAWQ-191
> URL: https://issues.apache.org/jira/browse/HAWQ-191
> Project: Apache HAWQ
> Issue Type: Improvement
> Components: PXF
> Reporter: Noa Horn
> Assignee: Shivram Mani
>
> The Analyzer plugin was used to gather statistics when running ANALYZE.
> The API provides one function, getEstimatedStats(), which returns the
> estimated number of tuples, the number of blocks, and the block size.
> We also have one implementation of it: HdfsAnalyzer.
> After the introduction of advanced stats (HAWQ-44), the Analyzer is no longer
> used by HAWQ. Instead, a new function in the Fragmenter API
> (getFragmentsStats) is used to gather initial statistics for the data source,
> and subsequent queries gather sample tuples from that data source.
> The advantage of the new approach is that Fragmenter.getFragmentsStats
> uses only the Fragmenter to gather stats. The Analyzer, on the other hand,
> instantiated both the Fragmenter and the Accessor of the table in order to
> estimate the number of tuples. In the HdfsAnalyzer implementation, this made
> the pxf-hdfs jar depend on pxf-service (which takes care of instantiating
> the plugins), contrary to the isolation we want to achieve between
> core functionality (pxf-service) and the plugins (pxf-hdfs, pxf-hive,
> pxf-hbase, etc.).