[ 
https://issues.apache.org/jira/browse/HIVE-12010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12010:
-----------------------------------
    Description: Although the fs-based collection mechanism has been the 
default for the last few releases, tests still use jdbc for stats collection. 
The main advantage of fs-based collection over jdbc-based collection is 
scalability. In the jdbc case, a single database (normally co-located with the 
metastore relational database) handles all the stats collected by all the 
tasks. This single database is responsible for maintaining the consistency of 
the stats, so it becomes a bottleneck when the number of tasks is huge. In the 
fs case, each task writes its stats into HDFS, which does not have this 
scalability issue.  (was: Although the fs-based collection mechanism has been 
the default for the last few releases, tests still use jdbc for stats 
collection.)

> Tests should use FileSystem based stats collection mechanism
> ------------------------------------------------------------
>
>                 Key: HIVE-12010
>                 URL: https://issues.apache.org/jira/browse/HIVE-12010
>             Project: Hive
>          Issue Type: Task
>          Components: Statistics
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>         Attachments: HIVE-12010.1.patch, HIVE-12010.2.patch, 
> HIVE-12010.3.patch, HIVE-12010.4.patch, HIVE-12010.patch
>
>
> Although the fs-based collection mechanism has been the default for the last 
> few releases, tests still use jdbc for stats collection. The main advantage 
> of fs-based collection over jdbc-based collection is scalability. In the jdbc 
> case, a single database (normally co-located with the metastore relational 
> database) handles all the stats collected by all the tasks. This single 
> database is responsible for maintaining the consistency of the stats, so it 
> becomes a bottleneck when the number of tasks is huge. In the fs case, each 
> task writes its stats into HDFS, which does not have this scalability issue.
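
For reference, the two mechanisms described above are selected through the
hive.stats.dbclass property. The fragment below is only a sketch of how a
session might switch between them; the jdbc:derby value shown for the old
test setup is an assumption, not something stated in this issue, and the
attached patches may change the test configuration in a different way.

```sql
-- FileSystem-based stats collection: each task writes its stats to the
-- file system instead of a shared database.
SET hive.stats.dbclass=fs;

-- JDBC-based collection, shown for comparison (value assumed, not taken
-- from the patches): all tasks publish stats to one database.
-- SET hive.stats.dbclass=jdbc:derby;
```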



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
