[ 
https://issues.apache.org/jira/browse/AIRFLOW-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Riccomini updated AIRFLOW-243:
------------------------------------
    Affects Version/s:     (was: Airflow 2.0)
                       Airflow 1.7.1.3

> Use a more efficient Thrift call for HivePartitionSensor
> --------------------------------------------------------
>
>                 Key: AIRFLOW-243
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-243
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>    Affects Versions: Airflow 1.7.1.3
>            Reporter: Paul Yang
>            Assignee: Li Xuanji
>            Priority: Minor
>             Fix For: Airflow 1.8
>
>
> The {{HivePartitionSensor}} uses the `get_partitions_by_filter` Thrift call, 
> which can result in expensive SQL queries for tables that have many 
> partitions and are partitioned by multiple keys. We've seen our metastore DB 
> get hammered by these sensors, resulting in service degradation for other 
> metastore users.
> The {{MetastorePartitionSensor}} is efficient, but it can result in too many 
> connections to the metastore DB.
> An alternative is to use the `get_partition_by_name` Thrift call, which 
> translates into more efficient SQL queries. Because connections will be 
> pooled on the Thrift server, the DB won't get overloaded as it can with the 
> {{MetastorePartitionSensor}}. The semantics of the sensor's arguments will 
> change, so either a new argument needs to be introduced or a new operator 
> needs to be created.
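
As a rough illustration of the `get_partition_by_name` approach described above, here is a minimal Python sketch of an existence check. It is not the Airflow implementation: `FakeMetastoreClient`, `make_partition_name`, `partition_exists`, and the local `NoSuchObjectException` class are hypothetical stand-ins so the example runs without a metastore. It assumes a partition name of the form `key=value/key=value` with keys in the table's partition order, and it does not escape special characters in values.

```python
from collections import OrderedDict


class NoSuchObjectException(Exception):
    """Local stand-in (assumption) for the metastore's 'not found' error."""


class FakeMetastoreClient:
    """Hypothetical in-memory stand-in for a metastore Thrift client."""

    def __init__(self, partitions):
        # Maps (db_name, tbl_name) to the set of existing partition names.
        self._partitions = partitions

    def get_partition_by_name(self, db_name, tbl_name, part_name):
        # Mirrors the shape of the Thrift call: look up one exact partition.
        if part_name not in self._partitions.get((db_name, tbl_name), set()):
            raise NoSuchObjectException(part_name)
        return {"db": db_name, "table": tbl_name, "name": part_name}


def make_partition_name(spec):
    """Join ordered key/value pairs, e.g. 'ds=2016-01-01/hour=00'."""
    return "/".join("{}={}".format(key, value) for key, value in spec.items())


def partition_exists(client, db_name, tbl_name, part_name):
    """Return True if the exact partition exists, False if it is missing."""
    try:
        client.get_partition_by_name(db_name, tbl_name, part_name)
        return True
    except NoSuchObjectException:
        return False


# Usage: a sensor-style poke would call partition_exists() on each check.
client = FakeMetastoreClient({("default", "events"): {"ds=2016-01-01/hour=00"}})
name = make_partition_name(OrderedDict([("ds", "2016-01-01"), ("hour", "00")]))
found = partition_exists(client, "default", "events", name)
```

The sketch also shows why the argument semantics differ: the caller passes one fully specified partition name instead of a filter expression, so a lookup such as "any partition with ds > X" cannot be expressed this way.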



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
