[ https://issues.apache.org/jira/browse/AIRFLOW-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Riccomini updated AIRFLOW-243:
------------------------------------
    Affects Version/s:     (was: Airflow 2.0)
                       Airflow 1.7.1.3

> Use a more efficient Thrift call for HivePartitionSensor
> --------------------------------------------------------
>
>                 Key: AIRFLOW-243
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-243
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: operators
>    Affects Versions: Airflow 1.7.1.3
>            Reporter: Paul Yang
>            Assignee: Li Xuanji
>            Priority: Minor
>             Fix For: Airflow 1.8
>
>
> The {{HivePartitionSensor}} uses the `get_partitions_by_filter` Thrift call,
> which can result in expensive SQL queries for tables that have many
> partitions and are partitioned by multiple keys. We've seen our metastore DB
> get hammered by these sensors, resulting in service degradation for other
> metastore users.
> The {{MetastorePartitionSensor}} is efficient, but it can result in too many
> connections to the metastore DB.
> An alternative is to use the `get_partition_by_name` Thrift call, which
> translates into more efficient SQL queries. Because connections will be
> pooled on the Thrift server, the DB won't get overloaded as with the
> {{MetastorePartitionSensor}}. The semantics of the arguments will change, so
> either a new argument needs to be introduced, or a new operator needs to be
> created.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
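
For illustration only, here is a minimal Python sketch of the lookup proposed in the description: check for one exact partition with `get_partition_by_name` instead of sending a filter expression. The helpers `partition_name` and `partition_exists`, the `FakeMetastoreClient` stub, and the local `NoSuchObjectException` class are hypothetical names introduced for this sketch and are not part of Airflow. The sketch assumes the metastore client exposes `get_partition_by_name(db_name, tbl_name, part_name)` and raises a "not found" exception when the partition is missing. It does not escape special characters in partition values.

```python
from collections import OrderedDict


class NoSuchObjectException(Exception):
    """Stand-in for the metastore's 'partition not found' exception."""


def partition_name(spec):
    """Build a partition name such as 'ds=2016-01-01/hour=12'.

    `spec` must list the keys in the table's partition-key order.
    """
    return "/".join("{}={}".format(key, value) for key, value in spec.items())


def partition_exists(client, db, table, spec):
    """Return True if the exact partition exists (hypothetical helper)."""
    try:
        client.get_partition_by_name(db, table, partition_name(spec))
        return True
    except NoSuchObjectException:
        return False


class FakeMetastoreClient(object):
    """In-memory stub so the sketch runs without a real metastore."""

    def __init__(self, partitions):
        self._partitions = set(partitions)

    def get_partition_by_name(self, db_name, tbl_name, part_name):
        if (db_name, tbl_name, part_name) not in self._partitions:
            raise NoSuchObjectException(part_name)
        return part_name


client = FakeMetastoreClient([("default", "events", "ds=2016-01-01/hour=12")])
spec = OrderedDict([("ds", "2016-01-01"), ("hour", "12")])
found = partition_exists(client, "default", "events", spec)
```

This also shows the change in argument semantics mentioned in the description: the caller has to supply a value for every partition key, in order, rather than an arbitrary filter string.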