[
https://issues.apache.org/jira/browse/HIVE-11683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eugene Koifman updated HIVE-11683:
----------------------------------
Component/s: Metastore
> Hive Streaming may overload the metastore
> -----------------------------------------
>
> Key: HIVE-11683
> URL: https://issues.apache.org/jira/browse/HIVE-11683
> Project: Hive
> Issue Type: Bug
> Components: HCatalog, Hive, Metastore, Transactions
> Affects Versions: 1.0.0
> Reporter: Eugene Koifman
> Assignee: Roshan Naik
>
> HiveEndPoint represents a way to write to a specific partition
> transactionally.
> Each HiveEndPoint creates TransactionBatch(es) and commits transactions.
> Suppose you have 10 instances of the Storm Hive bolt using the Streaming API.
> Each instance will create HiveEndPoints on demand when it sees an event for
> a particular partition value.
> If events are uniformly distributed with respect to partition values and the
> table has 1000 partitions (for example, it's partitioned by CustomerId), each
> of the 10 bolt instances may create 1000 HiveEndPoints, giving 10,000
> endpoints and thus more than 10,000 (actually 10K * num_txn_per_batch)
> concurrent transactions.
> This creates a huge amount of Metastore traffic.
> HIVE-11672 is investigating how some sort of "shuffle" phase can be added to
> route events for a particular bucket to the same bolt instance.
> The same idea should be explored to route events based on partition value.
> cc [~alangates],[~sriharsha],[~rbains]
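
To make the numbers in the description concrete, here is a rough sketch of the worst-case arithmetic and of what routing by partition value might look like. It is not Hive or Storm code: the function names (worst_case_open_txns, route_by_partition, routed_open_txns) and the CRC32 hash are assumptions made for illustration only, and the actual shuffle design is whatever HIVE-11672 ends up with.

```python
import zlib


def worst_case_open_txns(bolts, partitions, txns_per_batch):
    """Upper bound when every bolt instance sees every partition value.

    Each bolt opens one endpoint per partition, and each endpoint holds
    one batch of txns_per_batch transactions.
    """
    endpoints = bolts * partitions
    return endpoints * txns_per_batch


def route_by_partition(partition_value, num_bolts):
    """Hypothetical router: map a partition value to a bolt index.

    CRC32 is used only because it is stable across processes; any
    deterministic hash would serve for this sketch.
    """
    return zlib.crc32(partition_value.encode("utf-8")) % num_bolts


def routed_open_txns(partition_values, num_bolts, txns_per_batch):
    """Open transactions when each partition value is owned by one bolt."""
    owned = {}  # bolt index -> set of partition values routed to it
    for value in partition_values:
        owned.setdefault(route_by_partition(value, num_bolts), set()).add(value)
    endpoints = sum(len(values) for values in owned.values())
    return endpoints * txns_per_batch
```

With 10 bolts, 1000 partitions and a batch size of 1, the unrouted worst case is 10,000 open transactions, while routing each partition value to a single bolt caps the endpoint count at the number of distinct partition values (1000 here). Storm's fields grouping is one existing mechanism that routes tuples by a hash of chosen fields, though whether it fits this case is left open by the issue.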