[ https://issues.apache.org/jira/browse/HIVE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jesus Camacho Rodriguez resolved HIVE-14468.
--------------------------------------------
Resolution: Fixed
Fix Version/s: 2.2.0
Pushed in HIVE-14217.
> Implement Druid query based input format
> ----------------------------------------
>
> Key: HIVE-14468
> URL: https://issues.apache.org/jira/browse/HIVE-14468
> Project: Hive
> Issue Type: Sub-task
> Components: Druid integration
> Affects Versions: 2.2.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Fix For: 2.2.0
>
>
> The input format is responsible for generating the splits and creating the
> record readers.
> * For *Timeseries*, *TopN*, and *GroupBy* queries, create a single split
> containing the broker address and the query. The record reader then
> submits the query to the broker, retrieves the results, and parses them
> into records.
> * For *Select* queries, Druid applies a threshold (limit) that lets the
> query results be retrieved in multiple requests. Hence, we first issue a
> Druid Segment Metadata query to obtain the number of rows in the
> datasource. We then create _number of rows / default\_threshold_ splits,
> where _default\_threshold_ is a Hive configuration property defined as
> {{hive.druid.select.threshold}}. Each split contains the broker address
> and a Select JSON query with a _start_ and _end_ date range (currently we
> assume a uniform distribution of records across the time dimension). The
> record readers handle the splits independently: each submits its query to
> the broker, retrieves the results, and parses them into records. This
> lets us parallelize the retrieval of results for these queries.
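The Select split computation described above can be sketched roughly as follows. This is an illustration only, not the code pushed in HIVE-14217: the function name `select_splits`, its parameters, and the simplified query body are hypothetical. It also assumes the split count is rounded up, which the description does not specify, and that the row count has already been obtained from a Segment Metadata query.

```python
import json
from datetime import datetime, timezone


def select_splits(broker, datasource, start, end, num_rows, threshold):
    """Build (broker, query_json) pairs, one per split (hypothetical helper).

    Assumes records are uniformly distributed over [start, end), so the
    time range is cut into equal-width intervals.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # Assumption: round up so a partial last chunk still gets a split,
    # and always emit at least one split.
    num_splits = max(1, -(-num_rows // threshold))
    step = (end - start) / num_splits
    splits = []
    for i in range(num_splits):
        lo = start + step * i
        # Pin the last boundary to `end` to avoid rounding drift.
        hi = end if i == num_splits - 1 else start + step * (i + 1)
        query = {
            "queryType": "select",
            "dataSource": datasource,
            "intervals": [f"{lo.isoformat()}/{hi.isoformat()}"],
            "granularity": "all",
            "dimensions": [],
            "metrics": [],
            "pagingSpec": {"pagingIdentifiers": {}, "threshold": threshold},
        }
        splits.append((broker, json.dumps(query)))
    return splits


if __name__ == "__main__":
    # Example values are made up for illustration.
    for address, query_json in select_splits(
        "localhost:8082",
        "wikipedia",
        datetime(2016, 1, 1, tzinfo=timezone.utc),
        datetime(2016, 1, 11, tzinfo=timezone.utc),
        num_rows=25000,
        threshold=10000,
    ):
        print(address, query_json)
```

Each pair carries everything a record reader needs to run independently: the broker to contact and the query restricted to its own time interval. If the records are skewed in time, equal-width intervals will yield uneven splits, which is the limitation the uniform-distribution assumption in the description points at.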
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)