[jira] [Updated] (HIVE-15277) Teach Hive how to create/delete Druid segments

Jesus Camacho Rodriguez (JIRA) Mon, 28 Nov 2016 06:24:19 -0800

     [ 
https://issues.apache.org/jira/browse/HIVE-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jesus Camacho Rodriguez updated HIVE-15277:
-------------------------------------------
    Description: 
We want to extend the DruidStorageHandler to support CTAS queries.
In this implementation Hive will generate druid segment files and insert the 
metadata to signal the handoff to druid.

The syntax will be as follows:
{code:sql}
CREATE TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "datasourcename")
AS <select `timecolumn` as `___time`, `dimension1`,`dimension2`,  `metric1`, 
`metric2`....>;
{code}

This statement stores the results of query <input_query> in a Druid datasource 
named 'datasourcename'. One of the columns of the query needs to be the time 
dimension, which is mandatory in Druid. In particular, we use the same 
convention that it is used for Druid: there needs to be a the column named 
'__time' in the result of the executed query, which will act as the time 
dimension column in Druid. Currently, the time column dimension needs to be a 
'timestamp' type column.
metrics can be of type long, double and float while dimensions are strings. 
Keep in mind that druid has a clear separation between dimensions and metrics, 
therefore if you have a column in hive that contains number and need to be 
presented as dimension use the cast operator to cast as string. 
This initial implementation interacts with Druid Meta data storage to 
add/remove the table in druid, user need to supply the meta data config as 
--hiveconf hive.druid.metadata.password=XXX --hiveconf 
hive.druid.metadata.username=druid --hiveconf 
hive.druid.metadata.uri=jdbc:mysql://host/druid

  was:


We want to extend the DruidStorageHandler to support CTAS queries.
In this implementation Hive will generate druid segment files and insert the 
metadata to signal the handoff to druid.

The syntax will be as follows:

CREATE TABLE druid_table_1
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "datasourcename")
AS <select `timecolumn` as `___time`, `dimension1`,`dimension2`,  `metric1`, 
`metric2`....>;

This statement stores the results of query <input_query> in a Druid datasource 
named 'datasourcename'. One of the columns of the query needs to be the time 
dimension, which is mandatory in Druid. In particular, we use the same 
convention that it is used for Druid: there needs to be a the column named 
'__time' in the result of the executed query, which will act as the time 
dimension column in Druid. Currently, the time column dimension needs to be a 
'timestamp' type column.
metrics can be of type long, double and float while dimensions are strings. 
Keep in mind that druid has a clear separation between dimensions and metrics, 
therefore if you have a column in hive that contains number and need to be 
presented as dimension use the cast operator to cast as string. 
This initial implementation interacts with Druid Meta data storage to 
add/remove the table in druid, user need to supply the meta data config as 
--hiveconf hive.druid.metadata.password=XXX --hiveconf 
hive.druid.metadata.username=druid --hiveconf 
hive.druid.metadata.uri=jdbc:mysql://host/druid


> Teach Hive how to create/delete Druid segments 
> -----------------------------------------------
>
>                 Key: HIVE-15277
>                 URL: https://issues.apache.org/jira/browse/HIVE-15277
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Druid integration
>    Affects Versions: 2.2.0
>            Reporter: slim bouguerra
>            Assignee: slim bouguerra
>         Attachments: file.patch
>
>
> We want to extend the DruidStorageHandler to support CTAS queries.
> In this implementation Hive will generate druid segment files and insert the 
> metadata to signal the handoff to druid.
> The syntax will be as follows:
> {code:sql}
> CREATE TABLE druid_table_1
> STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
> TBLPROPERTIES ("druid.datasource" = "datasourcename")
> AS <select `timecolumn` as `___time`, `dimension1`,`dimension2`,  `metric1`, 
> `metric2`....>;
> {code}
> This statement stores the results of query <input_query> in a Druid 
> datasource named 'datasourcename'. One of the columns of the query needs to 
> be the time dimension, which is mandatory in Druid. In particular, we use the 
> same convention that it is used for Druid: there needs to be a the column 
> named '__time' in the result of the executed query, which will act as the 
> time dimension column in Druid. Currently, the time column dimension needs to 
> be a 'timestamp' type column.
> metrics can be of type long, double and float while dimensions are strings. 
> Keep in mind that druid has a clear separation between dimensions and 
> metrics, therefore if you have a column in hive that contains number and need 
> to be presented as dimension use the cast operator to cast as string. 
> This initial implementation interacts with Druid Meta data storage to 
> add/remove the table in druid, user need to supply the meta data config as 
> --hiveconf hive.druid.metadata.password=XXX --hiveconf 
> hive.druid.metadata.username=druid --hiveconf 
> hive.druid.metadata.uri=jdbc:mysql://host/druid



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (HIVE-15277) Teach Hive how to create/delete Druid segments

Reply via email to