[
https://issues.apache.org/jira/browse/GRIFFIN-334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Azhar updated GRIFFIN-334:
--------------------------
Description:
*Background:*
See https://issues.apache.org/jira/browse/GRIFFIN-332 ; we would like the
same feature for Hive as well.
Currently the Hive connector pulls all columns with
`"SELECT * FROM $fullTableName"`.
This causes problems for larger Hive tables:
- memory overhead for the Spark DataFrame
- longer execution time
*Proposed Feature:*
I propose allowing the Hive connector to select only the required columns.
*Example:*
Suppose we have the rule `"rule":"src.id = tgt.id and src.country = tgt.country "`.
Then we only need two columns, `id` and `country`.
So in the connector config we can add an additional keyword `columns` to select
only the required columns, as below:
{code:java}
{
  "name": "src",
  "connector": {
    "type": "hive",
    "config": {
      "database": "mydatabase",
      "table.name": "mytable",
      "columns": "id, country",
      "where": ""
    }
  }
}
{code}
We can implement it like this: if a `columns` key is present, use it;
otherwise use `*` as the default.
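A minimal sketch of that fallback logic, written in Python purely for illustration (the Griffin connector itself is Scala, and this is not its actual code). The function name `build_hive_query`, the `"default"` database fallback, and the way a blank `where` is skipped are all assumptions made for this example:

```python
def build_hive_query(config: dict) -> str:
    """Build the SELECT statement for a Hive connector config (illustrative only)."""
    # Assumption: fall back to the "default" database when none is configured.
    database = config.get("database", "default")
    table = config["table.name"]
    full_table_name = f"{database}.{table}"

    # Use the optional `columns` key when it is non-blank; otherwise keep `*`.
    raw_columns = config.get("columns") or ""
    columns = [c.strip() for c in raw_columns.split(",") if c.strip()]
    select_list = ", ".join(columns) if columns else "*"

    query = f"SELECT {select_list} FROM {full_table_name}"

    # Assumption: append the `where` filter only when it is non-blank.
    where = (config.get("where") or "").strip()
    if where:
        query += f" WHERE {where}"
    return query


src_config = {
    "database": "mydatabase",
    "table.name": "mytable",
    "columns": "id, country",
    "where": "",
}
print(build_hive_query(src_config))  # SELECT id, country FROM mydatabase.mytable
```

Treating a missing or blank `columns` value the same way keeps existing configs working unchanged, since they still produce `SELECT *`.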
was:
*Background:*
Refer to https://issues.apache.org/jira/browse/GRIFFIN-332 , we would like
same feature for Hive as well.
Currently it is pulling all the columns `"SELECT * FROM $fullTableName"`.
It will cause some issues for larger Hive tables --
- memory overhead for spark dataframe
- longer execution time
*Proposed Feature:*
So, I propose the feature to allow Hive connector to able to select only
required columns.
*Example:*
We have a rule `"rule":"src.id = tgt.id and src.country = tgt.country "`. Then
we only need two columns `id` and 'country'.
So, in connector we can add additional key word `columns` to select only
required columns, like below:
{code:java}
{
"name":"src",
"connector":{
"type":"hive",
"config":{
"database":"mydatabase",
"table.name":"mytable",
"columns": "id, country",
"where":""
}
}
}
{code}
We can implement it like this, if there is `columns` clause then use it
otherwise use `*` as default.
Summary: Hive Connector: Ability to Select Specific Columns Instead of
All the Columns (was: HIve Connector: Ability to Select Specific Columns
Instead of All the Columns)
> Hive Connector: Ability to Select Specific Columns Instead of All the Columns
> -----------------------------------------------------------------------------
>
> Key: GRIFFIN-334
> URL: https://issues.apache.org/jira/browse/GRIFFIN-334
> Project: Griffin
> Issue Type: Improvement
> Components: accuracy-batch
> Affects Versions: 0.6.0
> Reporter: Azhar
> Priority: Major
> Labels: columns, hive