Azhar created GRIFFIN-334:
-----------------------------
Summary: CLONE - JDBC Connector: Ability to Select Specific
Columns Instead of All the Columns
Key: GRIFFIN-334
URL: https://issues.apache.org/jira/browse/GRIFFIN-334
Project: Griffin
Issue Type: Improvement
Components: accuracy-batch
Affects Versions: 0.6.0
Reporter: Azhar
*Background:*
Thanks to https://issues.apache.org/jira/browse/GRIFFIN-315, we already have a
JDBC connector.
However, it currently pulls all the columns using
`"SELECT * FROM $fullTableName"`.
This causes several issues for larger JDBC tables:
- memory overhead for the Spark DataFrame
- longer execution time
- resource overhead for the RDBMS
*Proposed Improvement:*
So, I propose a feature that allows the JDBC connector to select only the
required columns.
Example:
Suppose we have the rule `"rule":"src.id = tgt.id and src.country = tgt.country"`.
Then we only need two columns, `id` and `country`.
So, in the connector config we can add an additional `columns` clause to select
only the required columns, like below:
{code:java}
{
  "name":"src",
  "connector":{
    "type":"jdbc",
    "config":{
      "database":"mydatabase",
      "tablename":"mytable",
      "columns":"id, country",
      "url":"jdbc:sqlserver://myhost:1433;databaseName=mydatabase",
      "user":"user",
      "password":"password",
      "driver":"com.microsoft.sqlserver.jdbc.SQLServerDriver",
      "where":""
    }
  }
}
{code}
We can implement it like this: if a `columns` clause is present, use it;
otherwise use `*` as the default.
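
A minimal sketch of that fallback, for illustration only. The class `JdbcQuerySketch` and the helper `buildQuery` are hypothetical names and not part of the existing Griffin connector; the way the `where` value is appended is also an assumption, since the connector may handle it differently.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class JdbcQuerySketch {

    // Hypothetical helper (not Griffin code): builds the SELECT statement
    // from the table name and the optional "columns" and "where" config values.
    static String buildQuery(String fullTableName, String columns, String where) {
        String selectList = "*"; // default when no columns are configured
        if (columns != null && !columns.trim().isEmpty()) {
            // Normalize "id,  country" to "id, country" and drop empty entries.
            String cleaned = Arrays.stream(columns.split(","))
                    .map(String::trim)
                    .filter(c -> !c.isEmpty())
                    .collect(Collectors.joining(", "));
            if (!cleaned.isEmpty()) {
                selectList = cleaned;
            }
        }
        StringBuilder sql = new StringBuilder("SELECT ")
                .append(selectList)
                .append(" FROM ")
                .append(fullTableName);
        // Assumption: an empty "where" means no filter.
        if (where != null && !where.trim().isEmpty()) {
            sql.append(" WHERE ").append(where.trim());
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // With "columns" set, only those columns are selected.
        System.out.println(buildQuery("mydatabase.mytable", "id, country", ""));
        // Without "columns", the current behaviour (SELECT *) is kept.
        System.out.println(buildQuery("mydatabase.mytable", null, ""));
    }
}
```

Since the column list is concatenated into the SQL text, a real implementation would probably also want to validate or quote the column names.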