Azhar created GRIFFIN-334:
-----------------------------

             Summary: CLONE - JDBC Connector: Ability to Select Specific 
Columns Instead of All the Columns
                 Key: GRIFFIN-334
                 URL: https://issues.apache.org/jira/browse/GRIFFIN-334
             Project: Griffin
          Issue Type: Improvement
          Components: accuracy-batch
    Affects Versions: 0.6.0
            Reporter: Azhar


*Background:*
 Thanks to https://issues.apache.org/jira/browse/GRIFFIN-315, we already have a 
JDBC connector.
 However, it currently pulls every column using 
`"SELECT * FROM $fullTableName"`.
 This causes some issues for larger JDBC tables:
 - memory overhead for the Spark data frame
 - longer execution time
 - resource overhead for the RDBMS

*Proposed Improvement:*
 So, I propose a feature that allows the JDBC connector to select only the 
required columns.

Example:
 Suppose we have a rule `"rule":"src.id = tgt.id and src.country = tgt.country "`. Then 
we only need two columns, `id` and `country`.
 So, in the connector config we can add an additional `columns` clause to select only the 
required columns, like below:

 
{code:java}
{
   "name":"src",
   "connector":{
      "type":"jdbc",
      "config":{
         "database":"mydatabase",
         "tablename":"mytable",
         "columns":"id, country",
         "url":"jdbc:sqlserver://myhost:1433;databaseName=mydatabase",
         "user":"user",
         "password":"password",
         "driver":"com.microsoft.sqlserver.jdbc.SQLServerDriver",
         "where":""
      }
   }
}
{code}
We can implement it like this: if a `columns` clause is present, use it; 
otherwise use `*` as the default.
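
The fallback rule could be sketched roughly as follows. This is only an illustration, not the connector's actual code: the class `JdbcQueryBuilder`, the method `buildQuery`, and the way the `where` value is appended are all hypothetical names and assumptions made for this example. It only assumes that `columns` arrives as a comma-separated string, as in the config above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Griffin: shows the proposed fallback only.
public class JdbcQueryBuilder {

    // Builds the SELECT statement. `columns` and `where` may be null or blank.
    public static String buildQuery(String fullTableName, String columns, String where) {
        // Default to "*" when no `columns` value is configured.
        String selectList = "*";
        if (columns != null && !columns.trim().isEmpty()) {
            List<String> names = new ArrayList<>();
            for (String part : columns.split(",")) {
                String name = part.trim();
                if (!name.isEmpty()) {
                    names.add(name); // skip empty entries such as "id,,country"
                }
            }
            if (!names.isEmpty()) {
                selectList = String.join(", ", names);
            }
        }

        StringBuilder sql = new StringBuilder("SELECT ")
                .append(selectList)
                .append(" FROM ")
                .append(fullTableName);

        // Assumption: a non-blank `where` value is appended as a WHERE clause.
        if (where != null && !where.trim().isEmpty()) {
            sql.append(" WHERE ").append(where.trim());
        }
        return sql.toString();
    }
}
```

With the example config, `buildQuery("mydatabase.mytable", "id, country", "")` would return `SELECT id, country FROM mydatabase.mytable`, and passing `null` or an empty string for `columns` would return `SELECT * FROM mydatabase.mytable`, so existing configs keep their current behaviour. Note that this sketch does not quote or validate the column names.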



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
