[
https://issues.apache.org/jira/browse/GRIFFIN-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173364#comment-17173364
]
Obaidul Karim commented on GRIFFIN-332:
---------------------------------------
Hi [~ishanverma] ,
Yes, we are using the latest code (0.6.0) available in the git repo. I am not
sure how much I can help, but let me explain how we are using this in our POC.
We are using connector "type":"jdbc". With the jdbc connector type, connecting
to MySQL doesn't require importing an external MySQL driver. That means that
with 0.6.0 you can connect to MySQL with just a connector config like the one
below.
(Please note that in 0.6.0 the key is "connector", NOT "connector*s*", and
"connector" is no longer an array.)
"data.sources":[
{
"name":"src",
"connector":{
"type":"jdbc",
"config":{
"database":"db_name",
"tablename":"tbl_name",
"url":"jdbc:mysql://host:3306/db_name",
"user":"user",
"password":"password",
"driver":"com.mysql.jdbc.Driver",
"where":"id>0"
}
}
}
]
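As a quick sanity check for the "connector" vs "connectors" point above, here
is a minimal Python sketch that loads a config shaped like the one above and
fails early if the old array form is still present. This is not part of
Griffin; the helper name get_connector and the error messages are made up for
illustration only.

```python
import json


def get_connector(source):
    """Return the single connector object of a 0.6.0-style data source.

    Hypothetical helper for illustration; Griffin does its own parsing.
    """
    if "connectors" in source:
        raise ValueError("found 'connectors'; 0.6.0 expects a single 'connector'")
    connector = source.get("connector")
    if not isinstance(connector, dict):
        raise ValueError("'connector' must be a JSON object, not an array")
    return connector


# Same shape as the data.sources fragment above, with placeholder values.
config = json.loads("""
{
  "data.sources": [
    {
      "name": "src",
      "connector": {
        "type": "jdbc",
        "config": {
          "database": "db_name",
          "tablename": "tbl_name",
          "url": "jdbc:mysql://host:3306/db_name",
          "user": "user",
          "password": "password",
          "driver": "com.mysql.jdbc.Driver",
          "where": "id>0"
        }
      }
    }
  ]
}
""")

for source in config["data.sources"]:
    connector = get_connector(source)
    print(source["name"], connector["type"])
```

Running a check like this before submitting the job makes a leftover
"connectors" array show up as a clear error on the client side.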
However, we also skipped the UI. All our configuration is file-based.
We store the output in Elasticsearch and S3 (copied from HDFS to S3) for
post-processing, and present it using generic dashboarding tools like
Kibana/Tableau.
We tested Griffin 0.6.0 on AWS emr-5.30.1.
Hope my explanation will give you a lead :)
-Obaid
> JDBC Connector: Ability to Select Specific Columns Instead of All the Columns
> -----------------------------------------------------------------------------
>
> Key: GRIFFIN-332
> URL: https://issues.apache.org/jira/browse/GRIFFIN-332
> Project: Griffin
> Issue Type: Improvement
> Components: accuracy-batch
> Affects Versions: 0.6.0
> Reporter: Obaidul Karim
> Priority: Major
> Labels: columns, jdbc
>
> *Background:*
> Thanks to https://issues.apache.org/jira/browse/GRIFFIN-315, we already have
> a JDBC connector.
> However, it currently pulls all the columns using
> `"SELECT * FROM $fullTableName"`.
> This will cause some issues for larger JDBC tables:
> - memory overhead for the Spark data frame
> - longer execution time
> - resource overhead for the RDBMS
> *Proposed Improvement:*
> So, I propose a feature that allows the JDBC connector to select only the
> required columns.
> *Example:*
> We have a rule `"rule":"src.id = tgt.id and src.country = tgt.country "`.
> Then we only need two columns, `id` and `country`.
> So, in the connector we can add an additional `columns` clause to select
> only the required columns, like below:
>
> {code:java}
> { "name":"src",
> "connector":{ "type":"jdbc",
> "config":{ "database":"mydatabase",
> "tablename":"mytable",
> "columns":"id, country",
> "url":"jdbc:sqlserver://myhost:1433;databaseName=mydatabase",
> "user":"user",
> "password":"password",
> "driver":"com.microsoft.sqlserver.jdbc.SQLServerDriver",
> "where":""
> }
> }
> }
> {code}
> We can implement it like this: if there is a `columns` clause, use it;
> otherwise use `*` as the default.
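The fallback proposed in the issue description above can be sketched roughly
as follows. This is an illustration in Python, not the connector's actual
code: the function name build_select is made up, and it assumes that
$fullTableName is "database.tablename" and that a non-empty "where" value is
appended as a WHERE clause.

```python
def build_select(database, tablename, columns="", where=""):
    """Build the SELECT statement for a JDBC source (illustrative only)."""
    # Split the comma-separated `columns` value and drop empty entries.
    cols = [c.strip() for c in (columns or "").split(",") if c.strip()]
    # Fall back to `*` when no columns are configured.
    select_list = ", ".join(cols) if cols else "*"
    # Assumption: the full table name is "<database>.<tablename>".
    sql = f"SELECT {select_list} FROM {database}.{tablename}"
    # Assumption: a non-empty `where` value becomes the WHERE clause.
    if where and where.strip():
        sql += f" WHERE {where.strip()}"
    return sql


# With the proposed `columns` clause only the two rule columns are selected.
print(build_select("mydatabase", "mytable", columns="id, country"))
# Without it, the behaviour stays as it is today.
print(build_select("db_name", "tbl_name", where="id>0"))
```

Note that the sketch does no quoting or validation of the column names; a
real implementation would probably need to handle that per database.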
--
This message was sent by Atlassian Jira
(v8.3.4#803005)