[ 
https://issues.apache.org/jira/browse/SPARK-21519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Canali updated SPARK-21519:
--------------------------------
    Description: 
This proposes an option to the JDBC datasource, tentatively called 
"sessionInitStatement" to implement the functionality of session initialization 
present for example in the Sqoop connector for Oracle (see 
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_oraoop_oracle_session_initialization_statements)
 . After each database session is opened to the remote DB, and before starting 
to read data, this option executes a custom SQL statement (or a PL/SQL block in 
the case of Oracle).
Example of usage, relevant to Oracle JDBC:

{code}
val preambleSQL="""
begin 
  execute immediate 'alter session set tracefile_identifier=sparkora'; 
  execute immediate 'alter session set "_serial_direct_read"=true';
  execute immediate 'alter session set time_zone=''+02:00''';
end;
"""

bin/spark-shell --jars ojdb6.jar

val df = spark.read
           .format("jdbc")
           .option("url", "jdbc:oracle:thin:@ORACLEDBSERVER:1521/service_name")
           .option("driver", "oracle.jdbc.driver.OracleDriver")
           .option("dbtable", "(select 1, sysdate, systimestamp, 
current_timestamp, localtimestamp from dual)")
           .option("user", "MYUSER")
           .option("password", "MYPASSWORD").option("fetchsize",1000)
           .option("sessionInitStatement", preambleSQL)
           .load()

df.show(5,false)
{code}

*Comments:* This proposal has been developed and tested for connecting the 
Spark JDBC data source to Oracle databases, however I believe it can be useful 
for other target DBs too, as it is quite generic.   
The code executed by the option "sessionInitStatement" is just the 
user-provided string fed through the execute method of the JDBC connection, so 
it can use the features of the target database language/syntax. When using 
sessionInitStatement for querying Oracle, for example, the user-provided 
command can be a SQL statement or a PL/SQL block grouping multiple commands and 
logic.   
Note the proposed code allows to inject SQL into the target database. This is 
not a security concern as such, as it requires password authentication, however 
beware of the possibilities of injecting user-provided SQL (and PL/SQL) that 
this opens.  

  was:
This proposes an option to the JDBC datasource, tentatively called 
"sessionInitStatement" to implement the functionality of session initialization 
present for example in the Sqoop connector for Oracle (see 
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_oraoop_oracle_session_initialization_statements)
 . After each database session is opened to the remote DB, and before starting 
to read data, this option executes a custom SQL statement (or a PL/SQL block in 
the case of Oracle).
Example of usage, relevant to Oracle JDBC:

{code}
val preambleSQL="""
begin 
  execute immediate 'alter session set tracefile_identifier=sparkora'; 
  execute immediate 'alter session set "_serial_direct_read"=true';
  execute immediate 'alter session set time_zone=''+02:00''';
end;
"""

bin/spark-shell --jars ojdb6.jar

val df = spark.read
               .format("jdbc")
               .option("url", 
"jdbc:oracle:thin:@ORACLEDBSERVER:1521/service_name")
               .option("driver", "oracle.jdbc.driver.OracleDriver")
               .option("dbtable", "(select 1, sysdate, systimestamp, 
current_timestamp, localtimestamp from dual)")
               .option("user", "MYUSER")
               .option("password", "MYPASSWORD").option("fetchsize",1000)
               .option("sessionInitStatement", preambleSQL)
               .load()

df.show(5,false)
{code}

*Comments:* This proposal has been developed and tested for connecting the 
Spark JDBC data source to Oracle databases, however I believe it can be useful 
for other target DBs too, as it is quite generic.   
The code executed by the option "sessionInitStatement" is just the 
user-provided string fed through the execute method of the JDBC connection, so 
it can use the features of the target database language/syntax. When using 
sessionInitStatement for querying Oracle, for example, the user-provided 
command can be a SQL statement or a PL/SQL block grouping multiple commands and 
logic.   
Note the proposed code allows to inject SQL into the target database. This is 
not a security concern as such, as it requires password authentication, however 
beware of the possibilities of injecting user-provided SQL (and PL/SQL) that 
this opens.  


> Add an option to the JDBC data source to initialize the environment of the 
> remote database session
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21519
>                 URL: https://issues.apache.org/jira/browse/SPARK-21519
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.1.0, 2.1.1, 2.2.0
>            Reporter: Luca Canali
>            Assignee: Luca Canali
>            Priority: Minor
>             Fix For: 2.3.0
>
>
> This proposes an option to the JDBC datasource, tentatively called 
> "sessionInitStatement" to implement the functionality of session 
> initialization present for example in the Sqoop connector for Oracle (see 
> https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_oraoop_oracle_session_initialization_statements)
>  . After each database session is opened to the remote DB, and before 
> starting to read data, this option executes a custom SQL statement (or a 
> PL/SQL block in the case of Oracle).
> Example of usage, relevant to Oracle JDBC:
> {code}
> val preambleSQL="""
> begin 
>   execute immediate 'alter session set tracefile_identifier=sparkora'; 
>   execute immediate 'alter session set "_serial_direct_read"=true';
>   execute immediate 'alter session set time_zone=''+02:00''';
> end;
> """
> bin/spark-shell --jars ojdb6.jar
> val df = spark.read
>            .format("jdbc")
>            .option("url", 
> "jdbc:oracle:thin:@ORACLEDBSERVER:1521/service_name")
>            .option("driver", "oracle.jdbc.driver.OracleDriver")
>            .option("dbtable", "(select 1, sysdate, systimestamp, 
> current_timestamp, localtimestamp from dual)")
>            .option("user", "MYUSER")
>            .option("password", "MYPASSWORD").option("fetchsize",1000)
>            .option("sessionInitStatement", preambleSQL)
>            .load()
> df.show(5,false)
> {code}
> *Comments:* This proposal has been developed and tested for connecting the 
> Spark JDBC data source to Oracle databases, however I believe it can be 
> useful for other target DBs too, as it is quite generic.   
> The code executed by the option "sessionInitStatement" is just the 
> user-provided string fed through the execute method of the JDBC connection, 
> so it can use the features of the target database language/syntax. When using 
> sessionInitStatement for querying Oracle, for example, the user-provided 
> command can be a SQL statement or a PL/SQL block grouping multiple commands 
> and logic.   
> Note the proposed code allows to inject SQL into the target database. This is 
> not a security concern as such, as it requires password authentication, 
> however beware of the possibilities of injecting user-provided SQL (and 
> PL/SQL) that this opens.  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to