alamb opened a new issue, #7328:
URL: https://github.com/apache/arrow-datafusion/issues/7328

   ### Is your feature request related to a problem or challenge?
   
   Some people want to use DataFusion as a read-only engine (for example, we do 
in IOx). Such users do not want to allow their own users to:
   
   1. Create memory backed tables (the state is ephemeral, so they won't be 
able to use them)
   2. Write to local files (via COPY) as this is a security issue
   3. Set session configuration (e.g. batch_size) as this can cause unwanted 
memory use / Denial of service attacks
   
   Other users, such as datafusion-cli, want to allow all the features.
   
   Also, DataFusion has gained additional capabilities, such as the ability to 
INSERT into the included table providers like `Csv` and `Json`. It may not be 
obvious to builders on top of DataFusion that such modifications are allowed, 
and depending on their use case they may actually be a security risk.
   
   
   While working on https://github.com/apache/arrow-datafusion/issues/7272 from 
@UlfarErl, it became pretty clear that the distinction between APIs that handle 
read-only SQL and APIs that handle SQL that modifies the catalog is confusing. 
Additionally, the new `COPY` command is a normal execution plan, and thus 
without additional work on IOx (see 
https://github.com/influxdata/influxdb_iox/pull/8515#discussion_r1297654343 ) 
DataFusion could allow users to run COPY (and overwrite local files, etc.).
   
   
   ### Describe the solution you'd like
   
   
   Thus I propose adding an API on SessionContext and SessionState that takes 
explicit options specifying which types of operations are supported:
   
   Something like:
   
   ```rust
   struct SQLOptions {
     /// allow DDL catalog modification commands (e.g. `CREATE TABLE ...`)
     allow_ddl: bool,
     /// allow DML data modification commands (e.g. `INSERT` and `COPY`)
     allow_dml: bool,
     /// allow configuration changes (e.g. `SET ...`)
     allow_config: bool,
   }
   ```
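
   To make the options easier to construct, the struct could come with a 
   permissive default and a few builder-style setters. The following is only a 
   minimal sketch: `read_only` and the `with_allow_*` methods are hypothetical 
   names chosen for illustration, not an existing DataFusion API.

   ```rust
   /// Sketch of the proposed options. The constructor and setter names below
   /// are assumptions made for illustration.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   pub struct SQLOptions {
       /// allow DDL catalog modification commands (e.g. `CREATE TABLE ...`)
       pub allow_ddl: bool,
       /// allow DML data modification commands (e.g. `INSERT` and `COPY`)
       pub allow_dml: bool,
       /// allow configuration changes (e.g. `SET ...`)
       pub allow_config: bool,
   }

   impl Default for SQLOptions {
       /// Permit everything, mirroring what the existing `sql` API allows.
       fn default() -> Self {
           Self { allow_ddl: true, allow_dml: true, allow_config: true }
       }
   }

   impl SQLOptions {
       /// Hypothetical preset for read-only engines: reject DDL, DML and `SET`.
       pub fn read_only() -> Self {
           Self { allow_ddl: false, allow_dml: false, allow_config: false }
       }

       /// Enable or disable DDL statements.
       pub fn with_allow_ddl(mut self, allow: bool) -> Self {
           self.allow_ddl = allow;
           self
       }

       /// Enable or disable DML statements.
       pub fn with_allow_dml(mut self, allow: bool) -> Self {
           self.allow_dml = allow;
           self
       }

       /// Enable or disable configuration changes.
       pub fn with_allow_config(mut self, allow: bool) -> Self {
           self.allow_config = allow;
           self
       }
   }
   ```

   With this shape, a read-only system would pass `SQLOptions::read_only()`, 
   while something like datafusion-cli could keep `SQLOptions::default()`.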
   
   And then add this: 
   ```rust
   impl SessionContext {
     /// Existing API will allow all types of SQL:
     pub async fn sql(&self, sql: &str) -> Result<DataFrame> {
       self.sql_with_options(sql, SQLOptions {
         allow_ddl: true,
         allow_dml: true,
         allow_config: true,
       }).await
     }

     /// New API will generate errors if a type of command is not allowed
     pub async fn sql_with_options(&self, sql: &str, options: SQLOptions) -> Result<DataFrame> {
       let plan = ...;
       if is_dml(&plan) && !options.allow_dml {
         return plan_err!("DML Plan {plan} is not allowed");
       }
       ...
     }
   }
   ```
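
   The check inside `sql_with_options` could look roughly like the following 
   self-contained sketch. It does not use DataFusion types: `Plan` is a 
   hypothetical stand-in for the logical plan, `verify_plan` is an assumed 
   helper name, and errors are plain `String`s. A real implementation would 
   presumably need to inspect the whole plan tree rather than only its root.

   ```rust
   /// Same fields as the options proposed above, repeated so this sketch
   /// compiles on its own.
   #[derive(Debug, Clone, Copy)]
   pub struct SQLOptions {
       pub allow_ddl: bool,
       pub allow_dml: bool,
       pub allow_config: bool,
   }

   /// Hypothetical stand-in for a logical plan, classified by statement kind.
   /// Each variant carries the SQL text for error messages.
   #[derive(Debug, Clone, PartialEq)]
   pub enum Plan {
       /// Read-only query, e.g. `SELECT ...`
       Query(String),
       /// Catalog modification, e.g. `CREATE TABLE ...`
       Ddl(String),
       /// Data modification, e.g. `INSERT ...` or `COPY ...`
       Dml(String),
       /// Configuration change, e.g. `SET ...`
       SetVariable(String),
   }

   /// Return an error if `plan` is a kind of statement that `options` forbids.
   pub fn verify_plan(plan: &Plan, options: &SQLOptions) -> Result<(), String> {
       match plan {
           Plan::Ddl(sql) if !options.allow_ddl => {
               Err(format!("DDL is not allowed: {sql}"))
           }
           Plan::Dml(sql) if !options.allow_dml => {
               Err(format!("DML is not allowed: {sql}"))
           }
           Plan::SetVariable(sql) if !options.allow_config => {
               Err(format!("Configuration changes are not allowed: {sql}"))
           }
           // Plain queries, and anything the options permit, pass through.
           _ => Ok(()),
       }
   }
   ```

   Rejecting the plan before it is executed is what keeps a statement such as 
   `COPY` from ever touching local files when `allow_dml` is false.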
   
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_

