alamb opened a new issue, #7328: URL: https://github.com/apache/arrow-datafusion/issues/7328
### Is your feature request related to a problem or challenge?

Some people want to use DataFusion as a read-only engine (for example, we do in IOx). We do not want to allow users to:

1. Create memory-backed tables (the state is ephemeral, so they won't be able to use them)
2. Write to local files (via `COPY`), as this is a security issue
3. Set session configuration (e.g. `batch_size`), as this can cause unwanted memory use / denial-of-service attacks

Other users, such as datafusion-cli, want to allow all the features.

Also, DataFusion has gained additional capabilities, such as the ability to `INSERT` into the included table providers like `Csv` and `Json`. It may not be obvious to builders on top of DataFusion that such modifications are allowed, and depending on their use case they may actually be a security risk.

While working on https://github.com/apache/arrow-datafusion/issues/7272 from @UlfarErl, it became pretty clear that the distinction between APIs that handle read-only SQL and SQL that modifies the catalog is confusing.

Additionally, the new `COPY` command is a normal execution plan, and thus without additional work in IOx (see https://github.com/influxdata/influxdb_iox/pull/8515#discussion_r1297654343) DataFusion could allow users to run `COPY` (and overwrite local files, etc.).

### Describe the solution you'd like

Thus I propose adding an API on `SessionContext` and `SessionState` with specific options about what types of operations are supported. Something like:

```rust
struct SQLOptions {
    /// allow DDL catalog modification commands (e.g. `CREATE TABLE ...`)
    allow_ddl: bool,
    /// allow DML data modification commands (e.g. `INSERT` and `COPY`)
    allow_dml: bool,
    /// allow configuration changes (e.g. `SET ...`)
    allow_config: bool,
}
```

And then add this:

```rust
impl SessionContext {
    /// Existing API will allow all types of SQL:
    pub async fn sql(&self, sql: &str) -> Result<DataFrame> {
        self.sql_with_options(sql, SQLOptions {
            allow_ddl: true,
            allow_dml: true,
            allow_config: true,
        }).await
    }

    /// New API will generate errors if a type of command is not allowed
    pub async fn sql_with_options(&self, sql: &str, options: SQLOptions) -> Result<DataFrame> {
        let plan = ...;
        if is_dml(&plan) && !options.allow_dml {
            return plan_err!("DML Plan {plan} is not allowed")
        }
        ...
    }
}
```

### Describe alternatives you've considered

_No response_

### Additional context

_No response_
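For illustration only, the gating check proposed above can be sketched without any DataFusion types. Everything here other than the three `SQLOptions` fields is an assumption made for the example: `StatementKind` is a hypothetical stand-in for classifying a logical plan, and `verify`, `read_only`, and `allow_all` are hypothetical helper names, not part of the proposal or of DataFusion's API.

```rust
/// Hypothetical stand-in for the category of a planned statement.
/// A real implementation would classify a DataFusion logical plan instead.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StatementKind {
    Query,  // e.g. SELECT
    Ddl,    // e.g. CREATE TABLE
    Dml,    // e.g. INSERT, COPY
    Config, // e.g. SET
}

/// Mirrors the three flags from the proposal.
#[derive(Debug, Clone, Copy)]
struct SQLOptions {
    allow_ddl: bool,
    allow_dml: bool,
    allow_config: bool,
}

impl SQLOptions {
    /// Hypothetical constructor: reject everything except plain queries.
    fn read_only() -> Self {
        SQLOptions { allow_ddl: false, allow_dml: false, allow_config: false }
    }

    /// Hypothetical constructor: matches the behavior of the existing `sql` API.
    fn allow_all() -> Self {
        SQLOptions { allow_ddl: true, allow_dml: true, allow_config: true }
    }

    /// Returns an error if the statement category is disabled by these options.
    fn verify(&self, kind: StatementKind) -> Result<(), String> {
        match kind {
            StatementKind::Ddl if !self.allow_ddl => {
                Err("DDL is not allowed".to_string())
            }
            StatementKind::Dml if !self.allow_dml => {
                Err("DML is not allowed".to_string())
            }
            StatementKind::Config if !self.allow_config => {
                Err("configuration changes are not allowed".to_string())
            }
            // Queries, and any category that is explicitly enabled, pass.
            _ => Ok(()),
        }
    }
}
```

With this shape, a read-only deployment such as the IOx case would call `SQLOptions::read_only().verify(kind)` before executing a plan, while datafusion-cli would use `SQLOptions::allow_all()`.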
