mingnuj opened a new issue, #6040: URL: https://github.com/apache/arrow-datafusion/issues/6040
### Describe the bug

When a query uses many conditions, a stack overflow error occurs. In particular, when used with tokio, the limit is reached sooner because the [default stack size is 2MiB](https://asomers.github.io/tokio-file/tokio/runtime/struct.Builder.html#method.thread_stack_size).

### To Reproduce

I adapted the reproduction code from issue #1434 provided by @mcassels.

`SELECT * FROM table WHERE <condition0> OR <condition1> OR ...`

``` rust
use datafusion::{
    arrow::datatypes::{DataType, Field, Schema},
    common::Result,
    config::ConfigOptions,
    error::DataFusionError,
    logical_expr::{
        logical_plan::builder::LogicalTableSource, AggregateUDF, ScalarUDF, TableSource,
    },
    sql::{
        planner::{ContextProvider, SqlToRel},
        sqlparser::{dialect::GenericDialect, parser::Parser},
        TableReference,
    },
};
use std::{collections::HashMap, sync::Arc};

#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;

#[tokio::main]
async fn main() -> Result<()> {
    // 255 OR-ed conditions: planning succeeds.
    let num_conditions = 255;
    let where_clause = (0..num_conditions)
        .map(|i| format!("column1 = 'value{:?}'", i))
        .collect::<Vec<String>>()
        .join(" OR ");
    let sql = format!("SELECT * from table1 where {};", where_clause);
    get_optimized_plan(sql).await?;
    println!("query succeeded with {:?} conditions", num_conditions);

    // 256 OR-ed conditions: planning overflows the stack (see output below).
    let num_conditions = 256;
    let where_clause = (0..num_conditions)
        .map(|i| format!("column1 = 'value{:?}'", i))
        .collect::<Vec<String>>()
        .join(" OR ");
    let sql = format!("SELECT * from table1 where {};", where_clause);
    get_optimized_plan(sql).await?;
    println!("query succeeded with {:?} conditions", num_conditions);

    Ok(())
}

// Parse the SQL text and convert the first statement into a logical plan.
async fn get_optimized_plan(sql: String) -> Result<()> {
    let schema_provider = TestSchemaProvider::new();
    let dialect = GenericDialect {};
    let ast = Parser::parse_sql(&dialect, &sql).unwrap();
    let statement = &ast[0];
    let sql_to_rel = SqlToRel::new(&schema_provider);
    sql_to_rel.sql_statement_to_plan(statement.clone()).unwrap();
    Ok(())
}

struct TestSchemaProvider {
    options: ConfigOptions,
    tables: HashMap<String, Arc<dyn TableSource>>,
}

impl TestSchemaProvider {
    pub fn new() -> Self {
        let mut tables = HashMap::new();
        tables.insert(
            "table1".to_string(),
            create_table_source(vec![Field::new(
                "column".to_string(),
                DataType::Utf8,
                false,
            )]),
        );
        Self {
            options: Default::default(),
            tables,
        }
    }
}

fn create_table_source(fields: Vec<Field>) -> Arc<dyn TableSource> {
    Arc::new(LogicalTableSource::new(Arc::new(
        Schema::new_with_metadata(fields, HashMap::new()),
    )))
}

impl ContextProvider for TestSchemaProvider {
    fn get_table_provider(&self, name: TableReference) -> Result<Arc<dyn TableSource>> {
        match self.tables.get(name.table()) {
            Some(table) => Ok(table.clone()),
            _ => Err(DataFusionError::Plan(format!(
                "Table not found: {}",
                name.table()
            ))),
        }
    }

    fn get_function_meta(&self, _name: &str) -> Option<Arc<ScalarUDF>> {
        None
    }

    fn get_aggregate_meta(&self, _name: &str) -> Option<Arc<AggregateUDF>> {
        None
    }

    fn get_variable_type(&self, _variable_names: &[String]) -> Option<DataType> {
        None
    }

    fn options(&self) -> &ConfigOptions {
        &self.options
    }
}
```

Output

``` bash
query succeeded with 255 conditions
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
```

With 256 or more conditions, a stack overflow occurs. This happens only in debug mode, and is related to https://github.com/apache/arrow-datafusion/issues/1434#issuecomment-992758421.

### Expected behavior

The query should be planned without overflowing the stack.

### Additional context

I can think of two approaches to this problem.

Approach#1
In some functions, such as `select_to_plan` and `plan_selection`, parameters are received by value rather than by reference or through a `Box`. This may make the stack grow faster. I also found that enums can be allocated on the stack: https://www.reddit.com/r/rust/comments/zbla3j/how_does_enums_work_where_are_they_allocated/

Approach#2
Running the above example with [Address Sanitizer](https://doc.rust-lang.org/beta/unstable-book/compiler-flags/sanitizer.html), the error occurred in `fmt::Display`.
However, I'm not sure exactly where it happened. This may be related to Rust issue https://github.com/rust-lang/rust/issues/45838.
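To illustrate why the number of OR conditions translates into stack depth, here is a minimal std-only sketch. It assumes the parser produces a left-deep tree for `a OR b OR c`, so nesting depth grows with the number of conditions. The `Expr` type and the functions `or_chain`, `depth_recursive`, `count_conditions`, and `depth_on_big_stack` are hypothetical names for this example, not DataFusion or sqlparser APIs. It shows two general mitigations: walking the tree iteratively with a heap-allocated work list, and running a recursive walk on a thread whose stack size is set explicitly.

```rust
use std::thread;

/// Toy expression type (hypothetical, not DataFusion's `Expr`).
enum Expr {
    /// A leaf predicate such as `column1 = 'value3'`.
    Cond(usize),
    /// A binary OR of two sub-expressions.
    Or(Box<Expr>, Box<Expr>),
}

/// Build `c0 OR c1 OR ... OR c(n-1)` as a left-deep tree.
/// Assumes n >= 1.
fn or_chain(n: usize) -> Expr {
    let mut expr = Expr::Cond(0);
    for i in 1..n {
        expr = Expr::Or(Box::new(expr), Box::new(Expr::Cond(i)));
    }
    expr
}

/// Recursive walk: uses one stack frame per nesting level.
fn depth_recursive(expr: &Expr) -> usize {
    match expr {
        Expr::Cond(_) => 1,
        Expr::Or(l, r) => 1 + depth_recursive(l).max(depth_recursive(r)),
    }
}

/// Iterative walk: pending nodes live in a Vec on the heap,
/// so stack usage does not grow with the nesting depth.
fn count_conditions(expr: &Expr) -> usize {
    let mut pending: Vec<&Expr> = vec![expr];
    let mut count = 0;
    while let Some(e) = pending.pop() {
        match e {
            Expr::Cond(_) => count += 1,
            Expr::Or(l, r) => {
                pending.push(&**l);
                pending.push(&**r);
            }
        }
    }
    count
}

/// Build and walk the chain recursively on a thread with an
/// explicitly sized stack.
fn depth_on_big_stack(n: usize, stack_bytes: usize) -> usize {
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(move || depth_recursive(&or_chain(n)))
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}

fn main() {
    let n = 256;
    let expr = or_chain(n);
    println!("conditions: {}", count_conditions(&expr));
    println!("depth: {}", depth_on_big_stack(n, 8 * 1024 * 1024));
}
```

The toy frames here are far smaller than those of a real SQL planner in a debug build, so this sketch does not itself overflow at 256 conditions. It only shows the shape of the problem and of the two workarounds.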
