mingnuj opened a new issue, #6040:
URL: https://github.com/apache/arrow-datafusion/issues/6040

   ### Describe the bug
   
   When a query uses many conditions, a stack overflow occurs.
   
   The limit is tighter under tokio, because its [default thread stack size is 2 MiB](https://asomers.github.io/tokio-file/tokio/runtime/struct.Builder.html#method.thread_stack_size).
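
   As a possible stopgap, deeply recursive work can be run on a thread with an explicitly larger stack. The sketch below uses only the standard library. `depth` is a hypothetical stand-in for the recursive planner call, and the 16 MiB figure is an arbitrary assumption, not a measured requirement.

   ```rust
   use std::thread;

   // Hypothetical stand-in for a deeply recursive planner call.
   fn depth(n: u64) -> u64 {
       if n == 0 {
           0
       } else {
           1 + depth(n - 1)
       }
   }

   // Run the recursion on a dedicated thread whose stack size is chosen explicitly.
   fn run_with_stack(stack_bytes: usize, n: u64) -> u64 {
       thread::Builder::new()
           .stack_size(stack_bytes)
           .spawn(move || depth(n))
           .expect("failed to spawn thread")
           .join()
           .expect("worker thread panicked")
   }

   fn main() {
       // 16 MiB is an arbitrary choice for illustration.
       let reached = run_with_stack(16 * 1024 * 1024, 10_000);
       println!("recursed to depth {}", reached);
   }
   ```

   This only raises the ceiling; a long enough condition list would still overflow.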
   
   ### To Reproduce
   
   I adapted the reproduction code from issue #1434, provided by @mcassels.
   `SELECT * FROM table WHERE <condition0> OR <condition1> OR ...` 
   ``` rust
   use datafusion::{
       arrow::datatypes::{DataType, Field, Schema},
       common::Result,
       config::ConfigOptions,
       error::DataFusionError,
       logical_expr::{
           logical_plan::builder::LogicalTableSource, AggregateUDF, ScalarUDF, TableSource,
       },
       sql::{
           planner::{ContextProvider, SqlToRel},
           sqlparser::{dialect::GenericDialect, parser::Parser},
           TableReference,
       },
   };
   use std::{collections::HashMap, sync::Arc};
   
   #[global_allocator]
   static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
   
   #[tokio::main]
   async fn main() -> Result<()> {
       let num_conditions = 255;
       let where_clause = (0..num_conditions)
           .map(|i| format!("column1 = 'value{:?}'", i))
           .collect::<Vec<String>>()
           .join(" OR ");
       let sql = format!("SELECT * from table1 where {};", where_clause);
       get_optimized_plan(sql).await?;
       println!("query succeeded with {:?} conditions", num_conditions);
   
       let num_conditions = 256;
       let where_clause = (0..num_conditions)
           .map(|i| format!("column1 = 'value{:?}'", i))
           .collect::<Vec<String>>()
           .join(" OR ");
       let sql = format!("SELECT * from table1 where {};", where_clause);
       get_optimized_plan(sql).await?;
       println!("query succeeded with {:?} conditions", num_conditions);
   
       Ok(())
   }
   
   async fn get_optimized_plan(sql: String) -> Result<()> {
       let schema_provider = TestSchemaProvider::new();
   
       let dialect = GenericDialect {};
       let ast = Parser::parse_sql(&dialect, &sql).unwrap();
       let statement = &ast[0];
       let sql_to_rel = SqlToRel::new(&schema_provider);
       sql_to_rel.sql_statement_to_plan(statement.clone()).unwrap();
   
       Ok(())
   }
   
   struct TestSchemaProvider {
       options: ConfigOptions,
       tables: HashMap<String, Arc<dyn TableSource>>,
   }
   
   impl TestSchemaProvider {
       pub fn new() -> Self {
           let mut tables = HashMap::new();
           tables.insert(
               "table1".to_string(),
               create_table_source(vec![Field::new(
                   // Must match the column name used in the generated WHERE clause.
                   "column1".to_string(),
                   DataType::Utf8,
                   false,
               )]),
           );
   
           Self {
               options: Default::default(),
               tables,
           }
       }
   }
   
   fn create_table_source(fields: Vec<Field>) -> Arc<dyn TableSource> {
       Arc::new(LogicalTableSource::new(Arc::new(
           Schema::new_with_metadata(fields, HashMap::new()),
       )))
   }
   
   impl ContextProvider for TestSchemaProvider {
       fn get_table_provider(&self, name: TableReference) -> Result<Arc<dyn TableSource>> {
           match self.tables.get(name.table()) {
               Some(table) => Ok(table.clone()),
               _ => Err(DataFusionError::Plan(format!(
                   "Table not found: {}",
                   name.table()
               ))),
           }
       }
   
       fn get_function_meta(&self, _name: &str) -> Option<Arc<ScalarUDF>> {
           None
       }
   
       fn get_aggregate_meta(&self, _name: &str) -> Option<Arc<AggregateUDF>> {
           None
       }
   
       fn get_variable_type(&self, _variable_names: &[String]) -> Option<DataType> {
           None
       }
   
       fn options(&self) -> &ConfigOptions {
           &self.options
       }
   }
   ```
   Output
   ``` bash
   query succeeded with 255 conditions
   
   thread 'main' has overflowed its stack
   fatal runtime error: stack overflow
   ```
   
   With 256 or more conditions, a stack overflow occurs. This happens only in debug builds and is related to
   https://github.com/apache/arrow-datafusion/issues/1434#issuecomment-992758421.
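
   The dependence on the number of conditions can be illustrated with a toy model. The `Expr` type below is hypothetical and is not the type used by sqlparser or DataFusion. A left-associative parser turns `c0 OR c1 OR ...` into a left-nested binary tree, so a recursive walk needs one level of recursion per condition.

   ```rust
   // Toy expression type for illustration only; not sqlparser's `Expr`.
   #[allow(dead_code)]
   enum Expr {
       Cond(usize),
       Or(Box<Expr>, Box<Expr>),
   }

   // Build ((c0 OR c1) OR c2) OR ... as a left-associative parser would.
   fn or_chain(n: usize) -> Expr {
       assert!(n > 0);
       let mut expr = Expr::Cond(0);
       for i in 1..n {
           expr = Expr::Or(Box::new(expr), Box::new(Expr::Cond(i)));
       }
       expr
   }

   // Recursive walk: the result is the nesting depth, which is also the
   // number of nested calls a recursive visitor makes on the deepest path.
   fn depth(e: &Expr) -> usize {
       match e {
           Expr::Cond(_) => 1,
           Expr::Or(l, r) => 1 + depth(l).max(depth(r)),
       }
   }

   fn main() {
       for n in [255, 256] {
           println!("{} conditions -> depth {}", n, depth(&or_chain(n)));
       }
   }
   ```

   In this model the depth equals the number of conditions, so the stack needed grows linearly with the length of the `OR` chain, multiplied by whatever each real planner frame costs.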
   
   ### Expected behavior
   
   The query should be planned without a stack overflow.
   
   ### Additional context
   
   I see two possible approaches to this problem.
   
   Approach #1
   Some functions, such as `select_to_plan` and `plan_selection`, receive their parameters by reference or without box pointers. This may make the stack grow faster.
   
   I also found a discussion of how enums are allocated on the stack:
   https://www.reddit.com/r/rust/comments/zbla3j/how_does_enums_work_where_are_they_allocated/
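
   The effect of large enum variants on stack usage can be checked with `std::mem::size_of`. The two enums below are made-up examples, not DataFusion types: an enum is as large as its largest variant, so boxing that variant shrinks every value of the enum that lives on the stack.

   ```rust
   use std::mem::size_of;

   // Hypothetical enum whose large variant is stored inline.
   #[allow(dead_code)]
   enum Inline {
       Small(u8),
       Large([u8; 1024]),
   }

   // Same shape, but the large payload lives on the heap behind a Box.
   #[allow(dead_code)]
   enum Boxed {
       Small(u8),
       Large(Box<[u8; 1024]>),
   }

   fn main() {
       // Every `Inline` value occupies at least 1024 bytes, even for `Small`.
       println!("Inline: {} bytes", size_of::<Inline>());
       // `Boxed` only needs room for a pointer plus a discriminant.
       println!("Boxed:  {} bytes", size_of::<Boxed>());
   }
   ```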
   
   Approach #2
   Running the example above under [AddressSanitizer](https://doc.rust-lang.org/beta/unstable-book/compiler-flags/sanitizer.html) reports the error in `fmt::Display`, but I'm not sure exactly where it happens.
   
   This may be related to Rust issue https://github.com/rust-lang/rust/issues/45838.

