alamb opened a new issue, #9269:
URL: https://github.com/apache/arrow-datafusion/issues/9269

   ### Describe the bug
   
   There is a bug when reading from partitioned tables that have single quotes
   in their partition column names
   
   Here is the test
   
https://github.com/apache/arrow-datafusion/blob/b2a04519da97c2ff81789ef41dd652870794a73a/datafusion/sqllogictest/test_files/copy.slt#L109
   
   ### To Reproduce
   
   Run this script
   
   ```sql
   -- create a table with quotes in the column names
   create table test ("'test'" varchar, "'test2'" varchar, "'test3'" varchar);
   insert into test VALUES ('a', 'x', 'aa'), ('b','y', 'bb'), ('c', 'z', 'cc');
   copy test to '/tmp/escape_quote' (format csv, partition_by '''test2'',''test3''');
   
   -- read back from the table
   CREATE EXTERNAL TABLE validate_partitioned_escape_quote STORED AS CSV
   LOCATION '/tmp/escape_quote/' PARTITIONED BY ("'test2'", "'test3'");
   
   -- This panics
   select * from validate_partitioned_escape_quote;
   ```
   
   Here is an example:
   
   ```sql
   ❯ -- create a table with quotes in the column names
   create table test ("'test'" varchar, "'test2'" varchar, "'test3'" varchar);
   insert into test VALUES ('a', 'x', 'aa'), ('b','y', 'bb'), ('c', 'z', 'cc');
   copy test to '/tmp/escape_quote' (format csv, partition_by '''test2'',''test3''');
   
   0 rows in set. Query took 0.008 seconds.
   
   +-------+
   | count |
   +-------+
   | 3     |
   +-------+
   1 row in set. Query took 0.009 seconds.
   
   +-------+
   | count |
   +-------+
   | 3     |
   +-------+
   1 row in set. Query took 0.029 seconds.
   
   ❯ -- read back from the table
   CREATE EXTERNAL TABLE validate_partitioned_escape_quote STORED AS CSV
   LOCATION '/tmp/escape_quote/' PARTITIONED BY ("'test2'", "'test3'");
   
   0 rows in set. Query took 0.004 seconds.
   
   ❯ -- This panics
   select * from validate_partitioned_escape_quote;
   
   thread 'tokio-runtime-worker' panicked at /Users/andrewlamb/Software/arrow-datafusion/datafusion/core/src/datasource/physical_plan/file_scan_config.rs:248:54:
   index out of bounds: the len is 0 but the index is 0
   stack backtrace:
      0: rust_begin_unwind
                at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/std/src/panicking.rs:645:5
      1: core::panicking::panic_fmt
                at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:72:14
      2: core::panicking::panic_bounds_check
                at /rustc/07dca489ac2d933c78d3c5158e3f43beefeb02ce/library/core/src/panicking.rs:208:5
      3: datafusion::datasource::physical_plan::file_scan_config::PartitionColumnProjector::project
      4: <datafusion::datasource::physical_plan::file_stream::FileStream<F> as futures_core::stream::Stream>::poll_next
      5: datafusion_physical_plan::stream::RecordBatchReceiverStreamBuilder::run_input::{{closure}}
      6: tokio::runtime::task::core::Core<T,S>::poll
      7: tokio::runtime::task::harness::Harness<T,S>::poll
      8: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
      9: tokio::runtime::scheduler::multi_thread::worker::Context::run
     10: tokio::runtime::context::runtime::enter_runtime
     11: tokio::runtime::scheduler::multi_thread::worker::run
     12: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
     13: tokio::runtime::task::core::Core<T,S>::poll
     14: tokio::runtime::task::harness::Harness<T,S>::poll
     15: tokio::runtime::blocking::pool::Inner::run
   note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
   ```
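
The message `index out of bounds: the len is 0 but the index is 0` is what Rust prints when an empty slice or `Vec` is indexed at position 0. That suggests the projector in frame 3 received no partition values for the file, though the root cause is not established here. The sketch below is not DataFusion code: `partition_value` is a hypothetical helper that only illustrates how a checked lookup with `get` could report the same condition as an error instead of panicking.

```rust
/// Hypothetical helper (not part of DataFusion): fetch the partition value at
/// `idx`, returning an error rather than panicking when it is missing.
fn partition_value(values: &[String], idx: usize) -> Result<&str, String> {
    values
        .get(idx) // `get` returns None where `values[idx]` would panic
        .map(|v| v.as_str())
        .ok_or_else(|| {
            format!(
                "expected a partition value at index {}, but only {} were found",
                idx,
                values.len()
            )
        })
}
```

Called with an empty slice and index 0, this returns `Err(..)` where direct indexing would abort the worker thread.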
   
   ### Expected behavior
   
   The `select` should return the three rows instead of panicking. Note that the data is written correctly:
   
   ```shell
   andrewlamb@Andrews-MacBook-Pro:~/Software/influxdb_iox$ find /tmp/escape_quote
   /tmp/escape_quote
   /tmp/escape_quote/'test2'=x
   /tmp/escape_quote/'test2'=x/'test3'=aa
   /tmp/escape_quote/'test2'=x/'test3'=aa/3zMw255TXFQxId14.csv
   /tmp/escape_quote/'test2'=y
   /tmp/escape_quote/'test2'=y/'test3'=bb
   /tmp/escape_quote/'test2'=y/'test3'=bb/3zMw255TXFQxId14.csv
   /tmp/escape_quote/'test2'=z
   /tmp/escape_quote/'test2'=z/'test3'=cc
   /tmp/escape_quote/'test2'=z/'test3'=cc/3zMw255TXFQxId14.csv
   ```
   
   
   ```shell
   andrewlamb@Andrews-MacBook-Pro:~/Software/influxdb_iox$ cat /tmp/escape_quote/\'test2\'\=x/\'test3\'\=aa/3zMw255TXFQxId14.csv
   'test'
   a
   ```
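
For reference, the directory layout above follows the Hive-style `name=value` convention, and the single quotes are part of the directory names. The following Rust sketch shows how such a path splits into column/value pairs. The helpers `parse_partitions` and `values_for` are hypothetical, not DataFusion's actual parser, and the sketch assumes that names and values contain no `/` or `=`.

```rust
/// Hypothetical helper: split a Hive-style path into (column, value) pairs,
/// one per `name=value` directory segment. Other segments are ignored.
fn parse_partitions(path: &str) -> Vec<(String, String)> {
    path.split('/')
        .filter_map(|segment| segment.split_once('='))
        .map(|(name, value)| (name.to_string(), value.to_string()))
        .collect()
}

/// Hypothetical helper: look up the value of each requested partition column
/// by exact name. Returns None if any column is absent from the path.
fn values_for(path: &str, columns: &[&str]) -> Option<Vec<String>> {
    let parsed = parse_partitions(path);
    columns
        .iter()
        .map(|col| {
            parsed
                .iter()
                .find(|(name, _)| name.as_str() == *col)
                .map(|(_, value)| value.clone())
        })
        .collect()
}
```

With the first file path from the listing, `values_for(path, &["'test2'", "'test3'"])` yields `Some(["x", "aa"])`, while looking up `test2` without the quotes yields `None`.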
   
   ### Additional context
   
   @devinjdangelo found this in https://github.com/apache/arrow-datafusion/pull/9240

