Igosuki opened a new issue #1923:
URL: https://github.com/apache/arrow-datafusion/issues/1923


   **Describe the bug**
   One can register a table with the file scheme `file://`, which in turn 
allows the listing table to list files and discover partitions.
   Unfortunately, LocalStore returns a FileMetaStream in which the SizedFile path 
has the scheme prefix stripped. That would be fine, except that 
`datafusion::datasource::listing::helpers::parse_partitions_for_path` calls 
`strip_prefix` on the file path using the original path that was used to register the 
table, which still contains the scheme, so the prefix never matches.
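
The mismatch can be reproduced with the standard library alone. The sketch below is a simplified, hypothetical stand-in for the partition-parsing helper (the name `partition_values` and its body are assumptions for illustration, not DataFusion's actual code); it only shows why a scheme-prefixed table path never matches a scheme-less file path.

```rust
/// Hypothetical, simplified stand-in for the partition-parsing helper.
/// Returns the partition values found in `file_path` relative to `table_path`,
/// or `None` if the prefix or the `col=value` segments do not match.
fn partition_values<'a>(
    table_path: &str,
    file_path: &'a str,
    cols: &[&str],
) -> Option<Vec<&'a str>> {
    // This is the step that fails when `table_path` still carries `file://`.
    let subpath = file_path.strip_prefix(table_path)?;
    let mut segments = subpath.split('/').filter(|s| !s.is_empty());
    let mut values = Vec::new();
    for col in cols {
        let segment = segments.next()?;
        let (name, value) = segment.split_once('=')?;
        if name != *col {
            return None;
        }
        values.push(value);
    }
    Some(values)
}

fn main() {
    // Path as the store would report it, with the scheme already stripped.
    let file_path = "/tmp/listing_table/part1=value1/data.parquet";
    let cols = ["part1"];

    // Table path as registered, scheme included: the prefix does not match.
    assert_eq!(
        partition_values("file:///tmp/listing_table", file_path, &cols),
        None
    );

    // Same table path without the scheme: the partition value is found.
    assert_eq!(
        partition_values("/tmp/listing_table", file_path, &cols),
        Some(vec!["value1"])
    );
}
```

If files whose partitions cannot be parsed are skipped, this would explain why the query sees no data.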
   
   There are two ways to fix this: either strip the scheme from the path in the 
registered table as well (it would probably be best to let the ObjectStore 
implementation do that), or enhance FileMeta to use a URI instead of just a 
path.
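
A minimal sketch of the first option, assuming a hypothetical helper named `strip_file_scheme` (it is not part of DataFusion). It handles only the plain `file://` prefix and ignores URIs that carry a host, such as `file://localhost/...`.

```rust
/// Hypothetical helper: drop a leading `file://` so the registered table path
/// has the same shape as the paths reported by the local store.
/// Paths without the scheme are returned unchanged.
fn strip_file_scheme(uri: &str) -> &str {
    uri.strip_prefix("file://").unwrap_or(uri)
}

fn main() {
    let table_path = strip_file_scheme("file:///tmp/listing_table");
    let file_path = "/tmp/listing_table/part1=value2/data.parquet";

    assert_eq!(table_path, "/tmp/listing_table");
    // With the scheme removed, the file path now starts with the table path.
    assert_eq!(
        file_path.strip_prefix(table_path),
        Some("/part1=value2/data.parquet")
    );
}
```

Doing this normalization inside the ObjectStore implementation, as suggested above, would keep scheme handling in one place for each store.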
   
   **To Reproduce**
   Steps to reproduce the behavior:
   
   `/tmp/listing_table/part1=value1/` and 
`/tmp/listing_table/part1=value2/`
   should each contain one Parquet file.
   
   ```
   // Runs inside an async function that returns a Result.
   let mut ctx = ExecutionContext::new();
   let listing_options = ListingOptions {
       file_extension: "parquet".to_string(),
       format: Arc::new(ParquetFormat::default()),
       // Partition column names are owned Strings.
       table_partition_cols: vec!["part1".to_string()],
       collect_stat: true,
       target_partitions: 8,
   };
   // Registering the table with the `file://` scheme triggers the bug.
   ctx.register_listing_table(
       "my_table",
       "file:///tmp/listing_table",
       listing_options,
       None,
   )
   .await?;

   let df = ctx.sql("select count(*) from my_table").await?;
   let rb = df.collect().await?;
   eprintln!("rb = {:?}", rb);
   ```
   
   **Expected behavior**
   The above should count the rows in the files correctly. With the current 
behavior, it returns 0.
   
   **Additional context**
   I'm trying to be consistent in my project, so I use schemes for both 
local and remote files. Finding this bug required a lot of debugging.
   



