goldmedal commented on issue #12788:
URL: https://github.com/apache/datafusion/issues/12788#issuecomment-2400448053

   @alamb @jayzhan211 
   I drafted PR #12816 as a simple POC. With this PR, the option can be used like this:
   ```rust
       let ctx = SessionContext::new();
       ctx.sql(
           r#"
       CREATE EXTERNAL TABLE hits
       STORED AS PARQUET
       LOCATION 'benchmarks/data/hits_partitioned'
       OPTIONS ('binary_as_string' 'true')
       "#,
       ).await?.show().await?;
       ctx.sql("describe hits").await?.show().await?;
       ctx.sql(r#"select arrow_typeof("Title") from hits limit 1"#).await?.show().await?;
   ```
   The result is
   ```
   +-----------------------+-----------+-------------+
   | column_name           | data_type | is_nullable |
   +-----------------------+-----------+-------------+
   | WatchID               | Int64     | YES         |
   | JavaEnable            | Int16     | YES         |
   | Title                 | Utf8      | YES         |
   | GoodEvent             | Int16     | YES         |
   ...
   
   +--------------------------+
   | arrow_typeof(hits.Title) |
   +--------------------------+
   | Utf8                     |
   +--------------------------+
   ```
   If you want to run some experiments, this PR should be easy to use.
   I still need to fix some CI failures, but the basic functionality seems to work.
   I'll add tests and documentation soon.
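   
   For readers new to the option: conceptually, `binary_as_string` asks the reader to treat binary columns as strings. Below is a minimal std-only sketch of that idea for a single value, assuming the conversion amounts to a checked UTF-8 reinterpretation. The helper name `binary_value_as_string` is hypothetical, and this is not the code in the PR.
   ```rust
   /// Hypothetical helper: reinterpret one nullable binary value as a string.
   /// Assumption: bytes that are not valid UTF-8 map to `None` (null);
   /// a real reader may instead return an error.
   fn binary_value_as_string(value: Option<&[u8]>) -> Option<&str> {
       // `from_utf8` validates the bytes without copying them.
       value.and_then(|bytes| std::str::from_utf8(bytes).ok())
   }
   ```
   Valid bytes such as `"abc".as_bytes()` come back as `Some("abc")`, while a null input or invalid bytes yield `None`.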
   
   ## Related Issue
   By the way, I found an issue with casting `Binary` to `StringView` (`Utf8View`) when I tried to use `binary_as_string` together with `schema_force_view_types`.
   ```rust
       let ctx = SessionContext::new();
       ctx.sql(
           r#"
       CREATE EXTERNAL TABLE hits
       STORED AS PARQUET
       LOCATION '/Users/jax/git/datafusion/benchmarks/data/hits_partitioned'
       OPTIONS ('binary_as_string' 'true', 'schema_force_view_types' 'true')
       "#,
       ).await?.show().await?;
       ctx.sql("describe hits").await?.show().await?;
       ctx.sql(r#"select "Title" from hits limit 1"#).await?.show().await?;
   ----
   Error: Error during planning: Cannot cast file schema field Title of type Binary to table schema field of type Utf8View
   ```
   
   It can be reproduced by
   ```
   > select arrow_cast(arrow_cast('abc', 'Binary'), 'Utf8View');
   This feature is not implemented: Unsupported CAST from Binary to Utf8View
   > select arrow_cast(arrow_cast('abc', 'Binary'), 'Utf8');
   +-----------------------------------------------------------------+
   | arrow_cast(arrow_cast(Utf8("abc"),Utf8("Binary")),Utf8("Utf8")) |
   +-----------------------------------------------------------------+
   | abc                                                             |
   +-----------------------------------------------------------------+
   1 row(s) fetched. 
   Elapsed 0.071 seconds.
   ```
   
   I suspect this is a missing cast in `arrow-cast` at 
   
https://github.com/apache/arrow-rs/blob/1be268db2237b8850161f96849353eac00cb2615/arrow-cast/src/cast/mod.rs#L209-L210
   
   Maybe we should file an issue on the arrow-rs repo. 🤔 
   
   Casting from `BinaryView` works for both `Utf8` and `Utf8View`.
   ```
   > select arrow_cast(arrow_cast('abc', 'BinaryView'), 'Utf8');
   +---------------------------------------------------------------------+
   | arrow_cast(arrow_cast(Utf8("abc"),Utf8("BinaryView")),Utf8("Utf8")) |
   +---------------------------------------------------------------------+
   | abc                                                                 |
   +---------------------------------------------------------------------+
   1 row(s) fetched. 
   Elapsed 0.034 seconds.
   
   > select arrow_cast(arrow_cast('abc', 'BinaryView'), 'Utf8View');
   +-------------------------------------------------------------------------+
   | arrow_cast(arrow_cast(Utf8("abc"),Utf8("BinaryView")),Utf8("Utf8View")) |
   +-------------------------------------------------------------------------+
   | abc                                                                     |
   +-------------------------------------------------------------------------+
   1 row(s) fetched. 
   Elapsed 0.007 seconds.
   ```
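   
   Putting the observations in this comment together, one possible workaround until a direct cast exists is to go through `Utf8`. The sketch below only models the cast support seen in the sessions above in plain Rust; it does not call arrow-rs. The `Ty` enum and both functions are hypothetical, and the `Utf8` to `Utf8View` step is an assumption that is not shown in the sessions above.
   ```rust
   /// Hypothetical stand-in for the four Arrow types discussed in this comment.
   #[derive(Clone, Copy, Debug, PartialEq)]
   enum Ty {
       Binary,
       BinaryView,
       Utf8,
       Utf8View,
   }

   /// Direct casts observed in the CLI sessions above, plus one labeled assumption.
   /// Pairs not listed here are simply not covered by this model.
   fn direct_cast_supported(from: Ty, to: Ty) -> bool {
       use Ty::*;
       matches!(
           (from, to),
           (Binary, Utf8)
               | (BinaryView, Utf8)
               | (BinaryView, Utf8View)
               | (Utf8, Utf8View) // assumption: not demonstrated in this comment
       )
   }

   /// Returns the chain of types to cast through, preferring a direct cast
   /// and falling back to a two-step cast via `Utf8`.
   fn cast_path(from: Ty, to: Ty) -> Option<Vec<Ty>> {
       if direct_cast_supported(from, to) {
           return Some(vec![from, to]);
       }
       let via = Ty::Utf8;
       if direct_cast_supported(from, via) && direct_cast_supported(via, to) {
           return Some(vec![from, via, to]);
       }
       None
   }
   ```
   Under these assumptions, `cast_path(Ty::Binary, Ty::Utf8View)` yields the chain `Binary`, `Utf8`, `Utf8View`, while `BinaryView` to `Utf8View` stays a single direct step.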


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.