Re: [PR] Implement Hive-Style Partitioned Write Support [arrow-datafusion]

via GitHub Fri, 20 Oct 2023 10:42:45 -0700


suremarc commented on PR #7801:
URL: https://github.com/apache/arrow-datafusion/pull/7801#issuecomment-1773144039


   @devinjdangelo I tried this feature in `datafusion-cli` today because it is useful for something I am working on, and I got this error when writing to a partitioned table:
   
   ```
   This feature is not implemented: it is not yet supported to write to hive partitions with datatype Dictionary(UInt16, Utf8)
   ```
   
   Here is a repro using `datafusion-cli`:
   
   ```sql
   CREATE EXTERNAL TABLE lz4_raw_compressed_larger
   STORED AS PARQUET
   PARTITIONED BY (partition)
   LOCATION 'data/';
   
   INSERT INTO lz4_raw_compressed_larger VALUES ('non-partition-value', 'partition');
   ```
   
   Here's a [zip file](https://github.com/apache/arrow-datafusion/files/13057020/lz4_raw_compressed_larger.zip) with a single file in it, `data/partition=A/lz4_raw_compressed_larger.parquet`.
   
   I noticed the unit tests specify the schema explicitly. My guess is that when DataFusion infers the schema, the partition columns are encoded as dictionaries, which the partitioned write path does not yet support. If partitioned writes don't work with tables whose schemas are inferred, I think that will limit the usefulness of this feature.
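   
   One workaround that may be worth trying, sketched here under assumptions: declare the columns explicitly in the DDL so that the partition column is typed as a plain string instead of being inferred. This assumes DataFusion takes the partition column's type from the declared column list when one is given. The non-partition column name `value` is a placeholder, not taken from the attached file, and would need to match the actual Parquet schema:
   
   ```sql
   -- Sketch only: `value` is a hypothetical column name; substitute the
   -- real non-partition column(s) from the Parquet file.
   CREATE EXTERNAL TABLE lz4_raw_compressed_larger (
       value VARCHAR,
       partition VARCHAR  -- declared explicitly rather than inferred as Dictionary(UInt16, Utf8)
   )
   STORED AS PARQUET
   PARTITIONED BY (partition)
   LOCATION 'data/';
   
   INSERT INTO lz4_raw_compressed_larger VALUES ('non-partition-value', 'partition');
   ```
   
   If the insert succeeds with the explicit schema but fails with the inferred one, that would support the guess that the inferred dictionary type is what triggers the error.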


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

