ozankabak commented on PR #10590:
URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2129132050

   @davisp what I meant was that we should look at how other dialects (and
systems using those dialects, such as BigQuery) handle column-specific metadata,
and analyze the pros and cons of the various approaches.
   
   As @metegenez says, we already support column-specific options today, as in:
   ```sql
   COPY source_table
   TO 'test_files/scratch/copy/table_with_options/'
   STORED AS PARQUET
   OPTIONS (
   'format.compression::col1' 'zstd(5)',
   'format.compression::col2' snappy,
   'format.bloom_filter_fpp::col2' 0.456,
   'format.encoding::col1' DELTA_BINARY_PACKED,
   'format.dictionary_enabled::col2' true,
   )
   ```
   So the real question is whether we want to add shortcuts, for user
convenience, that set these options and reuse the existing machinery
underneath.
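
   To make "shortcuts over the existing machinery" concrete, here is a rough sketch of how a shortcut could desugar into the option keys shown above. `ColumnShortcut` and `expand_shortcuts` are hypothetical names made up for illustration and are not DataFusion APIs; the only thing taken from the example above is the `format.<option>::<column>` key shape.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-column shortcut settings; NOT a DataFusion type.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ColumnShortcut {
    pub compression: Option<String>,
    pub encoding: Option<String>,
    pub dictionary_enabled: Option<bool>,
    pub bloom_filter_fpp: Option<f64>,
}

/// Expands shortcuts into `format.<option>::<column>` key/value pairs,
/// i.e. the key shape used in the OPTIONS clause above.
/// (Hypothetical helper, assumed for illustration only.)
pub fn expand_shortcuts(
    columns: &BTreeMap<String, ColumnShortcut>,
) -> BTreeMap<String, String> {
    let mut options = BTreeMap::new();
    for (column, shortcut) in columns {
        // Only settings that were actually given produce an option entry.
        let entries = [
            ("compression", shortcut.compression.clone()),
            ("encoding", shortcut.encoding.clone()),
            (
                "dictionary_enabled",
                shortcut.dictionary_enabled.map(|b| b.to_string()),
            ),
            (
                "bloom_filter_fpp",
                shortcut.bloom_filter_fpp.map(|p| p.to_string()),
            ),
        ];
        for (name, value) in entries {
            if let Some(value) = value {
                options.insert(format!("format.{name}::{column}"), value);
            }
        }
    }
    options
}
```

   Whatever surface syntax a dialect uses, it would only need to produce something like `ColumnShortcut` per column; everything after the expansion stays on the existing options path.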
   
   The decision will probably boil down to one of three choices:
   1. We may decide that we don't need to (or shouldn't) support dialect
shortcuts at the upstream level. The argument for this is that if we support
BigQuery, why not some other system X too? And if we support shortcuts for this
feature, why not for other features as well?
   2. We may decide to support all major dialect shortcuts for this purpose. If
we do this, we should probably take the same attitude toward other features as
well.
   3. Hybrid: We may go with 1, but decide that having a shortcut is useful, so
we design/choose a "DF syntax" for it and only support that.
   
   In 1 and 3, downstream users of DF can add more dialect support according to
their needs thanks to DF's extensibility. And if DF creates any friction in
doing so, we should obviously address it upstream.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org