evenyag opened a new issue, #5296:
URL: https://github.com/apache/arrow-rs/issues/5296

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   The `ArrowWriter` always embeds the Arrow schema in the key-value metadata of the Parquet files it generates, under the key `ARROW_SCHEMA_META_KEY`:
   
https://github.com/apache/arrow-rs/blob/72d8a783176219f0864022daba70e84ceab7e221/parquet/src/arrow/arrow_writer/mod.rs#L118-L126
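   For context, in the parquet crate `ARROW_SCHEMA_META_KEY` is the string `"ARROW:schema"`, and the value stored under it is a base64-encoded Arrow IPC schema. A consumer can therefore tell whether a file carries the embedded schema by scanning its key-value metadata. The std-only sketch below models that check; `KeyValue` is a local stand-in for a Parquet key-value pair and `has_embedded_arrow_schema` is a hypothetical helper, not an existing parquet API.

   ```rust
   /// Local stand-in for a Parquet key-value metadata entry (hypothetical;
   /// the parquet crate has its own `KeyValue` type).
   #[derive(Debug, Clone, PartialEq)]
   pub struct KeyValue {
       pub key: String,
       pub value: Option<String>,
   }

   /// Same string as the parquet crate's `ARROW_SCHEMA_META_KEY`.
   pub const ARROW_SCHEMA_META_KEY: &str = "ARROW:schema";

   /// Hypothetical helper: true if the (optional) key-value metadata
   /// contains an embedded Arrow schema entry.
   pub fn has_embedded_arrow_schema(kv: Option<&[KeyValue]>) -> bool {
       kv.map_or(false, |entries| {
           entries.iter().any(|e| e.key == ARROW_SCHEMA_META_KEY)
       })
   }

   fn main() {
       let kv = vec![KeyValue {
           key: ARROW_SCHEMA_META_KEY.to_string(),
           // Placeholder; a real file stores a base64-encoded IPC schema here.
           value: Some("<encoded schema>".to_string()),
       }];
       println!("with entry: {}", has_embedded_arrow_schema(Some(kv.as_slice())));
       println!("no metadata: {}", has_embedded_arrow_schema(None));
   }
   ```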
   
   Sometimes I'd like to write a Parquet file without any additional metadata, or I already describe the schema with my own metadata. It would be helpful to have a way to disable the embedded metadata.
   
   **Describe the solution you'd like**
   
   I found that the C++ implementation disables this metadata by default and provides a `store_schema` option to enable it:
   - https://github.com/apache/arrow/blob/main/docs/source/cpp/parquet.rst#id61
   - https://arrow.apache.org/docs/dev/cpp/api/formats.html#_CPPv4N7parquet21ArrowWriterProperties7Builder12store_schemaEv
   
   For backward compatibility, we could add an `ArrowWriterOptions` struct that keeps the Arrow metadata enabled by default, and give `ArrowWriter` a new constructor that accepts these options:
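   ```rust
   use std::io::Write;
   use arrow_schema::SchemaRef;
   use crate::errors::Result;
   use crate::file::properties::WriterProperties;
   
   /// Options for `ArrowWriter`; the default keeps writing the Arrow schema.
   #[derive(Debug, Clone, Default)]
   pub struct ArrowWriterOptions {
       skip_arrow_metadata: bool,
   }
   
   impl ArrowWriterOptions {
       pub fn new() -> Self {
           Self::default()
       }
   
       /// If true, do not embed the Arrow schema under `ARROW_SCHEMA_META_KEY`.
       pub fn with_skip_arrow_metadata(mut self, skip_arrow_metadata: bool) -> Self {
           self.skip_arrow_metadata = skip_arrow_metadata;
           self
       }
   }
   
   impl<W: Write + Send> ArrowWriter<W> {
       pub fn try_new_with_options(
           writer: W,
           arrow_schema: SchemaRef,
           props: Option<WriterProperties>,
           options: ArrowWriterOptions,
       ) -> Result<Self> {
           // Same as `try_new`, except the Arrow schema is only added to the
           // key-value metadata when `options.skip_arrow_metadata` is false.
           todo!()
       }
   }
   ```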
   
   If `skip_arrow_metadata` is true, the writer does not store the Arrow schema in the file's key-value metadata.
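   To make the intended semantics concrete, here is a std-only model of how the proposed flag could gate the metadata. It assumes the writer starts from the user's own key-value metadata (for example, entries that describe the schema some other way) and appends the encoded Arrow schema only when `skip_arrow_metadata` is false. `build_key_value_metadata` and the local `KeyValue` type are hypothetical, and replacing a stale `"ARROW:schema"` entry is a choice made for this sketch.

   ```rust
   /// Local stand-in for a Parquet key-value metadata entry (hypothetical).
   #[derive(Debug, Clone, PartialEq)]
   pub struct KeyValue {
       pub key: String,
       pub value: Option<String>,
   }

   pub const ARROW_SCHEMA_META_KEY: &str = "ARROW:schema";

   /// Mirrors the proposed options; `Default` keeps today's behavior.
   #[derive(Debug, Clone, Default)]
   pub struct ArrowWriterOptions {
       skip_arrow_metadata: bool,
   }

   impl ArrowWriterOptions {
       pub fn new() -> Self {
           Self::default()
       }

       pub fn with_skip_arrow_metadata(mut self, skip_arrow_metadata: bool) -> Self {
           self.skip_arrow_metadata = skip_arrow_metadata;
           self
       }
   }

   /// Hypothetical helper: returns the final key-value metadata, appending the
   /// encoded Arrow schema (replacing any older entry) unless it is skipped.
   pub fn build_key_value_metadata(
       user_kv: Vec<KeyValue>,
       encoded_schema: &str,
       options: &ArrowWriterOptions,
   ) -> Vec<KeyValue> {
       let mut kv = user_kv;
       if !options.skip_arrow_metadata {
           kv.retain(|e| e.key != ARROW_SCHEMA_META_KEY);
           kv.push(KeyValue {
               key: ARROW_SCHEMA_META_KEY.to_string(),
               value: Some(encoded_schema.to_string()),
           });
       }
       kv
   }

   fn main() {
       let user = vec![KeyValue {
           key: "my:schema".to_string(),
           value: Some("custom description".to_string()),
       }];
       let with_schema =
           build_key_value_metadata(user.clone(), "<encoded schema>", &ArrowWriterOptions::new());
       let without_schema = build_key_value_metadata(
           user,
           "<encoded schema>",
           &ArrowWriterOptions::new().with_skip_arrow_metadata(true),
       );
       for (label, kv) in [("default", &with_schema), ("skipped", &without_schema)] {
           let keys: Vec<&str> = kv.iter().map(|e| e.key.as_str()).collect();
           // default: ["my:schema", "ARROW:schema"]
           // skipped: ["my:schema"]
           println!("{label}: {keys:?}");
       }
   }
   ```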
   
   **Describe alternatives you've considered**
   
   Alternatively, we could adopt the same `store_schema` option as arrow-cpp, but set it to `true` by default to preserve the current behavior.
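   A sketch of this alternative, assuming a hypothetical `with_store_schema` setter on the same `ArrowWriterOptions` struct. Because `#[derive(Default)]` would initialize a `bool` to `false`, a positively named flag that must default to `true` needs a hand-written `Default` impl.

   ```rust
   /// Hypothetical alternative: a positive `store_schema` flag, named as in
   /// arrow-cpp but enabled by default.
   #[derive(Debug, Clone)]
   pub struct ArrowWriterOptions {
       store_schema: bool,
   }

   impl Default for ArrowWriterOptions {
       fn default() -> Self {
           // Unlike arrow-cpp (opt-in), keep storing the schema by default so
           // existing callers see no change.
           Self { store_schema: true }
       }
   }

   impl ArrowWriterOptions {
       pub fn new() -> Self {
           Self::default()
       }

       pub fn with_store_schema(mut self, store_schema: bool) -> Self {
           self.store_schema = store_schema;
           self
       }

       pub fn store_schema(&self) -> bool {
           self.store_schema
       }
   }

   fn main() {
       let default_opts = ArrowWriterOptions::new();
       let opt_out = ArrowWriterOptions::new().with_store_schema(false);
       println!("default store_schema = {}", default_opts.store_schema());
       println!("opt-out store_schema = {}", opt_out.store_schema());
   }
   ```

   With this naming the option reads the same as in arrow-cpp; only the default differs.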
   
   **Additional context**
   

