evenyag opened a new issue, #5296: URL: https://github.com/apache/arrow-rs/issues/5296

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

The `ArrowWriter` always adds the `ARROW_SCHEMA_META_KEY` metadata entry to the parquet file it generates.

https://github.com/apache/arrow-rs/blob/72d8a783176219f0864022daba70e84ceab7e221/parquet/src/arrow/arrow_writer/mod.rs#L118-L126

Sometimes I'd like to write a parquet file without any additional metadata, or I already have other metadata that describes the schema. It would be helpful to have a way to disable the embedded metadata.

**Describe the solution you'd like**

I found that the C++ implementation disables the metadata by default and provides a `store_schema` option to enable it.

- https://github.com/apache/arrow/blob/main/docs/source/cpp/parquet.rst#id61
- https://arrow.apache.org/docs/dev/cpp/api/formats.html#_CPPv4N7parquet21ArrowWriterProperties7Builder12store_schemaEv

For backward compatibility, we could add an `ArrowWriterOptions` struct that keeps the arrow metadata enabled by default, and give `ArrowWriter` a new constructor that accepts these options.

```rust
// API sketch only: fields and bodies are left unimplemented.
struct ArrowWriterOptions {}

impl ArrowWriterOptions {
    pub fn new() -> Self {
        todo!()
    }

    pub fn with_skip_arrow_metadata(self, skip_arrow_metadata: bool) -> Self {
        todo!()
    }
}

impl<W> ArrowWriter<W> {
    pub fn try_new_with_options(
        writer: W,
        arrow_schema: SchemaRef,
        props: Option<WriterProperties>,
        options: ArrowWriterOptions,
    ) -> Result<Self> {
        todo!()
    }
}
```

If `skip_arrow_metadata` is true, the writer won't store the arrow schema metadata.

**Describe alternatives you've considered**

We could also use the same option name as arrow-cpp, `store_schema`, and set it to true by default.

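
To make the intended behavior concrete, here is a minimal, self-contained sketch of how the proposed flag could gate the schema entry. It does not use the real `parquet` crate: `build_key_value_metadata` and the `BTreeMap` standing in for the file's key-value metadata are hypothetical names chosen for illustration, and the string `"ARROW:schema"` is assumed to match the value of the crate's `ARROW_SCHEMA_META_KEY`.

```rust
use std::collections::BTreeMap;

/// Assumed to match the value of `ARROW_SCHEMA_META_KEY` in the parquet crate.
const ARROW_SCHEMA_META_KEY: &str = "ARROW:schema";

/// Hypothetical options type mirroring the proposal.
/// `Default` yields `skip_arrow_metadata == false`, which keeps the
/// current behavior (the schema is embedded).
#[derive(Debug, Clone, Default)]
pub struct ArrowWriterOptions {
    skip_arrow_metadata: bool,
}

impl ArrowWriterOptions {
    pub fn new() -> Self {
        Self::default()
    }

    /// Builder-style setter: consumes `self` and returns the updated options.
    pub fn with_skip_arrow_metadata(mut self, skip_arrow_metadata: bool) -> Self {
        self.skip_arrow_metadata = skip_arrow_metadata;
        self
    }
}

/// Hypothetical stand-in for the step where the writer assembles the
/// file-level key-value metadata. `encoded_schema` represents the
/// already-serialized Arrow schema.
pub fn build_key_value_metadata(
    user_metadata: &[(&str, &str)],
    encoded_schema: &str,
    options: &ArrowWriterOptions,
) -> BTreeMap<String, String> {
    // Start from the metadata supplied by the user.
    let mut kv: BTreeMap<String, String> = user_metadata
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect();

    // Only embed the Arrow schema when the caller has not opted out.
    if !options.skip_arrow_metadata {
        kv.insert(ARROW_SCHEMA_META_KEY.to_string(), encoded_schema.to_string());
    }
    kv
}
```

With `ArrowWriterOptions::new()` the schema entry is added alongside the user's metadata; with `ArrowWriterOptions::new().with_skip_arrow_metadata(true)` only the user's entries remain. A `store_schema`-style option would be the same check with the boolean inverted and a default of `true`.
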
**Additional context**
