Re: [I] Way to share `SchemaDescriptorPtr` across `ParquetMetadata` objects [arrow-rs]

via GitHub Wed, 03 Jul 2024 10:57:43 -0700


alamb commented on issue #5999:
URL: https://github.com/apache/arrow-rs/issues/5999#issuecomment-2206905050


   > Just curious: would this conflict with something like cross-file schema 
evolution ( https://iceberg.apache.org/docs/1.5.1/evolution/ ) in Iceberg? Or 
does it just reuse the schema when opening the same file?
   
   In my mind, this feature would work well in systems that support schema 
evolution, like Iceberg.
   
   For example:
   *  20 files that all share the same schema with 2 columns
   *  30 files that all share a different schema with 3 columns (evolved from 
the first 20 files)
   
   
   Without the feature described in this ticket, a query system today would 
need to retain 50 schema objects (20 of the first kind and 30 of the second).
   
   With the feature described in this ticket, the query system could retain 
only 2 schema objects, one per distinct schema.
   
   Depending on the number of files, I think this could be a substantial 
memory saving.
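
To make the 50-versus-2 arithmetic concrete, here is a minimal sketch of the sharing idea using only the standard library. It is not the arrow-rs API: `SchemaDescriptor`, `FileMetadata`, and `SchemaInterner` below are hypothetical stand-ins, and the only assumption carried over from the ticket is that a schema is held behind a reference-counted pointer (as `SchemaDescriptorPtr` suggests), so identical schemas can point at one allocation.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a parquet schema descriptor (not the real type).
#[derive(Debug)]
struct SchemaDescriptor {
    columns: Vec<String>,
}

/// Hypothetical per-file metadata that holds a shared schema pointer.
struct FileMetadata {
    schema: Arc<SchemaDescriptor>,
}

/// Hypothetical cache that hands out one `Arc` per distinct column list.
#[derive(Default)]
struct SchemaInterner {
    seen: HashMap<Vec<String>, Arc<SchemaDescriptor>>,
}

impl SchemaInterner {
    /// Return the existing shared schema if one matches, else allocate it once.
    fn intern(&mut self, columns: Vec<String>) -> Arc<SchemaDescriptor> {
        if let Some(existing) = self.seen.get(&columns) {
            return Arc::clone(existing);
        }
        let schema = Arc::new(SchemaDescriptor {
            columns: columns.clone(),
        });
        self.seen.insert(columns, Arc::clone(&schema));
        schema
    }

    /// Number of distinct schema objects currently retained.
    fn len(&self) -> usize {
        self.seen.len()
    }
}

/// Build the example from the comment: 20 two-column files, then 30
/// three-column files whose schema evolved from the first.
fn load_example_files(interner: &mut SchemaInterner) -> Vec<FileMetadata> {
    let two = vec!["a".to_string(), "b".to_string()];
    let three = vec!["a".to_string(), "b".to_string(), "c".to_string()];
    let mut files = Vec::new();
    for _ in 0..20 {
        files.push(FileMetadata { schema: interner.intern(two.clone()) });
    }
    for _ in 0..30 {
        files.push(FileMetadata { schema: interner.intern(three.clone()) });
    }
    files
}

fn main() {
    let mut interner = SchemaInterner::default();
    let files = load_example_files(&mut interner);
    println!(
        "{} files, {} distinct schema objects (first has {} columns)",
        files.len(),
        interner.len(),
        files[0].schema.columns.len()
    );
}
```

Keying the cache on the column list is only for illustration; a real system would need some notion of schema equality that covers types, nesting, and field metadata.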
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

