alamb commented on code in PR #6117:
URL: https://github.com/apache/arrow-rs/pull/6117#discussion_r1692981773


##########
parquet/src/file/metadata/mod.rs:
##########
@@ -887,6 +887,7 @@ impl ColumnChunkMetaDataBuilder {
     }
 
     /// Sets file offset in bytes.

Review Comment:
   ```suggestion
       /// Sets file offset in bytes.
       ///
       /// This field was meant to provide an alternative to storing `ColumnMetadata` directly in
       /// the `ColumnChunkMetadata`. However, most parquet readers assume the `ColumnMetadata`
       /// is stored inline and ignore this field.
   ```



##########
parquet/src/column/writer/mod.rs:
##########
@@ -1023,8 +1017,6 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> {
         }
 
         let metadata = builder.build()?;
-        self.page_writer.write_metadata(&metadata)?;

Review Comment:
   I think this is the major functional change. I expect it to result in
slightly smaller parquet files, since the per-column metadata is no longer
written twice.
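
To make the expected size effect concrete, here is a toy model, not the actual writer code. `ColumnChunk`, its fields, and the three functions are hypothetical names invented for illustration, and it assumes the old path wrote each column's metadata once after its pages and once more in the footer:

```rust
/// Hypothetical, simplified model of a column chunk: only byte counts matter.
struct ColumnChunk {
    /// Bytes of encoded pages for this column.
    data_len: usize,
    /// Bytes of the serialized per-column metadata.
    metadata_len: usize,
}

/// Modeled file size when each column's metadata is written twice
/// (once after its pages, once in the footer).
fn size_with_duplicate_metadata(chunks: &[ColumnChunk]) -> usize {
    chunks
        .iter()
        .map(|c| c.data_len + 2 * c.metadata_len)
        .sum()
}

/// Modeled file size when each column's metadata is written only once.
fn size_with_single_metadata(chunks: &[ColumnChunk]) -> usize {
    chunks.iter().map(|c| c.data_len + c.metadata_len).sum()
}

/// Bytes saved by dropping the duplicate copy: one metadata blob per column.
fn bytes_saved(chunks: &[ColumnChunk]) -> usize {
    size_with_duplicate_metadata(chunks) - size_with_single_metadata(chunks)
}
```

Under this model the saving is exactly the sum of the per-column metadata sizes, so it grows with the number of columns and row groups but stays small relative to the page data.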



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
