etseidl commented on PR #8080: URL: https://github.com/apache/arrow-rs/pull/8080#issuecomment-3190064509
Ok, I'm starting to grok this. I merged this branch into my current thrift branch, and changed `try_decode` to:

```rust
pub fn try_decode(
    &mut self,
) -> std::result::Result<DecodeResult<ParquetMetaData>, ParquetError> {
    if self.done {
        return Ok(DecodeResult::Finished);
    }

    // need to have the last 8 bytes of the file to decode the metadata
    let file_len = self.buffers.file_len();
    if !self.buffers.has_range(&(file_len - 8..file_len)) {
        #[expect(clippy::single_range_in_vec_init)]
        return Ok(DecodeResult::NeedsData(vec![file_len - 8..file_len]));
    }

    // Try to parse the metadata from the buffers we have.
    // If we don't have enough data, it will return a `ParquetError::NeedMoreData`
    // with the number of bytes needed to complete the metadata parsing.
    // If we have enough data, it will return `Ok(())` and we can
    let footer_bytes = self
        .buffers
        .get_bytes(file_len - FOOTER_SIZE as u64, FOOTER_SIZE)?;

    let mut footer = [0_u8; FOOTER_SIZE];
    footer_bytes.as_ref().copy_to_slice(&mut footer);
    let footer = ParquetMetaDataReader::decode_footer_tail(&footer)?;

    let metadata_len = footer.metadata_length();
    let footer_metadata_len = FOOTER_SIZE + metadata_len;
    let footer_start = file_len - footer_metadata_len as u64;
    let footer_end = file_len - FOOTER_SIZE as u64;
    if !self.buffers.has_range(&(footer_start..footer_end)) {
        #[expect(clippy::single_range_in_vec_init)]
        return Ok(DecodeResult::NeedsData(vec![footer_start..file_len]));
    }

    let metadata_bytes = self.buffers.get_bytes(footer_start, metadata_len)?;
    let metadata = ParquetMetaDataReader::decode_file_metadata(&metadata_bytes)?;
    self.done = true;
    Ok(DecodeResult::Data(metadata))
}
```

No page indexes yet, but this seems pretty nice 👍 Once I have the page indexes converted, the parser should get pretty simple.
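For anyone following along, the range arithmetic in `try_decode` can be sketched without the parquet crate. This is only an illustration, not code from the PR: `footer_tail_range` and `metadata_range` are hypothetical helper names, and the sketch assumes an unencrypted footer (a 4-byte little-endian metadata length followed by the `PAR1` magic). It uses checked subtraction so that a file shorter than the footer yields `None` rather than underflowing.

```rust
use std::ops::Range;

/// Size of the Parquet footer tail: 4-byte little-endian metadata length
/// followed by the 4-byte magic.
const FOOTER_SIZE: u64 = 8;

/// Hypothetical helper: byte range of the footer tail for a file of
/// `file_len` bytes. Returns `None` if the file is too short to hold one.
fn footer_tail_range(file_len: u64) -> Option<Range<u64>> {
    let start = file_len.checked_sub(FOOTER_SIZE)?;
    Some(start..file_len)
}

/// Hypothetical helper: given the 8 tail bytes, return the byte range that
/// holds the encoded file metadata, which sits directly before the tail.
/// Assumption: only the plain `PAR1` magic is accepted here.
fn metadata_range(file_len: u64, tail: &[u8; 8]) -> Option<Range<u64>> {
    if &tail[4..] != b"PAR1" {
        return None;
    }
    let metadata_len = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]) as u64;
    // The metadata ends where the footer tail begins.
    let end = file_len.checked_sub(FOOTER_SIZE)?;
    // `None` if the declared length does not fit in the file.
    let start = end.checked_sub(metadata_len)?;
    Some(start..end)
}
```

A caller would first request `footer_tail_range(file_len)`, and once those 8 bytes are available, request `metadata_range(file_len, &tail)`, mirroring the two `NeedsData` returns above.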