QuLogic commented on issue #44769: URL: https://github.com/apache/arrow/issues/44769#issuecomment-2482688809
From the [Parquet file format](https://parquet.apache.org/docs/file-format/), it appears the file metadata length should always be little-endian. With this patch:

```diff
diff --git a/cpp/src/parquet/file_reader.cc b/cpp/src/parquet/file_reader.cc
index 3e9eeea6c..7585afcc0 100644
--- a/cpp/src/parquet/file_reader.cc
+++ b/cpp/src/parquet/file_reader.cc
@@ -497,9 +497,10 @@ class SerializedFile : public ParquetFileReader::Contents {
                                                      "is not a parquet file.");
     }
     // Both encrypted/unencrypted footers have the same footer length check.
-    uint32_t metadata_len = ::arrow::util::SafeLoadAs<uint32_t>(
-        reinterpret_cast<const uint8_t*>(footer_buffer->data()) + footer_read_size -
-        kFooterSize);
+    uint32_t metadata_len = ::arrow::bit_util::FromLittleEndian(
+        ::arrow::util::SafeLoadAs<uint32_t>(
+            reinterpret_cast<const uint8_t*>(footer_buffer->data()) + footer_read_size -
+            kFooterSize));
     if (metadata_len > source_size_ - kFooterSize) {
       throw ParquetInvalidOrCorruptedFileException(
           "Parquet file size is ", source_size_,
diff --git a/cpp/src/parquet/file_writer.cc b/cpp/src/parquet/file_writer.cc
index baa9e00da..695347d8c 100644
--- a/cpp/src/parquet/file_writer.cc
+++ b/cpp/src/parquet/file_writer.cc
@@ -539,6 +539,7 @@ void WriteFileMetaData(const FileMetaData& file_metadata, ArrowOutputStream* sin
   metadata_len = static_cast<uint32_t>(position) - metadata_len;
 
   // Write Footer
+  metadata_len = ::arrow::bit_util::ToLittleEndian(metadata_len);
   PARQUET_THROW_NOT_OK(sink->Write(reinterpret_cast<uint8_t*>(&metadata_len), 4));
   PARQUET_THROW_NOT_OK(sink->Write(kParquetMagic, 4));
 }
@@ -562,6 +563,7 @@ void WriteEncryptedFileMetadata(const FileMetaData& file_metadata,
   PARQUET_ASSIGN_OR_THROW(position, sink->Tell());
   metadata_len = static_cast<uint32_t>(position) - metadata_len;
 
+  metadata_len = ::arrow::bit_util::ToLittleEndian(metadata_len);
   PARQUET_THROW_NOT_OK(sink->Write(reinterpret_cast<uint8_t*>(&metadata_len), 4));
   PARQUET_THROW_NOT_OK(sink->Write(kParquetMagic, 4));
 }
```

I can fix most of these occurrences, _except_ for the encryption tests, and I don't know what I've missed for those.
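
To illustrate why the byte order matters here, below is a minimal standalone sketch of reading and writing a 4-byte little-endian length independently of host endianness. The helper names `LoadLittleEndian32` and `StoreLittleEndian32` are hypothetical and not part of Arrow or Parquet C++; the patch above uses Arrow's own `FromLittleEndian`/`ToLittleEndian` instead.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper (not an Arrow API): decode a 4-byte little-endian
// value byte by byte, so the result is the same on any host byte order.
inline uint32_t LoadLittleEndian32(const uint8_t* bytes) {
  return static_cast<uint32_t>(bytes[0]) |
         (static_cast<uint32_t>(bytes[1]) << 8) |
         (static_cast<uint32_t>(bytes[2]) << 16) |
         (static_cast<uint32_t>(bytes[3]) << 24);
}

// Hypothetical helper (not an Arrow API): encode a value as 4 little-endian
// bytes, least significant byte first.
inline std::array<uint8_t, 4> StoreLittleEndian32(uint32_t value) {
  return {{static_cast<uint8_t>(value & 0xFF),
           static_cast<uint8_t>((value >> 8) & 0xFF),
           static_cast<uint8_t>((value >> 16) & 0xFF),
           static_cast<uint8_t>((value >> 24) & 0xFF)}};
}
```

Loading the raw bytes directly into a `uint32_t` only gives the same answer on a little-endian host; on a big-endian host the bytes would be interpreted in reverse, which would explain a bogus `metadata_len`.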
