QuLogic commented on issue #44769:
URL: https://github.com/apache/arrow/issues/44769#issuecomment-2482688809

   From the [Parquet file format](https://parquet.apache.org/docs/file-format/),
it appears the file metadata length should always be little-endian. With this patch:
   ```diff
   diff --git a/cpp/src/parquet/file_reader.cc b/cpp/src/parquet/file_reader.cc
   index 3e9eeea6c..7585afcc0 100644
   --- a/cpp/src/parquet/file_reader.cc
   +++ b/cpp/src/parquet/file_reader.cc
   @@ -497,9 +497,10 @@ class SerializedFile : public ParquetFileReader::Contents {
              "is not a parquet file.");
        }
        // Both encrypted/unencrypted footers have the same footer length check.
   -    uint32_t metadata_len = ::arrow::util::SafeLoadAs<uint32_t>(
   -        reinterpret_cast<const uint8_t*>(footer_buffer->data()) + footer_read_size -
   -        kFooterSize);
   +    uint32_t metadata_len = ::arrow::bit_util::FromLittleEndian(
   +        ::arrow::util::SafeLoadAs<uint32_t>(
   +            reinterpret_cast<const uint8_t*>(footer_buffer->data()) + footer_read_size -
   +            kFooterSize));
        if (metadata_len > source_size_ - kFooterSize) {
          throw ParquetInvalidOrCorruptedFileException(
              "Parquet file size is ", source_size_,
   diff --git a/cpp/src/parquet/file_writer.cc b/cpp/src/parquet/file_writer.cc
   index baa9e00da..695347d8c 100644
   --- a/cpp/src/parquet/file_writer.cc
   +++ b/cpp/src/parquet/file_writer.cc
   @@ -539,6 +539,7 @@ void WriteFileMetaData(const FileMetaData& file_metadata, ArrowOutputStream* sin
      metadata_len = static_cast<uint32_t>(position) - metadata_len;
    
      // Write Footer
   +  metadata_len = ::arrow::bit_util::ToLittleEndian(metadata_len);
      PARQUET_THROW_NOT_OK(sink->Write(reinterpret_cast<uint8_t*>(&metadata_len), 4));
      PARQUET_THROW_NOT_OK(sink->Write(kParquetMagic, 4));
    }
   @@ -562,6 +563,7 @@ void WriteEncryptedFileMetadata(const FileMetaData& file_metadata,
        PARQUET_ASSIGN_OR_THROW(position, sink->Tell());
        metadata_len = static_cast<uint32_t>(position) - metadata_len;
    
   +    metadata_len = ::arrow::bit_util::ToLittleEndian(metadata_len);
        PARQUET_THROW_NOT_OK(sink->Write(reinterpret_cast<uint8_t*>(&metadata_len), 4));
        PARQUET_THROW_NOT_OK(sink->Write(kParquetMagic, 4));
      }
   ```
   I can fix most of these occurrences, _except_ in the encryption tests, and
I don't know what I've missed there.
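
   For reference, here is a minimal standalone sketch of the byte-order handling the patch is aiming for. It does not use Arrow's `bit_util` helpers; `EncodeLengthLE` and `DecodeLengthLE` are hypothetical names made up for illustration, and the only assumption is that the footer length is a 4-byte little-endian integer, as the format page describes:
   ```cpp
   #include <array>
   #include <cassert>
   #include <cstdint>

   // Hypothetical helpers for illustration only; not part of Arrow or Parquet.

   // Encode a 32-bit length as 4 little-endian bytes. Shifts operate on the
   // value, not on memory, so the result is the same on any host byte order.
   inline std::array<uint8_t, 4> EncodeLengthLE(uint32_t len) {
     return {static_cast<uint8_t>(len & 0xFF),
             static_cast<uint8_t>((len >> 8) & 0xFF),
             static_cast<uint8_t>((len >> 16) & 0xFF),
             static_cast<uint8_t>((len >> 24) & 0xFF)};
   }

   // Decode 4 little-endian bytes back into a host-order 32-bit length.
   inline uint32_t DecodeLengthLE(const uint8_t* bytes) {
     assert(bytes != nullptr);
     return static_cast<uint32_t>(bytes[0]) |
            (static_cast<uint32_t>(bytes[1]) << 8) |
            (static_cast<uint32_t>(bytes[2]) << 16) |
            (static_cast<uint32_t>(bytes[3]) << 24);
   }
   ```
   Writing the bytes explicitly avoids depending on the in-memory layout of `metadata_len`, which is what goes wrong on big-endian hosts when the raw integer is written or loaded directly.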

