Adam Hooper created ARROW-7435:
----------------------------------
Summary: Security issue: ValidateOffsets() does not prevent buffer over-read
Key: ARROW-7435
URL: https://issues.apache.org/jira/browse/ARROW-7435
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Affects Versions: 0.15.1, 1.0.0
Environment: Docker
Reporter: Adam Hooper
Skimming through {{Validate()}} code in both 0.15 and master, I noticed an
oversight in {{BinaryArray}} validation in C++ (and Python).
{{ValidateOffsets()}} checks that the first offset is 0, but it doesn't check
that the offsets all point within the data buffer. A malicious Arrow file could
declare {{offsets=[0,999999]}} and {{data=[]}}. If a caller reads the first value
in that array, the read runs past the end of the data buffer.
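To make the over-read concrete, here is a minimal pure-Python model of how a reader locates value {{i}} in a variable-length binary array. It does not use Arrow; {{value_byte_range}} and {{overruns}} are hypothetical names chosen for illustration. Python slicing would silently clamp such a range, so the sketch only computes it; a C++ reader working with a raw pointer and a length has no such protection.

```python
def value_byte_range(offsets, i):
    """Return the (start, end) byte range that value i claims in the data buffer."""
    return offsets[i], offsets[i + 1]


def overruns(offsets, data, i):
    """Return True if reading value i would touch bytes past the end of data."""
    _, end = value_byte_range(offsets, i)
    return end > len(data)


# The crafted array from this report: a single value that claims to span
# 999999 bytes, backed by an empty data buffer.
offsets = [0, 999999]
data = b""

print(value_byte_range(offsets, 0))  # (0, 999999)
print(overruns(offsets, data, 0))    # True
```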
The extra check is cheap: Arrow already validates that offsets are
monotonically non-decreasing, so one need only test that the last offset is
less than or equal to the size of the data buffer.
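The proposed check could look roughly like the following. This is a pure-Python sketch under stated assumptions, not Arrow's actual {{ValidateOffsets()}} implementation: the function name {{validate_binary_offsets}}, its error messages, and the requirement of at least one offset entry are all hypothetical simplifications.

```python
def validate_binary_offsets(offsets, data_size):
    """Raise ValueError if offsets cannot safely index a buffer of data_size bytes.

    Illustrative only; not the Arrow implementation.
    """
    if not offsets:
        # Simplification for this sketch: require at least one offset entry.
        raise ValueError("offsets must contain at least one entry")
    if offsets[0] != 0:
        raise ValueError("first offset must be 0")
    # Each value must end at or after the position where it starts.
    for prev, cur in zip(offsets, offsets[1:]):
        if cur < prev:
            raise ValueError("offsets must not decrease")
    # Offsets never decrease, so the last one is the largest: a single
    # comparison bounds every offset by the data buffer size.
    if offsets[-1] > data_size:
        raise ValueError(
            f"last offset {offsets[-1]} exceeds data buffer size {data_size}"
        )
```

With this sketch, {{validate_binary_offsets([0, 3, 3, 5], 5)}} returns without error, while {{validate_binary_offsets([0, 999999], 0)}} raises {{ValueError}}. Once monotonicity is established, the bounds test adds only one comparison.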
We at Workbench are letting untrusted programs write Arrow files that we then
validate and read. We're keen to ensure Arrow files don't allow untrusted
programs to plant data that leads to arbitrary code execution or arbitrary
reads. We wrote a validation tool that checks for the buffer over-read
described in this issue:
https://github.com/CJWorkbench/arrow-tools/blob/005fe582b428c1ab6a9ed5f6dc968387d77e9a80/src/arrow-validate.cc#L27.
But it feels to me like Arrow's {{Validate()}} should be checking this.