[
https://issues.apache.org/jira/browse/ARROW-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16179053#comment-16179053
]
Wes McKinney commented on ARROW-1589:
-------------------------------------
> So the current way method should rather be prefixed w/
> "trusted"/"unsafe"/"fast".
This seems a bit like overkill to me -- if this were the norm for function
naming we would see these naming conventions in Avro, Thrift, Protocol Buffers,
Flatbuffers, and any other protocol / file format library. I think we can
improve things in the short term by making the untrustedness explicit in the
doxygen documentation / code comments. For example, there is no note of
trustedness in
http://arrow.apache.org/docs/cpp/classarrow_1_1ipc_1_1_record_batch_stream_reader.html
That is easy to change.
> A tiny example that already segfaults is the creation and read-out of an
> empty stream, which IMHO should not happen.
I agree; this should not be difficult to test for. The distinction I had hoped
to draw was between failures arising through normal use of the software (bugs
caused by Arrow developers implementing something incorrectly) and failures
caused by bugs in third-party systems (e.g. passing an empty string or buffer
to a function). I agree that we should test and fix the most obvious causes of
segfaults that may affect users of these functions.
Please understand that this software we are discussing is primarily the work of
a single volunteer developer (me). The fact that there are not more tests for
the cases you're describing is definitely not due to a failure on my part to
think outside the box -- if you look at my GitHub history you can see that I am
operating at maximum output capacity 100% of the time. As a result of not
having more development help, I have had to make tradeoffs: prioritizing more
features / usability / integration with other projects over testing, rather
than concerning myself with more esoteric matters.
> [C++] Fuzzing for certain input formats
> ---------------------------------------
>
> Key: ARROW-1589
> URL: https://issues.apache.org/jira/browse/ARROW-1589
> Project: Apache Arrow
> Issue Type: Test
> Reporter: Marco Neumann
> Assignee: Marco Neumann
>
> The arrow lib should have fuzzing tests for certain input formats, e.g. for
> reading record batches from streams. Ideally, malformed input must not crash
> the system but must report a proper error. This could easily be implemented
> e.g. w/ [libfuzzer|https://llvm.org/docs/LibFuzzer.html] in combination with
> address sanitizer (that's already implemented by Arrow's build system).