I know the Hypothesis developers consider Hypothesis to be different from
fuzzing. But I've never been entirely clear on what is meant by "fuzzing"
in the context you are suggesting. When you say you want to "fuzz NumPy",
what sorts of things would the fuzzer be doing? Would you need to tell it
what various NumPy functions and operations are and how to generate inputs
for them? Or does it do that automatically somehow? And how would you tell
it what sorts of things to check for a given set of inputs?

For a Hypothesis test, you would tell it explicitly what the input is, like
"a is an array with some given properties (e.g., >1 dim, has a numerical
dtype, has positive values, etc.)". Then you explicitly write a bunch of
assertions that such arrays should satisfy (like some f(a).all()). It then
generates examples from the given set of inputs in an attempt to falsify
the given assertions. The whole process requires a considerable amount of
human work because you have to figure out a bunch of properties that
various operations should satisfy on certain sets of inputs and write tests
for them. I'm still unclear on just what "fuzzing" is, but my impression
has always been that it's not this.
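A minimal sketch of such a test, assuming float64 arrays with two or three
dimensions and strictly positive values. The square-root round-trip property
and the name `test_sqrt_roundtrip` are illustrative choices, not an existing
NumPy test:

```python
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra import numpy as hnp

# Input description: float64 arrays with 2-3 dimensions (each side 1-5)
# whose elements are finite and strictly positive.
positive_arrays = hnp.arrays(
    dtype=np.float64,
    shape=hnp.array_shapes(min_dims=2, max_dims=3, max_side=5),
    elements=st.floats(min_value=1e-3, max_value=1e3,
                       allow_nan=False, allow_infinity=False),
)

@given(positive_arrays)
def test_sqrt_roundtrip(a):
    # Hand-written properties that every such array should satisfy.
    r = np.sqrt(a)
    assert (r > 0).all()
    assert np.allclose(r * r, a)
```

Both the input description and the properties are written by hand, which is
the human work described above.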

One difference I do know between Hypothesis and a fuzzer is that Hypothesis
is more geared toward finding test failures and getting you to fix them. For
example, Hypothesis only runs 100 examples per test by default; you have to
increase that number manually to run more. Another difference is that if
Hypothesis finds a failure, it will fixate on that failure and keep
reporting it, even at the expense of finding other possible failures, until
you either fix it or modify the strategies to exclude it. My understanding
is that a fuzzer is more geared toward exploring a wide search space and
finding as many issues as possible, even if there is no immediate prospect
of them being fixed.
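A sketch of the two knobs involved, assuming a recent Hypothesis release:
`max_examples` raises the per-test budget above the default of 100, and
`database=None` disables the example database, which is what makes
Hypothesis replay a previously found failure first on the next run. The test
`test_sort_is_idempotent` is a hypothetical example. To switch between a quick
CI budget and a long local run, `settings.register_profile` and
`settings.load_profile` can hold these values instead of hard-coding them.

```python
import numpy as np
from hypothesis import given, settings, strategies as st

# max_examples: run 2,000 generated inputs instead of the default 100.
# database=None: failures are not saved, so none is replayed on the next run.
@settings(max_examples=2_000, database=None)
@given(st.lists(st.integers(min_value=-10**6, max_value=10**6), max_size=50))
def test_sort_is_idempotent(xs):
    a = np.array(xs, dtype=np.int64)
    once = np.sort(a)
    # Sorting an already sorted array must not change it.
    assert np.array_equal(np.sort(once), once)
```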

I've used Hypothesis on several projects that depend on NumPy and
incidentally found several bugs in NumPy with it (for example,
https://github.com/numpy/numpy/issues/15753).

Aaron Meurer

On Wed, Jun 8, 2022 at 8:44 AM david korczynski <da...@adalogics.com> wrote:

> I'm not 100% sure about the important differences, so this is a somewhat
> intuitive analysis on my side (I know little about Hypothesis and more
> about fuzzing).
>
> Hypothesis has support for traditional fuzzing [sic]:
>
> https://hypothesis.readthedocs.io/en/latest/details.html?highlight=fuzz#use-with-external-fuzzers
> and OSS-Fuzz supports using Python fuzzing by way of Hypothesis
>
> https://google.github.io/oss-fuzz/getting-started/new-project-guide/python-lang/#hypothesis
> although it will be seeded with the Atheris fuzzer and based on this
> issue https://github.com/google/atheris/issues/20 it seems Atheris +
> Hypothesis might not be working particularly well together.
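For the external-fuzzer integration linked above, Hypothesis exposes a
`fuzz_one_input` hook on the decorated test: it takes a raw byte buffer, uses
it as the source of Hypothesis's random choices, runs the test once, and
re-raises if the test fails. A minimal sketch, assuming only that hook; the
random buffers below stand in for what an engine such as Atheris would
supply, so there is no coverage feedback here, and `test_mean_within_bounds`
and `drive_with_random_bytes` are hypothetical names. In the OSS-Fuzz setup,
roughly this same callable is handed to Atheris instead of the loop below.

```python
import random

import numpy as np
from hypothesis import given, strategies as st

@given(st.lists(st.floats(min_value=-1e6, max_value=1e6),
                min_size=1, max_size=20))
def test_mean_within_bounds(xs):
    a = np.array(xs)
    m = a.mean()
    # Small slack for floating-point rounding in the mean.
    assert a.min() - 1e-6 <= m <= a.max() + 1e-6

def drive_with_random_bytes(n_inputs, seed):
    rng = random.Random(seed)
    for _ in range(n_inputs):
        size = rng.randrange(1, 512)
        buf = bytes(rng.getrandbits(8) for _ in range(size))
        # Runs the test once on this buffer; raises if it finds a failure.
        test_mean_within_bounds.hypothesis.fuzz_one_input(buf)
```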
>
> Based on the above and a skim through the Hypothesis docs, I think there
> are many similarities between fuzzing (Atheris specifically) and
> Hypothesis, but the underlying engine that explores the input space is
> different. Fuzzing with engines like Atheris is coverage-guided (which I
> don't think Hypothesis is, but I could be wrong), meaning the target
> program is instrumented to detect whether a newly generated input
> exercises new code. In essence, this makes fuzzing a mutational genetic
> algorithm. Another benefit is that OSS-Fuzz builds the target code with
> various sanitizers (ASan, UBSan, MSan), which help highlight issues in
> the native code.
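To make "coverage-guided" concrete, here is a toy, pure-Python sketch of the
loop: take an input from the corpus, mutate it, run the target while
recording which of its lines execute, and keep the mutant only if it reached
a line not seen before. It uses `sys.settrace` for line coverage as a
stand-in for the instrumentation real engines use, and `target`,
`run_with_coverage`, `mutate`, and `fuzz` are hypothetical names.

```python
import random
import sys

def target(data: bytes) -> None:
    # Hypothetical function under test: a bug hidden behind nested checks.
    if len(data) > 0 and data[0] == ord("F"):
        if len(data) > 1 and data[1] == ord("U"):
            if len(data) > 2 and data[2] == ord("Z"):
                raise ValueError("bug reached")

def run_with_coverage(func, data):
    """Run func(data); return (executed line numbers, exception or None)."""
    lines = set()

    def tracer(frame, event, arg):
        if frame.f_code is not func.__code__:
            return None  # only trace the target function itself
        if event == "line":
            lines.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(data)
        error = None
    except Exception as exc:
        error = exc
    finally:
        sys.settrace(previous)  # restore any existing tracer
    return frozenset(lines), error

def mutate(data: bytes, rng: random.Random) -> bytes:
    # Simplest possible mutation: overwrite one random byte.
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def fuzz(func, seed_input, iterations, rng):
    corpus = [seed_input]
    cov, _ = run_with_coverage(func, seed_input)
    seen = set(cov)
    for _ in range(iterations):
        child = mutate(rng.choice(corpus), rng)
        cov, error = run_with_coverage(func, child)
        if error is not None:
            return child, corpus  # crashing input found
        if not cov <= seen:
            # New lines reached: keep this input for further mutation.
            seen |= cov
            corpus.append(child)
    return None, corpus
```

For example, `fuzz(target, b"AAAA", 200_000, random.Random(0))` searches for
the crash. Generating inputs uniformly at random would match the `FUZ`
prefix about once in 256**3 (roughly 16.8 million) tries; keeping inputs that
reach new lines lets the search solve one comparison at a time, which is the
point of coverage guidance. Real engines use much richer mutations and finer
coverage signals.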
>
> As for why it would be great to fuzz more Python code, that was more of
> a general statement: a lot of effort is being put into this on the
> OSS-Fuzz side because Python is a widely used language. For example, one
> effort in this domain is investigating new bug oracles for Python (like
> sanitizers, but targeted at memory-safe languages).
>
> On 07/06/2022 15:10, Matti Picus wrote:
> >
> > On 7/6/22 14:02, david korczynski wrote:
> >> Hi Numpy maintainers,
> >>
> >> Would you be interested in integrating continuous fuzzing by way of
> >> OSS-Fuzz? Fuzzing is a way to automate test-case generation and has been
> >> heavily used for memory-unsafe languages. Recently, effort has been put
> >> into fuzzing memory-safe languages, and Python is one of the languages
> >> where it would be great to use fuzzing.
> >>
> >> ...
> >>
> >> Let me know your thoughts on this and if you have any questions as I’m
> >> happy to clarify or go more into details with fuzzing.
> >>
> >> Kind regards,
> >> David
> >
> >
> > Could you compare and contrast this to hypothesis [0], which we are
> > already using in our testing?
> >
> > I don't understand what you mean by "Python is one of the languages
> > where it would be great to use fuzzing". Why?
> >
> > Matti
> >
> >
> > [0] https://hypothesis.readthedocs.io/en/latest/index.html
> >
> > _______________________________________________
> > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > To unsubscribe send an email to numpy-discussion-le...@python.org
> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > Member address: da...@adalogics.com
> > ADA Logics Ltd is registered in England. No: 11624074.
> > Registered office: 266 Banbury Road, Post Box 292,
> > OX2 7DL, Oxford, Oxfordshire , United Kingdom