On 3/13/15 9:10 PM, Zachary Turner wrote:


On Fri, Mar 13, 2015 at 4:01 PM Jonathan Roelofs
<jonat...@codesourcery.com> wrote:

    +ddunbar

    On 3/13/15 9:53 AM, jing...@apple.com wrote:
     >>> Depending on how different the different things are.  Compiler
     >>> tests tend to have input, output, and some machine that converts
     >>> the input to the output.  That is one very particular model of
     >>> testing.  Debugger tests need to do: get to stage 1, if that
     >>> succeeded, get to stage 2, if that succeeded, etc.  Plus there's
     >>> generally substantial setup code to get somewhere interesting, so
     >>> while you are there you generally try to test a bunch of similar
     >>> things.  Plus, the tests often have points where there are several
     >>> success cases, but each one requires a different "next action",
     >>> stepping being the prime example of this.  These are very
     >>> different models, and I don't see that trying to smush the two
     >>> together would be a fruitful exercise.

    I think LIT does make the assumption that one "test file" has one "test
    result". But this is a place where we could extend LIT a bit. I don't
    think it would be very painful.

    For me, this would be very useful for a few of the big libc++abi tests,
    like the demangler one, as currently I have to #ifdef out a couple of
    the cases that can't possibly work on my platform. It would be much
    nicer if that particular test file emitted multiple test results, of
    which I could XFAIL the ones I know will never work. (For anyone who is
    curious, the one that comes to mind needs the C99 %a printf format,
    which my libc doesn't have. It's a baremetal target, and binary size is
    really important.)

    How much actual benefit is there in having lots of results per test
    case, rather than having them all &&'d together into one result?

    Out of curiosity, does lldb's existing testsuite allow you to run
    individual subtests in test cases that have more than one test
    result?


    I think I'm not following this line of discussion.  So it's possible
    you and Jim are talking about different things here.

I think that's the case... I was imagining the "logic of the test" to
be something like this (sketched below in lldb's Python API):

  1) Set 5 breakpoints
  2) Continue
  3) Assert that the debugger stopped at the first breakpoint
  4) Continue
  5) Assert that the debugger stopped at the second breakpoint
  6) etc.
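
In lldb's Python API, that might look roughly like the following
('a.out', 'main.c', and the breakpoint line numbers are placeholders):

    import lldb

    debugger = lldb.SBDebugger.Create()
    debugger.SetAsync(False)                    # block until each stop
    target = debugger.CreateTarget('a.out')     # placeholder binary
    bps = [target.BreakpointCreateByLocation('main.c', line)
           for line in (10, 20, 30, 40, 50)]    # 1) set 5 breakpoints
    process = target.LaunchSimple(None, None, '.')  # stops at the first
    for bp in bps:
        thread = process.GetSelectedThread()
        # 3), 5), ... assert we stopped at the expected breakpoint
        assert thread.GetStopReason() == lldb.eStopReasonBreakpoint
        assert thread.GetStopReasonDataAtIndex(0) == bp.GetID()
        process.Continue()                      # 2), 4), ... continue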

Reading Jim's description again, with the help of your speculative
example, it sounds like the test logic itself isn't straight-line
code... that's okay too. What I was speaking to is a perceived
difference in what the "results" of running such a test are.

In llvm, the assertions are CHECK lines. In libc++, the assertions are
calls to `assert` from assert.h, as well as `static_assert`s. In both
cases, failing any one of those checks makes the whole test fail. For
some reason I had the impression that in lldb there wasn't a single
test result per *.py test. Perhaps that's not the case? Either way,
what I want to emphasize is that LIT doesn't care about the "logic of
the test", as long as there is one test result per test (and even that
condition could be amended, if it would be useful for lldb).
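
Imagining what amending that condition might look like: a test format's
execute() could gather per-subtest outcomes and either fold them into
the single result LIT expects today, or, with an extended LIT, report
one result per subtest. This is a hypothetical sketch, not existing LIT
API (beyond the TestFormat base class and the PASS/FAIL codes), and
run_subtests() is entirely made up:

    import lit.Test
    import lit.formats

    def run_subtests(test):
        # Made-up stand-in: really this would run the test and parse
        # per-subtest outcomes out of its output.
        return [('demangle/_Z1Av', True), ('demangle/_Zli2_xy', False)]

    class SubtestAwareFormat(lit.formats.TestFormat):
        def execute(self, test, litConfig):
            failures = [name for name, ok in run_subtests(test) if not ok]
            # Today: fold everything into the one result LIT expects.
            # An extended LIT could instead report each pair as its own
            # result, so individual subtests could be XFAILed.
            if failures:
                return lit.Test.FAIL, 'failing subtests: ' + ', '.join(failures)
            return lit.Test.PASS, ''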


    If I understand correctly (and maybe I don't), what Jim is saying is
    that a debugger test might need to do something like:

    1) Set 5 breakpoints
    2) Continue
    3) Depending on which breakpoint gets hit, take one of 5 possible
    "next" actions.

    But I'm having trouble coming up with an example of why this might be
    useful.  Jim, can you make this a little more concrete with a specific
    example of a test that does this, how the test works, and what the
    different success / failure cases are so we can be sure everyone is on
    the same page?

    In the case of the libc++abi tests, I'm not sure what is meant by
    "multiple results per test case".  Do you mean (for example) you'd
    like to be able to XFAIL individual run lines based on some
    condition?  If...

I think this means I should make the libc++abi example even more
concrete... In libc++/libc++abi tests, the "RUN" line is implicit
(well, aside from the few ShTest tests ericwf has added recently).
Every *.pass.cpp test is a file that the test harness knows it has to
compile, run, and check the exit status of. That said,
libcxxabi/test/test_demangle.pass.cpp has a huge array like this:

    const char* cases[][2] =
    {
        {"_Z1A", "A"},
        {"_Z1Av", "A()"},
        {"_Z1A1B1C", "A(B, C)"},
        {"_Z4testI1A1BE1Cv", "C test<A, B>()"},

        // ... roughly 29,500 more entries snipped ...

        {"_Zli2_xy", "operator\"\" _x(unsigned long long)"},
        {"_Z1fIiEDcT_", "decltype(auto) f<int>(int)"},
    };

Then there's some logic in `main()` that runs `__cxa_demangle` on
`cases[i][0]` and asserts that the result is the same as
`cases[i][1]`. If any of those assertions fail, the entire test is
marked as failing, and no further entries in the array are verified.
For the sake of discussion, let's call each of the entries in `cases`
a "subtest", and the entirety of test_demangle.pass.cpp a "test".
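
To make the all-or-nothing behavior concrete, here is the moral
equivalent of that loop, driven from Python through ctypes (the
library lookup is an assumption about the host, and the returned
buffer is deliberately leaked to keep the sketch short):

    import ctypes, ctypes.util

    lib = ctypes.CDLL(ctypes.util.find_library('c++abi')
                      or ctypes.util.find_library('stdc++'))
    demangle = getattr(lib, '__cxa_demangle')
    demangle.restype = ctypes.c_void_p           # malloc'd char* (leaked)

    cases = [(b'_Z1Av', b'A()'), (b'_Z1A1B1C', b'A(B, C)')]  # tiny excerpt
    for mangled, expected in cases:
        status = ctypes.c_int()
        buf = demangle(mangled, None, None, ctypes.byref(status))
        got = ctypes.cast(buf, ctypes.c_char_p).value
        # One failing assert aborts the whole run, so later "subtests"
        # are never attempted: exactly the all-or-nothing behavior.
        assert status.value == 0 and got == expected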

The sticky issue is that a few subtests in this test don't make sense
on various platforms, so currently they are #ifdef'd out. If the LIT
TestFormat and the tests themselves had a way to communicate that a
subtest failed while continuing to run the remaining subtests, then we
could XFAIL those platform-specific subtests individually.

Keep in mind though that I'm not really advocating we go and change
test_demangle.pass.cpp to suit that model, because #ifdefs work
reasonably well there, and relatively few subtests have these platform
differences... That's just the first example of the test/subtest
relationship that came to mind.

    So, LLDB definitely needs that.  One example, which LLDB uses almost
    everywhere, is running the same test with dSYM or DWARF debug info.
    On Apple platforms, tests generally need to run with both dSYM and
    DWARF debug info (literally just repeating the same test twice), and
    on non-Apple platforms, only DWARF tests ever need to be run.  So
    there would need to be a way to express this.
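
As I understand it, that pattern looks roughly like the following in
lldb's suite (a sketch from memory; the decorator and builder names
may not match the tree exactly, and do_the_actual_test is a
placeholder):

    from lldbtest import TestBase, dsym_test, dwarf_test

    class ExampleTestCase(TestBase):
        @dsym_test                  # Darwin-only variant
        def test_with_dsym(self):
            self.buildDsym()
            self.do_the_actual_test()

        @dwarf_test                 # runs on every platform
        def test_with_dwarf(self):
            self.buildDwarf()
            self.do_the_actual_test()

        def do_the_actual_test(self):
            pass                    # the shared test body, run twice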

Can you point me to an example of this?


    There are plenty of other one-off examples.  Debuggers have a lot of
    platform-specific code, and the different platforms support different
    amounts of functionality (especially for things like Android and
    Windows, where support is a work in progress).  So we frequently need
    a single test file which has, say, 10 tests in it, where specific
    tests can be XFAILed or even disabled individually based on
    conditions (usually which platform is running the test suite, but
    not always).
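
Presumably that markup is per-method, something like the decorators
below (again a sketch; lldb's actual decorator names may differ):

    from lldbtest import TestBase, skipIfWindows, expectedFailureAndroid

    class PlatformQuirksTestCase(TestBase):
        @skipIfWindows              # disabled outright on this platform
        def test_signals(self):
            pass                    # body elided

        @expectedFailureAndroid     # known broken; tracked as an XFAIL
        def test_watchpoints(self):
            pass                    # body elided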

--
Jon Roelofs
jonat...@codesourcery.com
CodeSourcery / Mentor Embedded
_______________________________________________
lldb-dev mailing list
lldb-dev@cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev
