For sharding in #4, you might be interested in what test-with-docker.py does. It already has some ASAN support, I think, so maybe adding UBSAN will be pretty straight-forward. In the screenshot below, you can see that I sharded CLUSTER_TEST and EE_TEST into two halves. It seems to work ok.
[image: image.png] But, anyway, it sounds like it's easiest to continue independently, and if we find occasion to merge, we can do so later. -- Philip On Mon, Nov 12, 2018 at 1:45 PM Jim Apple <jbap...@cloudera.com> wrote: > My only concern with this is the unfinished nature of the UBSAN work. Here > are some things left to do: > > 1. Make UBSAN builds work with FE and JDBC tests. These aren't UBSAN > unclean -- they plain don't work. > 2. Make e2e, custom cluster, and BE tests UBSAN clean. This is in progress. > 3. Make e2e tests pass with -full_ubsan. This turns UBSAN on during > codegen, and it seems to break some e2e tests. > 4. Give -full_ubsan a tolerable run time, probably by sharding the tests. > Right now the core suite takes 7 or 8 hours. > > If you get centos6 up and working as part of pre-merge, I'm comfortable > using it as a platform to test UBSAN on as we build more UBSAN coverage. > I'm also happy to try and keep any sharding in #4 distribution-agnostic so > that if UBSAN goes up first, you can easily port it to centos6. With those > two in place, my feeling is that the two improvements we want to make > "commute", in that whichever one we do first, the work and result should be > similar. > > What do you think? > > On Mon, Nov 12, 2018 at 1:26 PM Philip Zeyliger <phi...@cloudera.com> > wrote: > > > On Mon, Nov 12, 2018 at 1:21 PM Jim Apple <jbap...@cloudera.com> wrote: > > > > > I don't think I understand the "one stone" part - are you suggesting > that > > > we do UBSAN testing within a centos6 container? > > > > > > > Exactly. If we're doing multiple builds, we may as well be mutating other > > variables to get coverage from them. (Here, you're mutating "build type", > > and I'm proposing we also mutate "base OS" to get additional coverage.) > > > > -- Philip > > > > > > > > > > > > On Mon, Nov 12, 2018 at 1:01 PM Philip Zeyliger <phi...@cloudera.com> > > > wrote: > > > > > > > Seems useful to me. > > > > > > > > If you're interested, we could kill multiple birds with one stone. > > > > Specifically, I'm also interested in centos6/rh6 pre-merge testing. > > There > > > > are a variety of ways to do so, including running with > test-with-docker > > > > stuff. I recognize it's more work, but happy to help if you want to > try > > > it. > > > > > > > > -- Philip > > > > > > > > On Sat, Nov 10, 2018 at 11:10 PM Jim Apple <jbap...@cloudera.com> > > wrote: > > > > > > > > > C++ has some constructs that have undefined behavior. Shall we test > > for > > > > > this during pre-merge testing? > > > > > > > > > > When the behavior of C++ code is formally "undefined" by the > > standard, > > > > > compilers can behave erratically, like not taking either branch of > a > > > > > if/else statement. This can be reproduced in the wild. The standard > > > > itself > > > > > notes: > > > > > > > > > > "Using a bool value in ways described by this International > Standard > > as > > > > > 'undefined,' such as by examining the value of an uninitialized > > > automatic > > > > > object, might cause it to behave as if it is neither true nor > false." > > > > > > > > > > Clang has a checker for this called UBSAN, and, after some effort, > > the > > > > data > > > > > loading part of our build is now UBSAN-clean. I'm suggesting we add > > > that > > > > > test to the pre-merge testing. I'm happy to handle the details. > > > > > > > > > > When it fails, the output will look something like this: > > > > > > > > > > > https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/3573/console > > > > > > > > > > exprs/math-functions-ir.cc:405:13: runtime error: signed integer > > > > overflow: > > > > > 4738381338321616896 * 36 cannot be represented in type 'long' > > > > > runtime/decimal-value.inline.h:254:17: runtime error: signed > integer > > > > > overflow: 0x4b3b4ca85a86c47a098a223fffffffff + > > > > > 0x4b3b4ca85a86c47a098a223fffffffff cannot be represented in type > > > > '__int128' > > > > > runtime/row-batch-serialize-test.cc:243:18: runtime error: variable > > > > length > > > > > array bound evaluates to non-positive value 0 > > > > > > > > > > > > > > >