On Thu, Jul 6, 2017 at 4:14 PM, Cillian O'Donnell <cpodonne...@gmail.com> wrote:
> On 6 July 2017 at 19:14, Joel Sherrill <j...@rtems.org> wrote: > > > > > > On Thu, Jul 6, 2017 at 11:11 AM, Cillian O'Donnell < > cpodonne...@gmail.com> > > wrote: > >> > >> On 6 July 2017 at 15:34, Joel Sherrill <j...@rtems.org> wrote: > >> > > >> > > >> > On Thu, Jul 6, 2017 at 5:53 AM, Cillian O'Donnell > >> > <cpodonne...@gmail.com> > >> > wrote: > >> >> > >> >> On 6 July 2017 at 08:05, Chris Johns <chr...@rtems.org> wrote: > >> >> > On 05/07/2017 22:46, Cillian O'Donnell wrote: > >> >> >> > >> >> >> Now the thing is, I might have to run RTEMS Tester about 3 times > for > >> >> >> the full testsuite before I can get the report, as the covoar runs > >> >> >> will often hang on an arbitrary method and have to be manually > >> >> >> exited > >> >> >> and re-run. > >> >> > > >> >> > How is covoar being run? Is it being run by the 'rtems-test' > command? > >> >> > >> >> Yes covoar runs after the testsuite if a --coverage flag is included > >> >> with rtems-test command. > >> >> > > >> >> >> The point that this happens is not repeatable and there's > >> >> >> no error message to go on. > >> >> > > >> >> > Is it the covoar process that is stopping? Is it the process idle > or > >> >> > using 100%? > >> >> > >> >> Covoar locks up and becomes idle. Joel mentioned in the IRC meeting > >> >> he's seen it before and I'll need to track down the exe causing it > and > >> >> debug it to get rid of this > >> > > >> > > >> > There must be a misunderstanding. I have never seen covoar lock up > >> > like this. > >> > >> Oh sorry, you must of just meant debugging covoar in general, picked > >> it up wrong. > >> > > >> > It will ignore records when it thinks things are inconsistent. This > can > >> > occur > >> > when a method appears in two different executables and has different > >> > sizes. The cause of this is usually padding at the end of the method > so > >> > the subsequent method in memory starts on a nice cache-line alignment. > >> > The code tries to recognize the nops/padding at the end and ignore > them. > >> > When the padding inserted by ld changes or the objdump output being > >> > parsed changes, covoar needs to be adjusted. > >> > > >> > The ignored record message I saw is in the code that reads Couverture > >> > trace records. The info in the record appears to be inconsistent with > >> > the > >> > code to which it has been matched. This could be because the trace > >> > record format has been changed or a simple bug. > >> > > >> > From what I have seen, it is likely that it is primarily bugs in input > >> > processing. > >> > > >> > But the code to do the "big union" of the trace data has to iterate > over > >> > all tests and all coverage blocks. It can take a while to run. I guess > >> > there could be an infinite loop in it. I always debugged it by having > >> > a test case I could run in gdb. > >> > > >> >> > >> >> > > >> >> >> The only common theme is the methods will > >> >> >> often be called thread...something. Even more rarely it will stop > >> >> >> with > >> >> >> the message 'resource temporarily unavailable'. I'm wondering is > >> >> >> this > >> >> >> normal behaviour for covoar on a long test-run? This never happens > >> >> >> when running just the samples. > >> >> > > >> >> > It is not normal behavior. > >> > > >> > > >> > +1 > >> > > >> > Is Linux or covoar printing the message? It holds all data in memory > >> > while > >> > doing the analysis. Is it consuming a lot of RAM? > >> > >> I done two rtems-test coverage runs there while monitoring 'top' > >> output. The first run is successful and coverage report is generated > >> and the second run produced IO Error: Resource temporarily unavailable > >> on the second. Both times theres about 3.5-4GB RAM free and CPU runs > >> at 70-80% so nothing seems to be strained there. > >> Timecounter_Tick_simple is being checked before it happens. The output > >> is > >> > >> 4000c9d8: 0a 80 00 03 bcs 4000c9e4 <_Timecounter_Tick_simple+ > 0x78> > >> 4000c9dc: 84 01 20 01 add %g4, 1, %g2 > >> > >> bt = th->th_offset; > >> 4000c9e0: 84 10 00 04 mov %g4, %g2 > >> _bt->sec += _bt2->sec; > >> 4000c9e4: fa 06 e0 a0 ld [ %i3 + 0xa0 ], %i5 > >> 4000c9e8: 84 00 80 1d add %g2, %i5, %g2 > >> _tv->tv_usec = ((uint64_t)1000000 * (uint32_t)(_bt->frac >> 32)) >> > >> 32; > >> 4000c9ec: 39 00 03 d0 sethi %hi(0xf4000), %i4 > >> _ts->tv_nsec = ((uint64_t)1000000000 * > >> 4000c9f0: 3b 0e e6 b2 sethi %hi(0x3b9ac800), %i5 > >> _tv->tv_usec = ((uint64_t)1000000 * (uint32_t)(_bt->frac >> 32)) >> > >> 32; > >> 4000c9f4: b8 17 22 40 or %i4, 0x240, %i4 > >> _ts->tv_nsec = ((uint64_t)1000000000 * > >> 4000c9f8: ba 17 62 00 or %i5, 0x200, %i5 > >> _tv->tv_usec = ((uint64_t)1000000 * (uint32_t)(_bt->frac >> 32)) > >> >> 32;Exception in thread _stderr[c]: > >> Traceback (most recent call last): > >> File "/usr/lib/python2.7/threading.py", line 801, in > __bootstrap_inner > >> self.run() > >> File "/usr/lib/python2.7/threading.py", line 754, in run > >> self.__target(*self.__args, **self.__kwargs) > >> File > >> "/home/cpod/development/rtems/test/rtems-tools/rtemstoolkit/ > execute.py", > >> line 206, in _readthread > >> _output_line(line, exe, prefix, out, count) > >> File > >> "/home/cpod/development/rtems/test/rtems-tools/rtemstoolkit/ > execute.py", > >> line 184, in _output_line > >> out(prefix + line) > >> File > >> "/home/cpod/development/rtems/test/rtems-tools/tester/rt/coverage.py", > >> line 387, in output_handler > >> log.notice(text, stdout_only = False) > >> File > >> "/home/cpod/development/rtems/test/rtems-tools/rtemstoolkit/log.py", > >> line 104, in notice > >> print(l) > >> IOError: [Errno 11] Resource temporarily unavailable > > > > > > This looks like qemu is running too long with an RTEMS executable. This > > stack trace is RTEMS code. Not covoar code. Please check what executable > > is consuming so much run-time before it is killed. This will change the > > solution. :) > > > > covoar spends a lot of time iterating over sets of containers and > > executables. > > There could be an infinite loop in it but it is using STL containers for > > everything and is single threaded so I rather doubt it would go into an > > infinite loop. > > The executable running here is sprmsched01.exe I produced it again and > it was sp22.exe nnd then again with psxaio01.exe so it doesn't seem to > pin point a single test repeatably. > The rtems tester tries to run the tests in parallel. I wonder if this trips some conflict which we are not aware of. Chris.. Are there arguments to make it single-thread the qemu invocations as a test? > > > >> > >> > >> > > >> >> > >> >> >> When a test run is successful it will also finish with a lot of > >> >> >> these > >> >> >> messages: > >> >> >> > >> >> >> *** Trace block is inconsistent with coverage map > >> >> >> *** Trace block (0x4000c4fc - 0x4000c51f) for 36 bytes > >> >> >> *** Coverage map > >> >> >> /home/cpod/coverage_test/test/coverage/unlimited.exe.cov > >> >> >> INFO: DesiredSymbols::mergeCoverageMap - Unable to merge coverage > >> >> >> map > >> >> >> for _Thread_queue_Extract_with_proxy because the sizes are > different > >> >> > > >> >> > I wonder what sizes are different? > >> >> > >> >> For this one, Joel mentioned it could be something to do with nops > >> >> being added or space not matching and to take a look at _Thread_ > >> >> queue_Extract_with_proxy in the objdump and see if I can dig up > >> >> anything from that. That message appears for _Thread_ > >> >> queue_Extract_with_proxy on all tests from base_sp.exe to > >> >> unlimited.exe when running the full testsuite and on a few instances > >> >> it also mentions unable to merge coverage map for > TOD_TICKS_PER_SECOND > >> >> for the same reason. The mismatch is always 36 bytes. > >> > > >> > > >> > You need to find the two executables it differs on and see why the > size > >> > is different. It processes them from first to last on the command line > >> > so > >> > find the first one that has the symbol and compare the dump to that > >> > one it is flagging as different. > >> > >> Will the 2 executables be say for example unlimited.exe from the build > >> tree and the temporarily copied covoar executable unlimited.exe.cov? > >> So libscore.a info matches the original unlimited.exe and something > >> has changed in the covoar copies? > > > > > > No. It will be unlimited and one of the tests before it on the covoar > > command line. In particular, it should be the first one with the symbol > > in question. Compare the objdump's. > > Ok I just checked and the first instance is base_sp.exe, which is > actually the first test run so this appears for every single test. > Where we hoping to compare one without the problem to the one it first > appears on? > Yep. covoar builds its "truth" of the method from the first one instance of each symbol it sees. So this one symbol is coming from base_sp.exe and there is a difference between that and the first one where the error is reported. The first instance or the error reporting one may be incorrectly processed or formatted/padded in an unexpected way. > > > > > >> > >> >> > >> >> > >> >> Also I wanted to mention, there is quite a bit of work from the 2 > >> >> previous students on the integration of the coverage work into RTEMS > >> >> Tester, would you be interested in starting the merging process of > >> >> this. It might take a while to get everything reviewed and make any > >> >> necessary changes. At this point it is working. its just fixing bugs, > >> >> which I'm working on at the moment. > >> > > >> > > >> > Chris.. I suggested this. It seems the basic flow and integration of > >> > coverage into the tester is good enough to review and merge. > >> > > >> > --joel > >> > > >> >> > >> >> > >> >> Thanks, > >> >> > >> >> Cillian. > >> >> > > >> >> > Chris > >> >> _______________________________________________ > >> >> devel mailing list > >> >> devel@rtems.org > >> >> http://lists.rtems.org/mailman/listinfo/devel > >> > > >> > > > > > >
_______________________________________________ devel mailing list devel@rtems.org http://lists.rtems.org/mailman/listinfo/devel