This problem has been resolved inside the arm-probe configuration; it
is not a fault within LAVA. There was a concern that the probe was not
showing data output because of a theoretical problem with running
daemonized instead of with a controlling terminal. The actual problem
was that the probe software runs more slowly than expected; extending
the runtime of the utility allows the probe to output data.

https://staging.validation.linaro.org/scheduler/job/175033#L2038

https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2c2e96da0b666a36ab3e8ffeb
(The verbose option was later dropped to output only the interesting
data.)

The configuration file in the git repo needs to be modified.

https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id=e08f0bed2c3561421bc2f430ab2e38f1b659e2fd

On 29 May 2017 at 16:45, Vincent Guittot <vincent.guit...@linaro.org> wrote:
> On 25 May 2017 at 10:03, Neil Williams <codeh...@debian.org> wrote:
>> On Wed, 24 May 2017 21:07:45 +0200
>> Vincent Guittot <vincent.guit...@linaro.org> wrote:
>>
>>> Hi Neil,
>>>
>>> On 24 May 2017 7:42 PM, "Lisa Nguyen" <lisa.ngu...@linaro.org>
>>> wrote:
>>>
>>> On 24 May 2017 at 17:02, Neil Williams <codeh...@debian.org> wrote:
>>> > On Fri, 19 May 2017 17:02:14 +0100
>>> > Neil Williams <codeh...@debian.org> wrote:
>>> >
>>> >> On Fri, 19 May 2017 16:48:11 +0100
>>> >> Steve McIntyre <steve.mcint...@linaro.org> wrote:
>>> >>
>>> >> > Hi folks!
>>> >> >
>>> >> > On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
>>> >> > >On Thu, 27 Apr 2017 08:19:19 +0100
>>> >> > >Neil Williams <codeh...@debian.org> wrote:
>>> >> > >
>>> >> >
>>> >> > I've just run a local test with an AEP inside lxc on my local
>>> >> > machine. As far as I can see, there's nothing particularly magic
>>> >> > going on here. The only problem I see is Lisa's config file
>>> >> > pointing at the wrong device file. arm-probe needs a ttyACM-style
>>> >> > device to talk to. Using:
>>> >> >
>>> >> > # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
>>> >> >
>>> >> > I create that device in my container. I build libwebsockets and
>>> >> > the arm-probe software in the container, then
>>> >> > specify /dev/ttyACM0 in the AEP config file. I can run it just
>>> >> > fine:
>>> >> >
>>> >> > root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
>>> >> > # configuration: panda-aep.cfg
>>> >> > # config_name: pandaboard
>>> >> > # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
>>> >> > Configuration: pandaboard
>>> >> > # date: Fri, 19 May 2017 16:29:50 +0100
>>> >> > # host: lxc-aep-test-174524
>>> >> > #
>>> >> > + /dev/ttyACM0
>>> >> > Starting...
>>> >> > sending start to 0
>>> >> > # VDD_ALL VDD ROOT #ff0000 SoC
>>> >> > #
>>> >> > #
>>> >> > time VDD(V) VDD(A) VDD(W)
>>> >> > 0.000500 5.11 0.0474 0.24196
>>> >> > 0.000600 5.11 0.0364 0.18572
>>> >> > 0.000700 5.11 0.0314 0.16012
>>> >> > 0.000800 5.10 0.0544 0.27734
>>> >> > 0.000900 5.10 0.0234 0.11923
>>> >> > 0.001000 5.11 0.0304 0.15505
>>> >> > ...
>>> >> >
>>> >> > I don't have any problems running things and getting output here.
>>> >> >
>>> >> > I *have* seen two real bugs here while trying to get things
>>> >> > running, though:
>>> >> >
>>> >> > 1. If the device specified in the config file doesn't exist, or
>>> >> > is the wrong type of device, or (maybe) there is any other kind
>>> >> > of problem with it, you get *no* useful feedback to say there's a
>>> >> > problem. Running things under strace will show the background
>>> >> > libarmep process attempt to use the device specified, but
>>> >> > there's no error handling. :-(
>>> >> >
>>> >> > 2. The "-x" option says that the arm-probe program is meant to
>>> >> > exit when you've done capturing, but it just sits there forever
>>> >> > when I'm testing. I've wrapped it using the "timeout" command to
>>> >> > work around that for now.
>>> >> >
>>> >> > If I knew where to file those bugs, I would, but it's really not
>>> >> > obvious. They're really easy to reproduce, I hope...
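
For illustration only: the "timeout" workaround in point 2 could be
sketched in Python roughly as below. The sketch starts a command with
no controlling terminal, writes its output to a file and stops it after
a fixed runtime. The function name run_capture, the log file name and
the 30 second runtime are hypothetical; the arm-probe arguments in the
comment are copied from the transcript above, and nothing here is taken
from the LAVA or workload automation code.

```python
import subprocess


def run_capture(cmd, log_path, runtime):
    """Run cmd detached from any terminal, logging output to log_path.

    Returns the exit code, or None if the command was still running
    after `runtime` seconds and had to be stopped.
    """
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # no input available, as under a daemon
            stdout=log,                 # capture output in a file
            stderr=subprocess.STDOUT,
            start_new_session=True,     # setsid(): no controlling terminal
        )
        try:
            return proc.wait(timeout=runtime)
        except subprocess.TimeoutExpired:
            proc.terminate()            # SIGTERM, the default signal of timeout(1)
            proc.wait()
            return None


# Hypothetical invocation (arguments as in the transcript above):
# run_capture(
#     ["./arm-probe/arm-probe", "-C", "panda-aep.cfg", "-l10", "-x"],
#     "aep-output.log",
#     runtime=30,
# )
```

If the log file stays empty with a short runtime but fills with a
longer one, that points at the probe being slow to produce data rather
than at the missing terminal.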
>>> >> >
>>> >> > In terms of the /dev/ttyACM0 creation, the lxc-device man page
>>> >> > says that it creates devices based on their existing entries on
>>> >> > the host. Double-check that the host (dispatcher) has an
>>> >> > appropriate /dev/ttyACM0 if you're still seeing problems?
>>> >>
>>> >> Steve was using staging-panda03 with the ARM Energy Probe which I'd
>>> >> been using for the tests of the new code to ensure
>>> >> that /dev/ttyACM0 can be attached to the LXC.
>>> >>
>>> >> That panda and AEP will shortly return to staging and then the
>>> >> changes to LAVA and the required changes to the test definition
>>> >> can be available for the 2017.6 release.
>>> >
>>> > OK. staging-panda03 is back and has been running tests. This is what
>>> > we've learnt so far:
>>> >
>>> > 0: This does not appear to be an LXC issue. Running the commands
>>> > manually on the worker with the same LXC on the same worker does
>>> > return data from the probe.
>>> >
>>> > 1: Running the same commands in "headless" mode shows that the probe
>>> > software starts successfully but something within the protocol
>>> > parser or sampler fails to retrieve data.
>>>
>>>
>>> What do you mean by headless mode?
>>
>> With no controlling terminal.
>>
>> LAVA runs as a daemon and forks processes to run the tests. This does
>> not usually cause issues and is fundamental to automation. When I run
>> the same commands in an LXC as a user logged into the machine, I get
>> output. When I run the commands from a daemon, the output is not seen.
>
> even when you redirect the output to a file ?
>
> On workload automation, arm_probe is called in a dedicated process
> with subprocess.Popen and we are able to get data in the file.
> Just wonder what could be the difference in lava case
>
>>
>>> >
>>> > 2: The websockets dependency is completely unnecessary and has been
>>> > disabled in the build I've been testing:
>>> > https://git.linaro.org/lava-team/arm-probe.git/
>>>
>>>
>>> Yes. I do the same. aepd is only useful for the web interface.
>>>
>>>
>>> >
>>> > 3: We've added a *lot* of debug to the arm-probe code
>>> > (https://staging.validation.linaro.org/scheduler/job/174969 which
>>> > was run using
>>> > https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1)
>>> > but are not much closer to identifying the precise problem with the
>>> > code. However, I am satisfied that this is a problem in the
>>> > arm-probe software when being run in automation.
>>>
>>>
>>> Can you give details about "this is a problem in arm probe software
>>> when being run in automation"? Do you mean workload automation?
>>
>> No. Not workload automation - that is a specific test framework which
>> can use LAVA. I'm talking about the process of running tests on behalf
>> of users without the users being logged in or interacting with the
>> shell.
>
> ok. Just to be sure about the context
>
>>
>>> >
>>> > 4: the arm-probe code is appallingly difficult to read and debug. It
>>> > also seems unnecessarily complex.
>>> >
>>> > 5: I plan to remove a lot of the debug from the cloned arm-probe
>>> > repository (which has also had a few fixes to compile with gcc6) but
>>> > I'm running out of time to work on the arm-probe software myself.
>>> >
>>> > Someone needs to update the arm-probe software:
>>> >
>>> > a) to remove websockets as a compile-time option as this only bloats
>>> > the build in automation where a web based UI is impossible anyway.
>>> > I've done this by brute force in my cloned repo, I just patched out
>>> > the dependency.
>>> >
>>> > b) improve the code to have comments and output about what is
>>> > happening and why when verbose mode is used.
>>> >
>>> > c) Identify what is preventing the software from receiving data from
>>> > the probe when run in automation.
>>> >
>>> > d) the config file still needs fixes to allow for changes in the
>>> > device node name from one probe to another.
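
On d), and on the missing error handling Steve describes in his point
1: until arm-probe reports such problems itself, a wrapper could check
the device node before starting it. A minimal sketch, assuming Python
is available on the worker. check_device and candidate_devices are
hypothetical helpers, not part of arm-probe or LAVA, and the check only
confirms that the path exists and is a character device, not that an
energy probe is actually attached to it.

```python
import glob
import os
import stat


def candidate_devices(pattern="/dev/ttyACM*"):
    """List device nodes matching the ttyACM naming used in this thread."""
    return sorted(glob.glob(pattern))


def check_device(path):
    """Return None if path is a character device, else an error message."""
    try:
        mode = os.stat(path).st_mode
    except OSError as exc:
        return "cannot stat %s: %s" % (path, exc.strerror)
    if not stat.S_ISCHR(mode):
        return "%s exists but is not a character device" % path
    return None
```

Calling check_device on the path named in the AEP config file before
launching arm-probe would at least turn a silent failure into a message
in the job log.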
>>> >
>>> > --
>>>
>>> CC'ing Vincent, so he can read Neil's and Steve's comments above and
>>> respond (if he has anything to say) while I'm on holiday until early
>>> June.
>>
>> Steve & I are also on annual leave next week.
>>
>> --
>>
>>
>> Neil Williams
>> =============
>> http://www.linux.codehelp.co.uk/
>>

--

Neil Williams
=============
neil.willi...@linaro.org
http://www.linux.codehelp.co.uk/
_______________________________________________
linaro-validation mailing list
linaro-validation@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-validation