On Wed, 24 May 2017 21:07:45 +0200 Vincent Guittot <[email protected]> wrote:
> Hi Neil,
>
> On 24 May 2017 7:42 PM, "Lisa Nguyen" <[email protected]> wrote:
>
> On 24 May 2017 at 17:02, Neil Williams <[email protected]> wrote:
> > On Fri, 19 May 2017 17:02:14 +0100
> > Neil Williams <[email protected]> wrote:
> >
> >> On Fri, 19 May 2017 16:48:11 +0100
> >> Steve McIntyre <[email protected]> wrote:
> >>
> >> > Hi folks!
> >> >
> >> > On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
> >> > >On Thu, 27 Apr 2017 08:19:19 +0100
> >> > >Neil Williams <[email protected]> wrote:
> >> > >
> >> >
> >> > I've just run a local test with an AEP inside lxc on my local
> >> > machine. As far as I can see, there's nothing particularly magic
> >> > going on here. The only problem I see is Lisa's config file
> >> > pointing at the wrong device file. arm-probe needs a ttyACM-style
> >> > device to talk to. Using:
> >> >
> >> > # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
> >> >
> >> > I create that device in my container. I build libwebsockets and
> >> > the arm-probe software in the container, then
> >> > specify /dev/ttyACM0 in the AEP config file. I can run it just
> >> > fine:
> >> >
> >> > root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C
> >> > panda-aep.cfg -l10 -x # configuration: panda-aep.cfg
> >> > # config_name: pandaboard
> >> > # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W)
> >> > 400us Configuration: pandaboard
> >> > # date: Fri, 19 May 2017 16:29:50 +0100
> >> > # host: lxc-aep-test-174524
> >> > #
> >> > + /dev/ttyACM0
> >> > Starting...
> >> > sending start to 0
> >> > # VDD_ALL VDD ROOT #ff0000 SoC
> >> > #
> >> > #
> >> > time VDD(V) VDD(A) VDD(W)
> >> > 0.000500 5.11 0.0474 0.24196
> >> > 0.000600 5.11 0.0364 0.18572
> >> > 0.000700 5.11 0.0314 0.16012
> >> > 0.000800 5.10 0.0544 0.27734
> >> > 0.000900 5.10 0.0234 0.11923
> >> > 0.001000 5.11 0.0304 0.15505
> >> > ...
> >> >
> >> > I don't have any problems running things and getting output here.
> >> >
> >> > I *have* seen two real bugs here while trying to get things
> >> > running, though:
> >> >
> >> > 1. If the device specified in the config file doesn't exist, or
> >> > is the wrong type of device, or (maybe) there is any other kind
> >> > of problem with it, you get *no* useful feedback to say there's a
> >> > problem. Running things under strace will show the background
> >> > libarmep process attempt to use the device specified, but
> >> > there's no error handling. :-(
> >> >
> >> > 2. The "-x" option says that the arm-probe program is meant to
> >> > exit when you've done capturing, but it just sits there forever
> >> > when I'm testing. I've wrapped it using the "timeout" command to
> >> > work around that for now.
> >> >
> >> > If I knew where to file those bugs, I would, but it's really not
> >> > obvious. They're really easy to reproduce, I hope...
> >> >
> >> > In terms of the /dev/ttyACM0 creation, the lxc-device man page
> >> > says that it creates devices based on their existing entries on
> >> > the host. Double-check that the host (dispatcher) has an
> >> > appropriate /dev/ttyACM0 if you're still seeing problems?
> >>
> >> Steve was using staging-panda03 with the ARM Energy Probe which I'd
> >> been using for the tests of the new code to ensure
> >> that /dev/ttyACM0 can be attached to the LXC.
> >>
> >> That panda and AEP will shortly return to staging and then the
> >> changes to LAVA and the required changes to the test definition
> >> can be available for the 2017.6 release.
> >
> > OK. staging-panda03 is back and has been running tests. This is what
> > we've learnt so far:
> >
> > 0: This does not appear to be an LXC issue. Running the commands
> > manually on the worker with the same LXC on the same worker does
> > return data from the probe.
> >
> > 1: Running the same commands in "headless" mode shows that the probe
> > software starts successfully but something within the protocol
> > parser or sampler fails to retrieve data.
> >
> What do you mean by headless mode?

With no controlling terminal. LAVA runs as a daemon and forks processes
to run the tests. This does not usually cause issues and is fundamental
to automation. When I run the same commands in an LXC as a user logged
into the machine, I get output. When I run the commands from a daemon,
the output is not seen.

> >
> > 2: The websockets dependency is completely unnecessary and has been
> > disabled in the build I've been testing:
> > https://git.linaro.org/lava-team/arm-probe.git/
> >
> Yes. I do the same. aepd is only useful for the web interface.
>
> >
> > 3: We've added a *lot* of debug to the arm-probe code
> > (https://staging.validation.linaro.org/scheduler/job/174969 which
> > was run using
> > https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1)
> > but are not much closer to identifying the precise problem with the
> > code. However, I am satisfied that this is a problem in the
> > arm-probe software when being run in automation.
> >
> Can you give details about "this is a problem in arm probe software
> when being run in automation"? Do you mean workload automation?

No. Not workload automation - that is a specific test framework which
can use LAVA. I'm talking about the process of running tests on behalf
of users without the users being logged in or interacting with the
shell.

> >
> > 4: the arm-probe code is appallingly difficult to read and debug. It
> > also seems unnecessarily complex.
> >
> > 5: I plan to remove a lot of the debug from the cloned arm-probe
> > repository (which has also had a few fixes to compile with gcc6) but
> > I'm running out of time to work on the arm-probe software myself.
> >
> > Someone needs to update the arm-probe software:
> >
> > a) to remove websockets as a compile-time option as this only bloats
> > the build in automation where a web based UI is impossible anyway.
> > I've done this by brute force in my cloned repo, I just patched out
> > the dependency.
> >
> > b) improve the code to have comments and output about what is
> > happening and why when verbose mode is used.
> >
> > c) Identify what is preventing the software from receiving data from
> > the probe when run in automation.
> >
> > d) the config file still needs fixes to allow for changes in the
> > device node name from one probe to another.
> >
> > --
>
> CC'ing Vincent, so he can read Neil's and Steve's comments above and
> respond (if he has anything to say) while I'm on holiday until early
> June.

Steve & I are also on annual leave next week.

--
Neil Williams
=============
http://www.linux.codehelp.co.uk/
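
For anyone trying to reproduce this while we are away: the quoted thread
describes three conditions that can be approximated outside LAVA. These are
a device node that is missing or of the wrong type, the "-x" option not
exiting, and a run with no controlling terminal. The following is only a
rough sketch in Python, not part of arm-probe or LAVA. The function names
(check_device, run_headless, guarded_run) are made up for illustration, and
using a new session to stand in for "run from a daemon" is an assumption
about what matters in the failing case.

```python
import os
import stat
import subprocess
import sys


def check_device(path):
    """Return None if path is a character device, else a short reason."""
    try:
        mode = os.stat(path).st_mode
    except OSError as exc:
        return f"{path}: {exc.strerror}"
    if not stat.S_ISCHR(mode):
        return f"{path}: not a character device"
    return None


def run_headless(cmd, seconds):
    """Run cmd in a new session with stdin closed off, for at most `seconds`.

    start_new_session=True makes the child call setsid(), so it has no
    controlling terminal (an approximation of being forked by a daemon).
    Returns (timed_out, combined_output_text).
    """
    try:
        done = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            start_new_session=True,
            timeout=seconds,  # stands in for wrapping with "timeout"
        )
        return False, done.stdout.decode(errors="replace")
    except subprocess.TimeoutExpired as exc:
        # The child has been killed; keep whatever output was read so far.
        return True, (exc.output or b"").decode(errors="replace")


def guarded_run(device, cmd, seconds):
    """Report a bad device node up front, otherwise do the headless run."""
    problem = check_device(device)
    if problem:
        print(problem, file=sys.stderr)
        return None
    return run_headless(cmd, seconds)
```

With the command line from Steve's test, a call would look like
guarded_run("/dev/ttyACM0", ["./arm-probe/arm-probe", "-C", "panda-aep.cfg",
"-l10", "-x"], 30). The device path has to match whatever the AEP config
file names, and the 30 second limit is arbitrary.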
_______________________________________________
linaro-validation mailing list
[email protected]
https://lists.linaro.org/mailman/listinfo/linaro-validation
