On 6 June 2017 at 13:38, Neil Williams <neil.willi...@linaro.org> wrote:
> This problem has been resolved inside the arm-probe configuration, it
> is not a fault within LAVA. There was a concern that the probe was not
> showing data output because of a theoretical problem of running
> daemonized instead of with a controlling terminal. The actual problem
> was that the probe software is running more slowly than expected and
> extending the runtime of the utility allows the probe to output data.
>
> https://staging.validation.linaro.org/scheduler/job/175033#L2038
> https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2c2e96da0b666a36ab3e8ffeb

OK, so the 2-second timeout was your problem.

>
> (The verbose option was later dropped to output only the interesting data.)
>
> The configuration file in the git repo needs to be modified.
>
> https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id=e08f0bed2c3561421bc2f430ab2e38f1b659e2fd

Can you point out the modification that was needed? I can't see any
obvious difference except using /dev/ttyACM0 instead of
/dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00.
Is that the difference? What about using 2 AEPs?

Regards,
Vincent

>
>
>
> On 29 May 2017 at 16:45, Vincent Guittot <vincent.guit...@linaro.org> wrote:
>> On 25 May 2017 at 10:03, Neil Williams <codeh...@debian.org> wrote:
>>> On Wed, 24 May 2017 21:07:45 +0200
>>> Vincent Guittot <vincent.guit...@linaro.org> wrote:
>>>
>>>> Hi Neil,
>>>>
>>>> On 24 May 2017 at 7:42 PM, "Lisa Nguyen" <lisa.ngu...@linaro.org>
>>>> wrote:
>>>>
>>>> On 24 May 2017 at 17:02, Neil Williams <codeh...@debian.org> wrote:
>>>> > On Fri, 19 May 2017 17:02:14 +0100
>>>> > Neil Williams <codeh...@debian.org> wrote:
>>>> >
>>>> >> On Fri, 19 May 2017 16:48:11 +0100
>>>> >> Steve McIntyre <steve.mcint...@linaro.org> wrote:
>>>> >>
>>>> >> > Hi folks!
>>>> >> >
>>>> >> > On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
>>>> >> > >On Thu, 27 Apr 2017 08:19:19 +0100
>>>> >> > >Neil Williams <codeh...@debian.org> wrote:
>>>> >> > >
>>>> >> >
>>>> >> > I've just run a local test with an AEP inside lxc on my local
>>>> >> > machine. As far as I can see, there's nothing particularly magic
>>>> >> > going on here. The only problem I see is Lisa's config file
>>>> >> > pointing at the wrong device file. arm-probe needs a ttyACM-style
>>>> >> > device to talk to. Using:
>>>> >> >
>>>> >> > # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
>>>> >> >
>>>> >> > I create that device in my container.
>>>> >> > I build libwebsockets and the arm-probe software in the
>>>> >> > container, then specify /dev/ttyACM0 in the AEP config file.
>>>> >> > I can run it just fine:
>>>> >> >
>>>> >> > root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
>>>> >> > # configuration: panda-aep.cfg
>>>> >> > # config_name: pandaboard
>>>> >> > # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
>>>> >> > Configuration: pandaboard
>>>> >> > # date: Fri, 19 May 2017 16:29:50 +0100
>>>> >> > # host: lxc-aep-test-174524
>>>> >> > #
>>>> >> > + /dev/ttyACM0
>>>> >> > Starting...
>>>> >> > sending start to 0
>>>> >> > # VDD_ALL VDD ROOT #ff0000 SoC
>>>> >> > #
>>>> >> > #
>>>> >> > time VDD(V) VDD(A) VDD(W)
>>>> >> > 0.000500 5.11 0.0474 0.24196
>>>> >> > 0.000600 5.11 0.0364 0.18572
>>>> >> > 0.000700 5.11 0.0314 0.16012
>>>> >> > 0.000800 5.10 0.0544 0.27734
>>>> >> > 0.000900 5.10 0.0234 0.11923
>>>> >> > 0.001000 5.11 0.0304 0.15505
>>>> >> > ...
>>>> >> >
>>>> >> > I don't have any problems running things and getting output here.
>>>> >> >
>>>> >> > I *have* seen two real bugs here while trying to get things
>>>> >> > running, though:
>>>> >> >
>>>> >> > 1. If the device specified in the config file doesn't exist, or
>>>> >> > is the wrong type of device, or (maybe) there is any other kind
>>>> >> > of problem with it, you get *no* useful feedback to say there's a
>>>> >> > problem. Running things under strace will show the background
>>>> >> > libarmep process attempt to use the device specified, but
>>>> >> > there's no error handling. :-(
>>>> >> >
>>>> >> > 2. The "-x" option says that the arm-probe program is meant to
>>>> >> > exit when you've done capturing, but it just sits there forever
>>>> >> > when I'm testing. I've wrapped it using the "timeout" command to
>>>> >> > work around that for now.
>>>> >> >
>>>> >> > If I knew where to file those bugs, I would, but it's really not
>>>> >> > obvious.
>>>> >> > They're really easy to reproduce, I hope...
>>>> >> >
>>>> >> > In terms of the /dev/ttyACM0 creation, the lxc-device man page
>>>> >> > says that it creates devices based on their existing entries on
>>>> >> > the host. Double-check that the host (dispatcher) has an
>>>> >> > appropriate /dev/ttyACM0 if you're still seeing problems?
>>>> >>
>>>> >> Steve was using staging-panda03 with the ARM Energy Probe which I'd
>>>> >> been using for the tests of the new code to ensure
>>>> >> that /dev/ttyACM0 can be attached to the LXC.
>>>> >>
>>>> >> That panda and AEP will shortly return to staging and then the
>>>> >> changes to LAVA and the required changes to the test definition
>>>> >> can be available for the 2017.6 release.
>>>> >
>>>> > OK. staging-panda03 is back and has been running tests. This is what
>>>> > we've learnt so far:
>>>> >
>>>> > 0: This does not appear to be an LXC issue. Running the commands
>>>> > manually on the worker with the same LXC on the same worker does
>>>> > return data from the probe.
>>>> >
>>>> > 1: Running the same commands in "headless" mode shows that the probe
>>>> > software starts successfully but something within the protocol
>>>> > parser or sampler fails to retrieve data.
>>>>
>>>>
>>>> What do you mean by headless mode?
>>>
>>> With no controlling terminal.
>>>
>>> LAVA runs as a daemon and forks processes to run the tests. This does
>>> not usually cause issues and is fundamental to automation. When I run
>>> the same commands in an LXC as a user logged into the machine, I get
>>> output. When I run the commands from a daemon, the output is not seen.
>>
>> even when you redirect the output to a file ?
>>
>> On workload automation, arm_probe is called in a dedicated process
>> with subprocess.Popen and we are able to get data in the file.
>> Just wonder what could be the difference in lava case
>>
>>>
>>>> >
>>>> > 2: The websockets dependency is completely unnecessary and has been
>>>> > disabled in the build I've been testing:
>>>> > https://git.linaro.org/lava-team/arm-probe.git/
>>>>
>>>>
>>>> Yes. I do the same. aepd is only useful for the web interface.
>>>>
>>>>
>>>> >
>>>> > 3: We've added a *lot* of debug to the arm-probe code
>>>> > (https://staging.validation.linaro.org/scheduler/job/174969 which
>>>> > was run using
>>>> > https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1)
>>>> > but are not much closer to identifying the precise problem with the
>>>> > code. However, I am satisfied that this is a problem in the
>>>> > arm-probe software when being run in automation.
>>>>
>>>>
>>>> Can you give details about "this is a problem in arm probe software
>>>> when being run in automation"? Do you mean workload automation?
>>>
>>> No. Not workload automation - that is a specific test framework which
>>> can use LAVA. I'm talking about the process of running tests on behalf
>>> of users without the users being logged in or interacting with the
>>> shell.
>>
>> ok. Just to be sure about the context
>>
>>>
>>>> >
>>>> > 4: the arm-probe code is appallingly difficult to read and debug. It
>>>> > also seems unnecessarily complex.
>>>> >
>>>> > 5: I plan to remove a lot of the debug from the cloned arm-probe
>>>> > repository (which has also had a few fixes to compile with gcc6) but
>>>> > I'm running out of time to work on the arm-probe software myself.
>>>> >
>>>> > Someone needs to update the arm-probe software:
>>>> >
>>>> > a) to remove websockets as a compile-time option as this only bloats
>>>> > the build in automation where a web based UI is impossible anyway.
>>>> > I've done this by brute force in my cloned repo, I just patched out
>>>> > the dependency.
>>>> >
>>>> > b) improve the code to have comments and output about what is
>>>> > happening and why when verbose mode is used.
>>>> >
>>>> > c) Identify what is preventing the software from receiving data from
>>>> > the probe when run in automation.
>>>> >
>>>> > d) the config file still needs fixes to allow for changes in the
>>>> > device node name from one probe to another.
>>>> >
>>>> > --
>>>>
>>>> CC'ing Vincent, so he can read Neil's and Steve's comments above and
>>>> respond (if he has anything to say) while I'm on holiday until early
>>>> June.
>>>
>>> Steve & I are also on annual leave next week.
>>>
>>> --
>>>
>>>
>>> Neil Williams
>>> =============
>>> http://www.linux.codehelp.co.uk/
>>>
>
>
>
> --
>
> Neil Williams
> =============
> neil.willi...@linaro.org
> http://www.linux.codehelp.co.uk/
_______________________________________________
linaro-validation mailing list
linaro-validation@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-validation
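
For reference, here is a minimal Python sketch of the two workarounds
discussed in this thread. It checks that the configured device node
exists and is a character device before starting (arm-probe itself gives
no feedback when the node is wrong), and it runs the tool with its output
redirected to a file and a hard time limit (since "-x" did not make it
exit). The function names check_device and run_with_timeout are
hypothetical helpers, not part of arm-probe or LAVA, and this is only one
possible way to wrap the call.

```python
import os
import stat
import subprocess


def check_device(path):
    """Return None if path is a character device, else a reason string."""
    try:
        mode = os.stat(path).st_mode
    except OSError as exc:
        return "cannot stat %s: %s" % (path, exc.strerror)
    if not stat.S_ISCHR(mode):
        return "%s is not a character device" % path
    return None


def run_with_timeout(cmd, logfile, seconds):
    """Run cmd with stdout and stderr written to logfile.

    The child gets no stdin, similar to being started from a daemon.
    Returns the exit code, or None if the process was still running
    after `seconds` and had to be killed.
    """
    with open(logfile, "wb") as log:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
        )
        try:
            return proc.wait(timeout=seconds)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()  # reap the killed child
            return None
```

Assuming the command line Steve quoted, a call could look like
run_with_timeout(["./arm-probe/arm-probe", "-C", "panda-aep.cfg", "-l10", "-x"], "aep.log", 30)
after check_device("/dev/ttyACM0") has returned None. The log file name
and the 30 second limit are arbitrary choices for illustration; the limit
has to be long enough for the probe to start producing data.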