On 6 June 2017 at 12:53, Vincent Guittot <[email protected]> wrote:
> On 6 June 2017 at 13:38, Neil Williams <[email protected]> wrote:
>> This problem has been resolved inside the arm-probe configuration, it
>> is not a fault within LAVA. There was a concern that the probe was not
>> showing data output because of a theoretical problem of running
>> daemonized instead of with a controlling terminal. The actual problem
>> was that the probe software is running more slowly than expected and
>> extending the runtime of the utility allows the probe to output data.
>>
>> https://staging.validation.linaro.org/scheduler/job/175033#L2038
>> https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2c2e96da0b666a36ab3e8ffeb
>
> ok so the 2seconds for timeout was your problem

That and the problem with the config file.

>> (The verbose option was later dropped to output only the interesting data.)
>>
>> The configuration file in the git repo needs to be modified.
>>
>> https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id=e08f0bed2c3561421bc2f430ab2e38f1b659e2fd
>
> can you point out the modification you did that has been needed ? I
> can't see any obvious difference except using /dev/ttyACM0 instead of
> /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00.
> Is it the difference ?

Yes, because inside the LXC, /dev/serial/by-id does not get created
(there is no udev support for that inside containers).

> What about using 2 AEPs ?

That would have to be fixed either in the test shell definitions (e.g.
using parameters passed through the test job) or within the arm-probe
code itself. I have no idea at this stage whether the arm-probe
software can cope with multiple probes - in LAVA that would likely need
secondary connections and MultiNode to separate the output. The syntax
of the arm-probe configuration file does not make this easy but that
section could be patched to use a more sane structure. That isn't
related to the LAVA support though.

>>
>>
>> On 29 May 2017 at 16:45, Vincent Guittot <[email protected]> wrote:
>>> On 25 May 2017 at 10:03, Neil Williams <[email protected]> wrote:
>>>> On Wed, 24 May 2017 21:07:45 +0200
>>>> Vincent Guittot <[email protected]> wrote:
>>>>
>>>>> Hi Neil,
>>>>>
>>>>> On 24 May 2017 at 7:42 PM, "Lisa Nguyen" <[email protected]>
>>>>> wrote:
>>>>>
>>>>> On 24 May 2017 at 17:02, Neil Williams <[email protected]> wrote:
>>>>> > On Fri, 19 May 2017 17:02:14 +0100
>>>>> > Neil Williams <[email protected]> wrote:
>>>>> >
>>>>> >> On Fri, 19 May 2017 16:48:11 +0100
>>>>> >> Steve McIntyre <[email protected]> wrote:
>>>>> >>
>>>>> >> > Hi folks!
>>>>> >> >
>>>>> >> > On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
>>>>> >> > >On Thu, 27 Apr 2017 08:19:19 +0100
>>>>> >> > >Neil Williams <[email protected]> wrote:
>>>>> >> > >
>>>>> >> >
>>>>> >> > I've just run a local test with an AEP inside lxc on my local
>>>>> >> > machine. As far as I can see, there's nothing particularly magic
>>>>> >> > going on here. The only problem I see is Lisa's config file
>>>>> >> > pointing at the wrong device file. arm-probe needs a ttyACM-style
>>>>> >> > device to talk to. Using:
>>>>> >> >
>>>>> >> > # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
>>>>> >> >
>>>>> >> > I create that device in my container. I build libwebsockets and
>>>>> >> > the arm-probe software in the container, then
>>>>> >> > specify /dev/ttyACM0 in the AEP config file. I can run it just
>>>>> >> > fine:
>>>>> >> >
>>>>> >> > root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
>>>>> >> > # configuration: panda-aep.cfg
>>>>> >> > # config_name: pandaboard
>>>>> >> > # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
>>>>> >> > Configuration: pandaboard
>>>>> >> > # date: Fri, 19 May 2017 16:29:50 +0100
>>>>> >> > # host: lxc-aep-test-174524
>>>>> >> > #
>>>>> >> > + /dev/ttyACM0
>>>>> >> > Starting...
>>>>> >> > sending start to 0
>>>>> >> > # VDD_ALL VDD ROOT #ff0000 SoC
>>>>> >> > #
>>>>> >> > #
>>>>> >> > time VDD(V) VDD(A) VDD(W)
>>>>> >> > 0.000500 5.11 0.0474 0.24196
>>>>> >> > 0.000600 5.11 0.0364 0.18572
>>>>> >> > 0.000700 5.11 0.0314 0.16012
>>>>> >> > 0.000800 5.10 0.0544 0.27734
>>>>> >> > 0.000900 5.10 0.0234 0.11923
>>>>> >> > 0.001000 5.11 0.0304 0.15505
>>>>> >> > ...
>>>>> >> >
>>>>> >> > I don't have any problems running things and getting output here.
>>>>> >> >
>>>>> >> > I *have* seen two real bugs here while trying to get things
>>>>> >> > running, though:
>>>>> >> >
>>>>> >> > 1. If the device specified in the config file doesn't exist, or
>>>>> >> > is the wrong type of device, or (maybe) there is any other kind
>>>>> >> > of problem with it, you get *no* useful feedback to say there's a
>>>>> >> > problem. Running things under strace will show the background
>>>>> >> > libarmep process attempt to use the device specified, but
>>>>> >> > there's no error handling. :-(
>>>>> >> >
>>>>> >> > 2. The "-x" option says that the arm-probe program is meant to
>>>>> >> > exit when you've done capturing, but it just sits there forever
>>>>> >> > when I'm testing. I've wrapped it using the "timeout" command to
>>>>> >> > work around that for now.
>>>>> >> >
>>>>> >> > If I knew where to file those bugs, I would, but it's really not
>>>>> >> > obvious. They're really easy to reproduce, I hope...
>>>>> >> >
>>>>> >> > In terms of the /dev/ttyACM0 creation, the lxc-device man page
>>>>> >> > says that it creates devices based on their existing entries on
>>>>> >> > the host. Double-check that the host (dispatcher) has an
>>>>> >> > appropriate /dev/ttyACM0 if you're still seeing problems?
>>>>> >>
>>>>> >> Steve was using staging-panda03 with the ARM Energy Probe which I'd
>>>>> >> been using for the tests of the new code to ensure
>>>>> >> that /dev/ttyACM0 can be attached to the LXC.
>>>>> >>
>>>>> >> That panda and AEP will shortly return to staging and then the
>>>>> >> changes to LAVA and the required changes to the test definition
>>>>> >> can be available for the 2017.6 release.
>>>>> >
>>>>> > OK. staging-panda03 is back and has been running tests. This is what
>>>>> > we've learnt so far:
>>>>> >
>>>>> > 0: This does not appear to be an LXC issue. Running the commands
>>>>> > manually on the worker with the same LXC on the same worker does
>>>>> > return data from the probe.
>>>>> >
>>>>> > 1: Running the same commands in "headless" mode shows that the probe
>>>>> > software starts successfully but something within the protocol
>>>>> > parser or sampler fails to retrieve data.
>>>>>
>>>>>
>>>>> What do you mean by headless mode?
>>>>
>>>> With no controlling terminal.
>>>>
>>>> LAVA runs as a daemon and forks processes to run the tests. This does
>>>> not usually cause issues and is fundamental to automation. When I run
>>>> the same commands in an LXC as a user logged into the machine, I get
>>>> output. When I run the commands from a daemon, the output is not seen.
>>>
>>> even when you redirect the output to a file ?
>>>
>>> On workload automation, arm_probe is called in a dedicated process
>>> with subprocess.Popen and we are able to get data in the file.
>>> Just wonder what could be the difference in lava case
>>>
>>>>
>>>>> >
>>>>> > 2: The websockets dependency is completely unnecessary and has been
>>>>> > disabled in the build I've been testing:
>>>>> > https://git.linaro.org/lava-team/arm-probe.git/
>>>>>
>>>>>
>>>>> Yes. I do the same. aepd is only useful for the web interface.
>>>>>
>>>>>
>>>>> >
>>>>> > 3: We've added a *lot* of debug to the arm-probe code
>>>>> > (https://staging.validation.linaro.org/scheduler/job/174969 which
>>>>> > was run using
>>>>> > https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1)
>>>>> > but are not much closer to identifying the precise problem with the
>>>>> > code. However, I am satisfied that this is a problem in the
>>>>> > arm-probe software when being run in automation.
>>>>>
>>>>>
>>>>> Can you give details about "this is a problem in arm probe software
>>>>> when being run in automation"? Do you mean workload automation?
>>>>
>>>> No. Not workload automation - that is a specific test framework which
>>>> can use LAVA. I'm talking about the process of running tests on behalf
>>>> of users without the users being logged in or interacting with the
>>>> shell.
>>>
>>> ok. Just to be sure about the context
>>>
>>>>
>>>>> >
>>>>> > 4: the arm-probe code is appallingly difficult to read and debug. It
>>>>> > also seems unnecessarily complex.
>>>>> >
>>>>> > 5: I plan to remove a lot of the debug from the cloned arm-probe
>>>>> > repository (which has also had a few fixes to compile with gcc6) but
>>>>> > I'm running out of time to work on the arm-probe software myself.
>>>>> >
>>>>> > Someone needs to update the arm-probe software:
>>>>> >
>>>>> > a) to remove websockets as a compile-time option as this only bloats
>>>>> > the build in automation where a web based UI is impossible anyway.
>>>>> > I've done this by brute force in my cloned repo, I just patched out
>>>>> > the dependency.
>>>>> >
>>>>> > b) improve the code to have comments and output about what is
>>>>> > happening and why when verbose mode is used.
>>>>> >
>>>>> > c) Identify what is preventing the software from receiving data from
>>>>> > the probe when run in automation.
>>>>> >
>>>>> > d) the config file still needs fixes to allow for changes in the
>>>>> > device node name from one probe to another.
>>>>> >
>>>>> > --
>>>>>
>>>>> CC'ing Vincent, so he can read Neil's and Steve's comments above and
>>>>> respond (if he has anything to say) while I'm on holiday until early
>>>>> June.
>>>>
>>>> Steve & I are also on annual leave next week.
>>>>
>>>> --
>>>>
>>>>
>>>> Neil Williams
>>>> =============
>>>> http://www.linux.codehelp.co.uk/
>>>>
>>
>>
>>
>> --
>>
>> Neil Williams
>> =============
>> [email protected]
>> http://www.linux.codehelp.co.uk/

--

Neil Williams
=============
[email protected]
http://www.linux.codehelp.co.uk/

_______________________________________________
linaro-validation mailing list
[email protected]
https://lists.linaro.org/mailman/listinfo/linaro-validation
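
For readers who want to reproduce the workaround discussed in this thread
(running the probe utility with no terminal attached to stdin, redirecting
its output to a file as workload automation does with subprocess.Popen, and
enforcing a time limit because "-x" did not make arm-probe exit), here is a
minimal sketch in Python. The helper names capture_with_deadline and
non_comment_lines, the 30 second limit and the log path are assumptions made
for this example only. It is not the code used by LAVA or by workload
automation. The arm-probe arguments in the usage comment are the ones from
Steve's run quoted above.

```python
import subprocess
from pathlib import Path


def capture_with_deadline(cmd, seconds, log_path):
    """Run cmd with stdout/stderr redirected to log_path.

    stdin is connected to /dev/null, so the child does not read from a
    terminal. If the child is still running after `seconds`, it is killed.
    Returns True when the deadline was hit, False when the child exited
    on its own.
    """
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
        )
        try:
            proc.wait(timeout=seconds)
            return False
        except subprocess.TimeoutExpired:
            proc.kill()   # same effect as wrapping the call in timeout(1)
            proc.wait()   # reap the child so no zombie is left behind
            return True


def non_comment_lines(log_path):
    """Return the non-empty lines of the log that do not start with '#'."""
    text = Path(log_path).read_text(errors="replace")
    return [line for line in text.splitlines()
            if line.strip() and not line.startswith("#")]


# Hypothetical usage (limit and log path are illustrative, not from the thread):
#
#   hit_deadline = capture_with_deadline(
#       ["./arm-probe/arm-probe", "-C", "panda-aep.cfg", "-l10", "-x"],
#       30,
#       "/tmp/aep.log",
#   )
#   if not non_comment_lines("/tmp/aep.log"):
#       print("no output from the probe; try a longer limit")
```

An empty result from non_comment_lines after a short limit is the symptom
described at the top of this message, where a 2 second timeout was too short
for the probe to produce any data.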
