This problem has been resolved in the arm-probe configuration; it is
not a fault within LAVA. There was a concern that the probe was not
showing data output because it was running daemonized instead of with
a controlling terminal. The actual problem was that the probe software
runs more slowly than expected; extending the runtime of the utility
allows the probe to output data.
https://staging.validation.linaro.org/scheduler/job/175033#L2038
https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2c2e96da0b666a36ab3e8ffeb

(The verbose option was later dropped to output only the interesting data.)

The configuration file in the git repo needs to be modified.

https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id=e08f0bed2c3561421bc2f430ab2e38f1b659e2fd
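For anyone wrapping the probe in their own test scripts, here is a minimal Python sketch of the two points discussed in this thread: check that the device node exists before starting (arm-probe itself gives no error for a missing or wrong device, per Steve's report), and run the utility detached from any terminal with its output in a file, stopping it only after a generous runtime, much like Steve's "timeout" wrapper and the subprocess.Popen approach Vincent describes. The helper names are hypothetical; this is not how LAVA or the lava-team test definition actually invokes the probe.

```python
import os
import stat
import subprocess


def is_char_device(path):
    """Return True if path exists and is a character device (e.g. /dev/ttyACM0).

    arm-probe reportedly gives no error for a missing or wrong device node,
    so checking first gives the test log a clear failure message.
    """
    try:
        return stat.S_ISCHR(os.stat(path).st_mode)
    except OSError:
        # Missing path, permission problem, etc.: treat as unusable.
        return False


def run_probe(cmd, output_path, runtime_s):
    """Run cmd with no controlling terminal, capture its output in a file,
    and stop it after runtime_s seconds, similar to wrapping it in `timeout`.

    Returns the exit code (on POSIX, a negative signal number if it was stopped).
    """
    with open(output_path, "w") as out:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # no interactive input, as under a daemon
            stdout=out,
            stderr=subprocess.STDOUT,   # keep diagnostics alongside the samples
            start_new_session=True,     # new session: detached from any terminal
        )
        try:
            proc.wait(timeout=runtime_s)
        except subprocess.TimeoutExpired:
            proc.terminate()            # SIGTERM, as timeout(1) sends by default
            proc.wait()
    return proc.returncode
```

A possible call, using the command line from Steve's test and an arbitrary 60-second runtime (not a value from this thread): check `is_char_device("/dev/ttyACM0")`, then `run_probe(["./arm-probe/arm-probe", "-C", "panda-aep.cfg", "-l10", "-x"], "aep-output.txt", 60)`. If the program is stopped by the signal, any output still held in its stdio buffer may be lost; whether arm-probe flushes on SIGTERM is not established here.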



On 29 May 2017 at 16:45, Vincent Guittot <vincent.guit...@linaro.org> wrote:
> On 25 May 2017 at 10:03, Neil Williams <codeh...@debian.org> wrote:
>> On Wed, 24 May 2017 21:07:45 +0200
>> Vincent Guittot <vincent.guit...@linaro.org> wrote:
>>
>>> Hi Neil,
>>>
>>> On 24 May 2017 7:42 PM, "Lisa Nguyen" <lisa.ngu...@linaro.org>
>>> wrote:
>>>
>>> On 24 May 2017 at 17:02, Neil Williams <codeh...@debian.org> wrote:
>>> > On Fri, 19 May 2017 17:02:14 +0100
>>> > Neil Williams <codeh...@debian.org> wrote:
>>> >
>>> >> On Fri, 19 May 2017 16:48:11 +0100
>>> >> Steve McIntyre <steve.mcint...@linaro.org> wrote:
>>> >>
>>> >> > Hi folks!
>>> >> >
>>> >> > On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
>>> >> > >On Thu, 27 Apr 2017 08:19:19 +0100
>>> >> > >Neil Williams <codeh...@debian.org> wrote:
>>> >> > >
>>> >> >
>>> >> > I've just run a local test with an AEP inside lxc on my local
>>> >> > machine. As far as I can see, there's nothing particularly magic
>>> >> > going on here. The only problem I see is Lisa's config file
>>> >> > pointing at the wrong device file. arm-probe needs a ttyACM-style
>>> >> > device to talk to. Using:
>>> >> >
>>> >> > # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
>>> >> >
>>> >> > I create that device in my container. I build libwebsockets and
>>> >> > the arm-probe software in the container, then
>>> >> > specify /dev/ttyACM0 in the AEP config file. I can run it just
>>> >> > fine:
>>> >> >
>>> >> > root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
>>> >> > # configuration: panda-aep.cfg
>>> >> > # config_name: pandaboard
>>> >> > # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
>>> >> > Configuration: pandaboard
>>> >> > # date: Fri, 19 May 2017 16:29:50 +0100
>>> >> > # host: lxc-aep-test-174524
>>> >> > #
>>> >> > + /dev/ttyACM0
>>> >> > Starting...
>>> >> > sending start to 0
>>> >> > # VDD_ALL       VDD     ROOT    #ff0000 SoC
>>> >> > #
>>> >> > #
>>> >> > time  VDD(V) VDD(A) VDD(W)
>>> >> > 0.000500  5.11 0.0474 0.24196
>>> >> > 0.000600  5.11 0.0364 0.18572
>>> >> > 0.000700  5.11 0.0314 0.16012
>>> >> > 0.000800  5.10 0.0544 0.27734
>>> >> > 0.000900  5.10 0.0234 0.11923
>>> >> > 0.001000  5.11 0.0304 0.15505
>>> >> > ...
>>> >> >
>>> >> > I don't have any problems running things and getting output here.
>>> >> >
>>> >> > I *have* seen two real bugs here while trying to get things
>>> >> > running, though:
>>> >> >
>>> >> >  1. If the device specified in the config file doesn't exist, or
>>> >> >     is the wrong type of device, or (maybe) there is any other kind
>>> >> >     of problem with it, you get *no* useful feedback to say there's
>>> >> >     a problem. Running things under strace will show the background
>>> >> >     libarmep process attempt to use the device specified, but
>>> >> >     there's no error handling. :-(
>>> >> >
>>> >> >  2. The "-x" option says that the arm-probe program is meant to
>>> >> >     exit when you've done capturing, but it just sits there forever
>>> >> >     when I'm testing. I've wrapped it using the "timeout" command to
>>> >> >     work around that for now.
>>> >> >
>>> >> > If I knew where to file those bugs, I would, but it's really not
>>> >> > obvious. They're really easy to reproduce, I hope...
>>> >> >
>>> >> > In terms of the /dev/ttyACM0 creation, the lxc-device man page
>>> >> > says that it creates devices based on their existing entries on
>>> >> > the host. Double-check that the host (dispatcher) has an
>>> >> > appropriate /dev/ttyACM0 if you're still seeing problems?
>>> >>
>>> >> Steve was using staging-panda03 with the ARM Energy Probe which I'd
>>> >> been using for the tests of the new code to ensure
>>> >> that /dev/ttyACM0 can be attached to the LXC.
>>> >>
>>> >> That panda and AEP will shortly return to staging and then the
>>> >> changes to LAVA and the required changes to the test definition
>>> >> can be available for the 2017.6 release.
>>> >
>>> > OK. staging-panda03 is back and has been running tests. This is what
>>> > we've learnt so far:
>>> >
>>> > 0: This does not appear to be an LXC issue. Running the commands
>>> > manually in the same LXC on the same worker does return data from
>>> > the probe.
>>> >
>>> > 1: Running the same commands in "headless" mode shows that the probe
>>> > software starts successfully but something within the protocol
>>> > parser or sampler fails to retrieve data.
>>>
>>>
>>> What do you mean by headless mode?
>>
>> With no controlling terminal.
>>
>> LAVA runs as a daemon and forks processes to run the tests. This does
>> not usually cause issues and is fundamental to automation. When I run
>> the same commands in an LXC as a user logged into the machine, I get
>> output. When I run the commands from a daemon, the output is not seen.
>
> Even when you redirect the output to a file?
>
> In Workload Automation, arm_probe is called in a dedicated process
> with subprocess.Popen, and we are able to get data in the file.
> I just wonder what the difference could be in the LAVA case.
>
>>
>>> >
>>> > 2: The websockets dependency is completely unnecessary and has been
>>> > disabled in the build I've been testing:
>>> > https://git.linaro.org/lava-team/arm-probe.git/
>>>
>>>
>>> Yes. I do the same. aepd is only useful for the web interface.
>>>
>>>
>>> >
>>> > 3: We've added a *lot* of debug to the arm-probe code
>>> > (https://staging.validation.linaro.org/scheduler/job/174969 which
>>> > was run using
>>> > https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1)
>>> > but are not much closer to identifying the precise problem with the
>>> > code. However, I am satisfied that this is a problem in the
>>> > arm-probe software when being run in automation.
>>>
>>>
>>> Can you give details about "this is a problem in arm probe software
>>> when being run in automation"? Do you mean workload automation?
>>
>> No. Not workload automation - that is a specific test framework which
>> can use LAVA. I'm talking about the process of running tests on behalf
>> of users without the users being logged in or interacting with the
>> shell.
>
> OK, just to be sure about the context.
>
>>
>>> >
>>> > 4: the arm-probe code is appallingly difficult to read and debug. It
>>> > also seems unnecessarily complex.
>>> >
>>> > 5: I plan to remove a lot of the debug from the cloned arm-probe
>>> > repository (which has also had a few fixes to compile with gcc6) but
>>> > I'm running out of time to work on the arm-probe software myself.
>>> >
>>> > Someone needs to update the arm-probe software:
>>> >
>>> > a) to remove websockets as a compile-time option, as this only bloats
>>> > the build in automation, where a web-based UI is impossible anyway.
>>> > I've done this by brute force in my cloned repo; I just patched out
>>> > the dependency.
>>> >
>>> > b) to improve the code with comments, and with verbose-mode output
>>> > about what is happening and why.
>>> >
>>> > c) to identify what is preventing the software from receiving data
>>> > from the probe when run in automation.
>>> >
>>> > d) to fix the config file so that it allows for changes in the
>>> > device node name from one probe to another.
>>> >
>>> > --
>>>
>>> CC'ing Vincent, so he can read Neil's and Steve's comments above and
>>> respond (if he has anything to say) while I'm on holiday until early
>>> June.
>>
>> Steve & I are also on annual leave next week.
>>
>> --
>>
>>
>> Neil Williams
>> =============
>> http://www.linux.codehelp.co.uk/
>>



-- 

Neil Williams
=============
neil.willi...@linaro.org
http://www.linux.codehelp.co.uk/
_______________________________________________
linaro-validation mailing list
linaro-validation@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-validation
