On 6 June 2017 at 13:38, Neil Williams <neil.willi...@linaro.org> wrote:
> This problem has been resolved inside the arm-probe configuration; it
> is not a fault within LAVA. There was a concern that the probe was not
> showing data output because of a theoretical problem of running
> daemonized instead of with a controlling terminal. The actual problem
> was that the probe software is running more slowly than expected and
> extending the runtime of the utility allows the probe to output data.
> https://staging.validation.linaro.org/scheduler/job/175033#L2038
> https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2c2e96da0b666a36ab3e8ffeb

OK, so the 2-second timeout was your problem.
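
For illustration, here is a minimal Python sketch of why a short runtime can
look like "no output". It is not the code from the commit above: capture_for
and SLOW_PROBE are hypothetical names, and SLOW_PROBE is only a stand-in for
a probe that needs some time before it prints its first sample.

```python
import subprocess
import sys

# Stand-in for a slow probe (hypothetical): waits 0.5 s, prints one sample,
# then keeps running, much like arm-probe when "-x" fails to make it exit.
SLOW_PROBE = [
    sys.executable, "-c",
    "import sys, time; time.sleep(0.5); "
    "print('0.000500  5.11 0.0474 0.24196'); sys.stdout.flush(); "
    "time.sleep(60)",
]


def capture_for(cmd, seconds, out_path):
    """Run cmd detached from any terminal, stop it after `seconds`,
    and return whatever it wrote to out_path."""
    with open(out_path, "w") as out:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=out,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # setsid(): no controlling terminal (POSIX only)
        )
        try:
            proc.wait(timeout=seconds)
        except subprocess.TimeoutExpired:
            proc.terminate()  # same idea as wrapping the tool with timeout(1)
            proc.wait()
    with open(out_path) as captured:
        return captured.read()
```

With a budget shorter than the stand-in's start-up delay the log stays empty,
and with a longer one the sample shows up, even with no controlling terminal.
Against the real tool this might be called as
capture_for(["./arm-probe/arm-probe", "-C", "panda-aep.cfg", "-l10", "-x"], 30, "aep.log"),
where the 30 seconds and the log name are guesses to tune per setup.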

>
> (The verbose option was later dropped to output only the interesting data.)
>
> The configuration file in the git repo needs to be modified.
>
> https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id=e08f0bed2c3561421bc2f430ab2e38f1b659e2fd

Can you point out the modification that was needed? I can't see any
obvious difference except the use of /dev/ttyACM0 instead of
/dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00.
Is that the difference?

What about using two AEPs?

Regards,
Vincent

>
>
>
> On 29 May 2017 at 16:45, Vincent Guittot <vincent.guit...@linaro.org> wrote:
>> On 25 May 2017 at 10:03, Neil Williams <codeh...@debian.org> wrote:
>>> On Wed, 24 May 2017 21:07:45 +0200
>>> Vincent Guittot <vincent.guit...@linaro.org> wrote:
>>>
>>>> Hi Neil,
>>>>
>>>> On 24 May 2017 at 7:42 PM, "Lisa Nguyen" <lisa.ngu...@linaro.org>
>>>> wrote:
>>>>
>>>> On 24 May 2017 at 17:02, Neil Williams <codeh...@debian.org> wrote:
>>>> > On Fri, 19 May 2017 17:02:14 +0100
>>>> > Neil Williams <codeh...@debian.org> wrote:
>>>> >
>>>> >> On Fri, 19 May 2017 16:48:11 +0100
>>>> >> Steve McIntyre <steve.mcint...@linaro.org> wrote:
>>>> >>
>>>> >> > Hi folks!
>>>> >> >
>>>> >> > On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
>>>> >> > >On Thu, 27 Apr 2017 08:19:19 +0100
>>>> >> > >Neil Williams <codeh...@debian.org> wrote:
>>>> >> > >
>>>> >> >
>>>> >> > I've just run a local test with an AEP inside lxc on my local
>>>> >> > machine. As far as I can see, there's nothing particularly magic
>>>> >> > going on here. The only problem I see is Lisa's config file
>>>> >> > pointing at the wrong device file. arm-probe needs a ttyACM-style
>>>> >> > device to talk to. Using:
>>>> >> >
>>>> >> > # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
>>>> >> >
>>>> >> > I create that device in my container. I build libwebsockets and
>>>> >> > the arm-probe software in the container, then
>>>> >> > specify /dev/ttyACM0 in the AEP config file. I can run it just
>>>> >> > fine:
>>>> >> >
>>>> >> > root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
>>>> >> > # configuration: panda-aep.cfg
>>>> >> > # config_name: pandaboard
>>>> >> > # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
>>>> >> > Configuration: pandaboard
>>>> >> > # date: Fri, 19 May 2017 16:29:50 +0100
>>>> >> > # host: lxc-aep-test-174524
>>>> >> > #
>>>> >> > + /dev/ttyACM0
>>>> >> > Starting...
>>>> >> > sending start to 0
>>>> >> > # VDD_ALL       VDD     ROOT    #ff0000 SoC
>>>> >> > #
>>>> >> > #
>>>> >> > time  VDD(V) VDD(A) VDD(W)
>>>> >> > 0.000500  5.11 0.0474 0.24196
>>>> >> > 0.000600  5.11 0.0364 0.18572
>>>> >> > 0.000700  5.11 0.0314 0.16012
>>>> >> > 0.000800  5.10 0.0544 0.27734
>>>> >> > 0.000900  5.10 0.0234 0.11923
>>>> >> > 0.001000  5.11 0.0304 0.15505
>>>> >> > ...
>>>> >> >
>>>> >> > I don't have any problems running things and getting output here.
>>>> >> >
>>>> >> > I *have* seen two real bugs here while trying to get things
>>>> >> > running, though:
>>>> >> >
>>>> >> >  1. If the device specified in the config file doesn't exist, or
>>>> >> >     is the wrong type of device, or (maybe) there is any other kind
>>>> >> >     of problem with it, you get *no* useful feedback to say there's a
>>>> >> >     problem. Running things under strace will show the background
>>>> >> >     libarmep process attempt to use the device specified, but
>>>> >> >     there's no error handling. :-(
>>>> >> >
>>>> >> >  2. The "-x" option says that the arm-probe program is meant to
>>>> >> >     exit when you've done capturing, but it just sits there forever
>>>> >> >     when I'm testing. I've wrapped it using the "timeout" command to
>>>> >> >     work around that for now.
>>>> >> >
>>>> >> > If I knew where to file those bugs, I would, but it's really not
>>>> >> > obvious. They're really easy to reproduce, I hope...
>>>> >> >
>>>> >> > In terms of the /dev/ttyACM0 creation, the lxc-device man page
>>>> >> > says that it creates devices based on their existing entries on
>>>> >> > the host. Double-check that the host (dispatcher) has an
>>>> >> > appropriate /dev/ttyACM0 if you're still seeing problems?
>>>> >>
>>>> >> Steve was using staging-panda03 with the ARM Energy Probe which I'd
>>>> >> been using for the tests of the new code to ensure
>>>> >> that /dev/ttyACM0 can be attached to the LXC.
>>>> >>
>>>> >> That panda and AEP will shortly return to staging and then the
>>>> >> changes to LAVA and the required changes to the test definition
>>>> >> can be available for the 2017.6 release.
>>>> >
>>>> > OK. staging-panda03 is back and has been running tests. This is what
>>>> > we've learnt so far:
>>>> >
>>>> > 0: This does not appear to be an LXC issue. Running the commands
>>>> > manually on the worker with the same LXC on the same worker does
>>>> > return data from the probe.
>>>> >
>>>> > 1: Running the same commands in "headless" mode shows that the probe
>>>> > software starts successfully but something within the protocol
>>>> > parser or sampler fails to retrieve data.
>>>>
>>>>
>>>> What do you mean by headless mode?
>>>
>>> With no controlling terminal.
>>>
>>> LAVA runs as a daemon and forks processes to run the tests. This does
>>> not usually cause issues and is fundamental to automation. When I run
>>> the same commands in an LXC as a user logged into the machine, I get
>>> output. When I run the commands from a daemon, the output is not seen.
>>
>> Even when you redirect the output to a file?
>>
>> In Workload Automation, arm_probe is called in a dedicated process
>> with subprocess.Popen and we are able to get data in the file.
>> I just wonder what the difference could be in the LAVA case.
>>
>>>
>>>> >
>>>> > 2: The websockets dependency is completely unnecessary and has been
>>>> > disabled in the build I've been testing:
>>>> > https://git.linaro.org/lava-team/arm-probe.git/
>>>>
>>>>
>>>> Yes. I do the same. aepd is only useful for the web interface.
>>>>
>>>>
>>>> >
>>>> > 3: We've added a *lot* of debug to the arm-probe code
>>>> > (https://staging.validation.linaro.org/scheduler/job/174969 which
>>>> > was run using
>>>> > https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1)
>>>> > but are not much closer to identifying the precise problem with the
>>>> > code. However, I am satisfied that this is a problem in the
>>>> > arm-probe software when being run in automation.
>>>>
>>>>
>>>> Can you give details about "this is a problem in arm probe software
>>>> when being run in automation"? Do you mean workload automation?
>>>
>>> No. Not workload automation - that is a specific test framework which
>>> can use LAVA. I'm talking about the process of running tests on behalf
>>> of users without the users being logged in or interacting with the
>>> shell.
>>
>> OK. I just wanted to be sure about the context.
>>
>>>
>>>> >
>>>> > 4: the arm-probe code is appallingly difficult to read and debug. It
>>>> > also seems unnecessarily complex.
>>>> >
>>>> > 5: I plan to remove a lot of the debug from the cloned arm-probe
>>>> > repository (which has also had a few fixes to compile with gcc6) but
>>>> > I'm running out of time to work on the arm-probe software myself.
>>>> >
>>>> > Someone needs to update the arm-probe software:
>>>> >
>>>> > a) to remove websockets as a compile-time option as this only bloats
>>>> > the build in automation where a web based UI is impossible anyway.
>>>> > I've done this by brute force in my cloned repo, I just patched out
>>>> > the dependency.
>>>> >
>>>> > b) improve the code to have comments and output about what is
>>>> > happening and why when verbose mode is used.
>>>> >
>>>> > c) Identify what is preventing the software from receiving data from
>>>> > the probe when run in automation.
>>>> >
>>>> > d) the config file still needs fixes to allow for changes in the
>>>> > device node name from one probe to another.
>>>> >
>>>> > --
>>>>
>>>> CC'ing Vincent, so he can read Neil's and Steve's comments above and
>>>> respond (if he has anything to say) while I'm on holiday until early
>>>> June.
>>>
>>> Steve & I are also on annual leave next week.
>>>
>>> --
>>>
>>>
>>> Neil Williams
>>> =============
>>> http://www.linux.codehelp.co.uk/
>>>
>
>
>
> --
>
> Neil Williams
> =============
> neil.willi...@linaro.org
> http://www.linux.codehelp.co.uk/
_______________________________________________
linaro-validation mailing list
linaro-validation@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-validation
