On 6 June 2017 at 12:53, Vincent Guittot <[email protected]> wrote:
> On 6 June 2017 at 13:38, Neil Williams <[email protected]> wrote:
>> This problem has been resolved inside the arm-probe configuration; it
>> is not a fault within LAVA. There was a concern that the probe was not
>> showing data output because of a theoretical problem with running
>> daemonized instead of with a controlling terminal. The actual problem
>> was that the probe software runs more slowly than expected, and
>> extending the runtime of the utility allows the probe to output data.
>> https://staging.validation.linaro.org/scheduler/job/175033#L2038
>> https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2c2e96da0b666a36ab3e8ffeb
>
> OK, so the 2-second timeout was your problem.

That and the problem with the config file.
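A minimal Python sketch of that workaround, in the subprocess.Popen style Vincent mentions further down: give the probe a generous runtime, send its output to a file, and stop it externally (as Steve does with timeout(1)) in case "-x" does not make it exit. The arm-probe arguments are copied from Steve's transcript. The helper name `run_probe`, the output filename and the 30-second limit are assumptions, not the values used in the LAVA job.

```python
import subprocess
from pathlib import Path


def run_probe(argv, output_path, limit_s):
    """Run argv with stdout/stderr sent to output_path, stopping it after limit_s seconds.

    Returns (returncode, captured_text). returncode is None when the process
    did not exit by itself within limit_s and had to be stopped.
    """
    output_path = Path(output_path)
    with output_path.open("w") as out:
        proc = subprocess.Popen(argv, stdin=subprocess.DEVNULL,
                                stdout=out, stderr=subprocess.STDOUT)
        try:
            returncode = proc.wait(timeout=limit_s)
        except subprocess.TimeoutExpired:
            # Equivalent of wrapping the command in timeout(1): send SIGTERM,
            # then SIGKILL if the process is still running after 5 seconds.
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
            returncode = None
    # Whatever the child flushed to the file before it stopped is kept.
    return returncode, output_path.read_text()


# Hypothetical usage; the config name, output file and limit are placeholders:
# rc, text = run_probe(["./arm-probe/arm-probe", "-C", "panda-aep.cfg", "-l10", "-x"],
#                      "aep-output.txt", limit_s=30)
```

One general caveat when capturing to a file: C programs usually block-buffer stdout when it is not a terminal, so samples still sitting in the buffer when the process is killed can be lost.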

>> (The verbose option was later dropped to output only the interesting data.)
>>
>> The configuration file in the git repo needs to be modified.
>>
>> https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id=e08f0bed2c3561421bc2f430ab2e38f1b659e2fd
>
> Can you point out the modification that was needed? I
> can't see any obvious difference except using /dev/ttyACM0 instead of
> /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00.
> Is that the difference?

Yes, because inside the LXC, /dev/serial/by-id does not get created
(there is no udev support for that inside containers).
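Since /dev/serial/by-id is not populated inside the container, one way to keep a single test definition working both on the host and inside LXC is to try the stable by-id name first and fall back to the plain ttyACM node. A minimal Python sketch; the glob pattern is derived from the device name quoted above, while the search order and the helper name `find_probe_device` are assumptions. Note that a bare /dev/ttyACM0 is not a stable name once more than one ACM device is attached.

```python
import glob


def find_probe_device(patterns):
    """Return the first existing path matching one of the glob patterns.

    Patterns are tried in order, so list the most specific (stable) name
    first. Returns None if nothing matches.
    """
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        if matches:
            return matches[0]
    return None


# Assumed search order: the udev by-id symlink (only present where udev runs,
# e.g. on the host) first, then the raw node that lxc-device adds in the LXC.
AEP_PATTERNS = [
    "/dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_*",
    "/dev/ttyACM0",
]
```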

> What about using two AEPs?

That would have to be fixed either in the test shell definitions (e.g.
using parameters passed through the test job) or within the arm-probe
code itself. I have no idea at this stage whether the arm-probe
software can cope with multiple probes - in LAVA that would likely
need secondary connections and MultiNode to separate the output.

The syntax of the arm-probe configuration file does not make this easy,
but that section could be patched to use a saner structure. That
isn't related to the LAVA support, though.
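Purely as an illustration of the "parameters passed through the test job" route: assuming the job exposes the list of probe devices to the test shell as an environment variable (the name AEP_DEVICES is hypothetical), a helper could split that list and give each probe its own output file. This does not answer whether arm-probe itself can drive several probes at once.

```python
import os


def probe_devices_from_env(var="AEP_DEVICES", default="/dev/ttyACM0"):
    """Split a comma-separated device list taken from the environment.

    AEP_DEVICES is a hypothetical parameter name; empty entries are ignored.
    """
    raw = os.environ.get(var, default)
    return [dev.strip() for dev in raw.split(",") if dev.strip()]


def output_name(device):
    """Derive a per-probe output filename, e.g. /dev/ttyACM1 -> aep-ttyACM1.txt."""
    return "aep-" + os.path.basename(device) + ".txt"
```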

>>
>>
>> On 29 May 2017 at 16:45, Vincent Guittot <[email protected]> wrote:
>>> On 25 May 2017 at 10:03, Neil Williams <[email protected]> wrote:
>>>> On Wed, 24 May 2017 21:07:45 +0200
>>>> Vincent Guittot <[email protected]> wrote:
>>>>
>>>>> Hi Neil,
>>>>>
>>>>> On 24 May 2017 7:42 PM, "Lisa Nguyen" <[email protected]> wrote:
>>>>>
>>>>> On 24 May 2017 at 17:02, Neil Williams <[email protected]> wrote:
>>>>> > On Fri, 19 May 2017 17:02:14 +0100
>>>>> > Neil Williams <[email protected]> wrote:
>>>>> >
>>>>> >> On Fri, 19 May 2017 16:48:11 +0100
>>>>> >> Steve McIntyre <[email protected]> wrote:
>>>>> >>
>>>>> >> > Hi folks!
>>>>> >> >
>>>>> >> > On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
>>>>> >> > >On Thu, 27 Apr 2017 08:19:19 +0100
>>>>> >> > >Neil Williams <[email protected]> wrote:
>>>>> >> > >
>>>>> >> >
>>>>> >> > I've just run a local test with an AEP inside lxc on my local
>>>>> >> > machine. As far as I can see, there's nothing particularly magic
>>>>> >> > going on here. The only problem I see is Lisa's config file
>>>>> >> > pointing at the wrong device file. arm-probe needs a ttyACM-style
>>>>> >> > device to talk to. Using:
>>>>> >> >
>>>>> >> > # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
>>>>> >> >
>>>>> >> > I create that device in my container. I build libwebsockets and
>>>>> >> > the arm-probe software in the container, then
>>>>> >> > specify /dev/ttyACM0 in the AEP config file. I can run it just
>>>>> >> > fine:
>>>>> >> >
>>>>> >> > root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
>>>>> >> > # configuration: panda-aep.cfg
>>>>> >> > # config_name: pandaboard
>>>>> >> > # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
>>>>> >> > Configuration: pandaboard
>>>>> >> > # date: Fri, 19 May 2017 16:29:50 +0100
>>>>> >> > # host: lxc-aep-test-174524
>>>>> >> > #
>>>>> >> > + /dev/ttyACM0
>>>>> >> > Starting...
>>>>> >> > sending start to 0
>>>>> >> > # VDD_ALL       VDD     ROOT    #ff0000 SoC
>>>>> >> > #
>>>>> >> > #
>>>>> >> > time  VDD(V) VDD(A) VDD(W)
>>>>> >> > 0.000500  5.11 0.0474 0.24196
>>>>> >> > 0.000600  5.11 0.0364 0.18572
>>>>> >> > 0.000700  5.11 0.0314 0.16012
>>>>> >> > 0.000800  5.10 0.0544 0.27734
>>>>> >> > 0.000900  5.10 0.0234 0.11923
>>>>> >> > 0.001000  5.11 0.0304 0.15505
>>>>> >> > ...
>>>>> >> >
>>>>> >> > I don't have any problems running things and getting output here.
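Once the probe does produce output, the samples still have to be separated from the comment and status lines shown in that transcript. A minimal Python sketch of such a parser, assuming the layout seen there (comment lines start with '#', the header row starts with 'time', data rows are whitespace-separated numbers). The function names are hypothetical and this is not part of arm-probe.

```python
def parse_aep_output(text):
    """Parse arm-probe text output into a header list and numeric rows.

    Lines starting with '#' are treated as comments; the first line whose
    first field is 'time' is taken as the column header; every later line made
    up entirely of numbers becomes a row. Other lines (status messages such as
    'Starting...', or a trailing '...') are skipped.
    """
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if header is None:
            if fields[0] == "time":
                header = fields
            continue
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # not a data row
        if len(values) == len(header):
            rows.append(values)
    return header, rows


def mean_column(header, rows, name):
    """Average of one named column, e.g. 'VDD(W)'."""
    if not rows:
        raise ValueError("no data rows")
    idx = header.index(name)
    return sum(row[idx] for row in rows) / len(rows)
```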
>>>>> >> >
>>>>> >> > I *have* seen two real bugs here while trying to get things
>>>>> >> > running, though:
>>>>> >> >
>>>>> >> >  1. If the device specified in the config file doesn't exist, or
>>>>> >> >     is the wrong type of device, or (maybe) there is any other kind
>>>>> >> >     of problem with it, you get *no* useful feedback to say there's
>>>>> >> >     a problem. Running things under strace will show the background
>>>>> >> >     libarmep process attempting to use the device specified, but
>>>>> >> >     there's no error handling. :-(
>>>>> >> >
>>>>> >> >  2. The "-x" option says that the arm-probe program is meant to
>>>>> >> >     exit when you've finished capturing, but it just sits there
>>>>> >> >     forever when I'm testing. I've wrapped it using the "timeout"
>>>>> >> >     command to work around that for now.
>>>>> >> >
>>>>> >> > If I knew where to file those bugs, I would, but it's really not
>>>>> >> > obvious. They're really easy to reproduce, I hope...
>>>>> >> >
>>>>> >> > In terms of the /dev/ttyACM0 creation, the lxc-device man page
>>>>> >> > says that it creates devices based on their existing entries on
>>>>> >> > the host. Double-check that the host (dispatcher) has an
>>>>> >> > appropriate /dev/ttyACM0 if you're still seeing problems?
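Until arm-probe reports these errors itself, a test definition could check the configured device before starting a capture, which covers the missing-device case in bug 1 and the point about the node having to exist on the host for lxc-device. A minimal Python sketch; the helper name is hypothetical, and passing these checks does not prove that the device really is an AEP.

```python
import os
import stat


def check_probe_device(path):
    """Return None if path looks usable as a serial device, else a reason string.

    Only existence, file type and permissions are checked; this cannot tell
    whether the device at the other end is actually an ARM Energy Probe.
    """
    try:
        st = os.stat(path)  # follows symlinks such as /dev/serial/by-id/*
    except FileNotFoundError:
        return f"{path} does not exist"
    if not stat.S_ISCHR(st.st_mode):
        return f"{path} is not a character device"
    if not os.access(path, os.R_OK | os.W_OK):
        return f"{path} is not readable and writable by this user"
    return None
```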
>>>>> >>
>>>>> >> Steve was using staging-panda03 with the ARM Energy Probe which I'd
>>>>> >> been using for the tests of the new code to ensure
>>>>> >> that /dev/ttyACM0 can be attached to the LXC.
>>>>> >>
>>>>> >> That panda and AEP will shortly return to staging, and then the
>>>>> >> changes to LAVA and the required changes to the test definition
>>>>> >> can be made available for the 2017.6 release.
>>>>> >
>>>>> > OK. staging-panda03 is back and has been running tests. This is what
>>>>> > we've learnt so far:
>>>>> >
>>>>> > 0: This does not appear to be an LXC issue. Running the commands
>>>>> > manually in the same LXC on the same worker does return data from
>>>>> > the probe.
>>>>> >
>>>>> > 1: Running the same commands in "headless" mode shows that the probe
>>>>> > software starts successfully but something within the protocol
>>>>> > parser or sampler fails to retrieve data.
>>>>>
>>>>>
>>>>> What do you mean by headless mode?
>>>>
>>>> With no controlling terminal.
>>>>
>>>> LAVA runs as a daemon and forks processes to run the tests. This does
>>>> not usually cause issues and is fundamental to automation. When I run
>>>> the same commands in an LXC as a user logged into the machine, I get
>>>> output. When I run the commands from a daemon, the output is not seen.
>>>
>>> Even when you redirect the output to a file?
>>>
>>> In workload automation, arm_probe is called in a dedicated process
>>> with subprocess.Popen, and we are able to get data in the file.
>>> I just wonder what the difference could be in the LAVA case.
>>>
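For a like-for-like comparison with the subprocess.Popen approach, the LAVA situation (a daemon starting the command with no controlling terminal) can be approximated from Python by starting the command in a new session, with stdin from /dev/null and output captured. A minimal sketch under that assumption; it does not reproduce everything a LAVA worker does (user, environment variables and so on), and the helper names are hypothetical.

```python
import subprocess
import sys

# Child-side check: opening /dev/tty only succeeds when the process has a
# controlling terminal.
_TTY_CHECK = """
try:
    open('/dev/tty').close()
    print('controlling terminal')
except OSError:
    print('headless')
"""


def run_without_tty(argv):
    """Run argv in a new session (setsid), so it has no controlling terminal.

    stdin comes from /dev/null and stdout/stderr are captured, roughly as a
    daemon would launch it. Returns the decoded output.
    """
    result = subprocess.run(argv, stdin=subprocess.DEVNULL,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            start_new_session=True, check=False)
    return result.stdout.decode()


def child_is_headless():
    """Launch a tiny Python child the same way and report whether it lacks a tty."""
    return run_without_tty([sys.executable, "-c", _TTY_CHECK]).strip() == "headless"
```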
>>>>
>>>>> >
>>>>> > 2: The websockets dependency is completely unnecessary and has been
>>>>> > disabled in the build I've been testing:
>>>>> > https://git.linaro.org/lava-team/arm-probe.git/
>>>>>
>>>>>
>>>>> Yes. I do the same. aepd is only useful for the web interface.
>>>>>
>>>>>
>>>>> >
>>>>> > 3: We've added a *lot* of debug to the arm-probe code
>>>>> > (https://staging.validation.linaro.org/scheduler/job/174969 which
>>>>> > was run using
>>>>> > https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1)
>>>>> > but are not much closer to identifying the precise problem with the
>>>>> > code. However, I am satisfied that this is a problem in the
>>>>> > arm-probe software when being run in automation.
>>>>>
>>>>>
>>>>> Can you give details about "this is a problem in arm probe software
>>>>> when being run in automation"? Do you mean workload automation?
>>>>
>>>> No. Not workload automation - that is a specific test framework which
>>>> can use LAVA. I'm talking about the process of running tests on behalf
>>>> of users without the users being logged in or interacting with the
>>>> shell.
>>>
>>> OK, just to be sure about the context.
>>>
>>>>
>>>>> >
>>>>> > 4: the arm-probe code is appallingly difficult to read and debug. It
>>>>> > also seems unnecessarily complex.
>>>>> >
>>>>> > 5: I plan to remove a lot of the debug from the cloned arm-probe
>>>>> > repository (which has also had a few fixes to compile with gcc6) but
>>>>> > I'm running out of time to work on the arm-probe software myself.
>>>>> >
>>>>> > Someone needs to update the arm-probe software:
>>>>> >
>>>>> > a) remove websockets as a compile-time option, as this only bloats
>>>>> > the build in automation, where a web-based UI is impossible anyway.
>>>>> > I've done this by brute force in my cloned repo, I just patched out
>>>>> > the dependency.
>>>>> >
>>>>> > b) improve the code to have comments, and to report what is
>>>>> > happening and why when verbose mode is used.
>>>>> >
>>>>> > c) Identify what is preventing the software from receiving data from
>>>>> > the probe when run in automation.
>>>>> >
>>>>> > d) the config file still needs fixes to allow for changes in the
>>>>> > device node name from one probe to another.
>>>>> >
>>>>> > --
>>>>>
>>>>> CC'ing Vincent, so he can read Neil's and Steve's comments above and
>>>>> respond (if he has anything to say) while I'm on holiday until early
>>>>> June.
>>>>
>>>> Steve & I are also on annual leave next week.
>>>>
>>>> --
>>>>
>>>>
>>>> Neil Williams
>>>> =============
>>>> http://www.linux.codehelp.co.uk/
>>>>
>>



-- 

Neil Williams
=============
[email protected]
http://www.linux.codehelp.co.uk/
_______________________________________________
linaro-validation mailing list
[email protected]
https://lists.linaro.org/mailman/listinfo/linaro-validation
