On Wed, 24 May 2017 21:07:45 +0200
Vincent Guittot <[email protected]> wrote:

> Hi Neil,
> 
> On 24 May 2017 7:42 PM, "Lisa Nguyen" <[email protected]> wrote:
> 
> On 24 May 2017 at 17:02, Neil Williams <[email protected]> wrote:
> > On Fri, 19 May 2017 17:02:14 +0100
> > Neil Williams <[email protected]> wrote:
> >  
> >> On Fri, 19 May 2017 16:48:11 +0100
> >> Steve McIntyre <[email protected]> wrote:
> >>  
> >> > Hi folks!
> >> >
> >> > On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:  
> >> > >On Thu, 27 Apr 2017 08:19:19 +0100
> >> > >Neil Williams <[email protected]> wrote:
> >> > >  
> >> >
> >> > I've just run a local test with an AEP inside lxc on my local
> >> > machine. As far as I can see, there's nothing particularly magic
> >> > going on here. The only problem I see is Lisa's config file
> >> > pointing at the wrong device file. arm-probe needs a ttyACM-style
> >> > device to talk to. Using:
> >> >
> >> > # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
> >> >
> >> > I create that device in my container. I build libwebsockets and
> >> > the arm-probe software in the container, then
> >> > specify /dev/ttyACM0 in the AEP config file. I can run it just
> >> > fine:
> >> >
> >> > root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
> >> > # configuration: panda-aep.cfg
> >> > # config_name: pandaboard
> >> > # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
> >> > Configuration: pandaboard
> >> > # date: Fri, 19 May 2017 16:29:50 +0100
> >> > # host: lxc-aep-test-174524
> >> > #
> >> > + /dev/ttyACM0
> >> > Starting...
> >> > sending start to 0
> >> > # VDD_ALL       VDD     ROOT    #ff0000 SoC
> >> > #
> >> > #
> >> > time  VDD(V) VDD(A) VDD(W)
> >> > 0.000500  5.11 0.0474 0.24196
> >> > 0.000600  5.11 0.0364 0.18572
> >> > 0.000700  5.11 0.0314 0.16012
> >> > 0.000800  5.10 0.0544 0.27734
> >> > 0.000900  5.10 0.0234 0.11923
> >> > 0.001000  5.11 0.0304 0.15505
> >> > ...
> >> >
> >> > I don't have any problems running things and getting output here.
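For consuming this output in automation, here is a minimal Python sketch of a parser. It assumes only the format visible in the capture above: metadata lines start with '#', the column header line starts with 'time', and each sample row has one numeric field per column. parse_samples and mean_power are hypothetical helpers, not part of arm-probe.

```python
from typing import Dict, List


def parse_samples(text: str) -> List[Dict[str, float]]:
    """Turn arm-probe's text output into one dict per sample row.

    Assumes the layout seen in the capture: '#' metadata lines, a header
    line starting with 'time', then whitespace-separated numeric rows.
    """
    columns: List[str] = []
    samples: List[Dict[str, float]] = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or '#' metadata/comment line
        if fields[0] == "time":
            columns = fields  # e.g. time VDD(V) VDD(A) VDD(W)
            continue
        if not columns or len(fields) != len(columns):
            continue  # status text ("Starting...") or a truncated row
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # non-numeric line that happens to match the column count
        samples.append(dict(zip(columns, values)))
    return samples


def mean_power(samples: List[Dict[str, float]], column: str = "VDD(W)") -> float:
    """Average of one column; fails loudly when no samples were captured."""
    if not samples:
        raise ValueError("no samples parsed from arm-probe output")
    return sum(s[column] for s in samples) / len(samples)
```

Raising on an empty sample list means a run that returns no data from the probe is reported as a failure instead of passing silently.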
> >> >
> >> > I *have* seen two real bugs here while trying to get things
> >> > running, though:
> >> >
> >> >  1. If the device specified in the config file doesn't exist, or
> >> >     is the wrong type of device, or (maybe) there is any other kind
> >> >     of problem with it, you get *no* useful feedback to say there's
> >> >     a problem. Running things under strace will show the background
> >> >     libarmep process attempt to use the device specified, but
> >> >     there's no error handling. :-(
> >> >
> >> >  2. The "-x" option says that the arm-probe program is meant to
> >> >     exit once capturing is done, but it just sits there forever
> >> >     when I'm testing. I've wrapped it using the "timeout" command
> >> >     to work around that for now.
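Until "-x" is fixed, the same "timeout" workaround can be applied from a Python harness. This is a minimal sketch assuming a POSIX host; run_capture and the grace period are illustrative choices, not part of arm-probe or LAVA. Like the default of timeout(1), it sends SIGTERM first and only falls back to SIGKILL if the process ignores it.

```python
import subprocess
import sys
from typing import List, Tuple


def run_capture(cmd: List[str], timeout: float, grace: float = 2.0) -> Tuple[bool, str]:
    """Run cmd and return (timed_out, combined stdout/stderr)."""
    proc = subprocess.Popen(cmd, stdin=subprocess.DEVNULL,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            text=True)
    try:
        out, _ = proc.communicate(timeout=timeout)
        return False, out
    except subprocess.TimeoutExpired:
        proc.terminate()  # SIGTERM on POSIX, like timeout(1) by default
        try:
            # Retrying communicate() after a timeout does not lose output.
            out, _ = proc.communicate(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # the process ignored SIGTERM
            out, _ = proc.communicate()
        return True, out


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python run_capture.py ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
    timed_out, out = run_capture(sys.argv[1:], timeout=60)
    sys.stdout.write(out)
    sys.exit(124 if timed_out else 0)  # 124 mirrors GNU timeout's exit status
```

The 60 second limit in the usage line is arbitrary; it should be chosen to cover the capture length requested with "-l".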
> >> >
> >> > If I knew where to file those bugs, I would, but it's really not
> >> > obvious. They're really easy to reproduce, I hope...
> >> >
> >> > In terms of the /dev/ttyACM0 creation, the lxc-device man page
> >> > says that it creates devices based on their existing entries on
> >> > the host. Double-check that the host (dispatcher) has an
> >> > appropriate /dev/ttyACM0 if you're still seeing problems?  
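For bug 1, a pre-flight check along these lines, run on the host or inside the container, would turn a silent failure into a readable message. This is a minimal Python sketch, not the libarmep code; probe_device_problem is a hypothetical helper, and because it only inspects the node with stat(), it cannot tell an ARM Energy Probe apart from any other ttyACM device.

```python
import os
import stat
from typing import Optional


def probe_device_problem(path: str) -> Optional[str]:
    """Return a human-readable problem with `path`, or None if it looks usable.

    Checks only what can be seen without opening the device: that the node
    exists, is a character device, and is readable and writable by us.
    """
    try:
        st = os.stat(path)  # follows symlinks such as /dev/serial/by-id/*
    except FileNotFoundError:
        return f"{path}: no such file (inside LXC, was it added with lxc-device?)"
    except PermissionError:
        return f"{path}: permission denied while checking the device node"
    if not stat.S_ISCHR(st.st_mode):
        return f"{path}: exists but is not a character device"
    if not os.access(path, os.R_OK | os.W_OK):
        return f"{path}: not readable and writable by this user"
    return None
```

A harness could call this on the device named in the AEP config file and exit with the returned message (for example via sys.exit(problem)) before starting arm-probe.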
> >>
> >> Steve was using staging-panda03 with the ARM Energy Probe which I'd
> >> been using for the tests of the new code to ensure
> >> that /dev/ttyACM0 can be attached to the LXC.
> >>
> >> That panda and AEP will shortly return to staging and then the
> >> changes to LAVA and the required changes to the test definition
> >> can be available for the 2017.6 release.  
> >
> > OK. staging-panda03 is back and has been running tests. This is what
> > we've learnt so far:
> >
> > 0: This does not appear to be an LXC issue. Running the same
> > commands manually, in the same LXC on the same worker, does return
> > data from the probe.
> >
> > 1: Running the same commands in "headless" mode shows that the probe
> > software starts successfully but something within the protocol
> > parser or sampler fails to retrieve data.  
> 
> 
> What do you mean by headless mode?

With no controlling terminal.

LAVA runs as a daemon and forks processes to run the tests. This does
not usually cause issues and is fundamental to automation. When I run
the same commands in an LXC as a user logged into the machine, I get
output. When I run the commands from a daemon, the output is not seen.
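To reproduce the daemon case without going through LAVA, a command can be started with no controlling terminal and no tty on its standard streams. This is a minimal Python sketch assuming a POSIX host; run_headless is a hypothetical helper and only approximates how the LAVA dispatcher actually forks. One difference worth ruling out along the way: C stdio makes stdout fully buffered when it is not an interactive device, so output that appears line by line in a terminal can sit in a buffer when stdout is a pipe, and is lost if the process is killed before it flushes.

```python
import subprocess
import sys
from typing import List


def run_headless(cmd: List[str], timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run cmd roughly as a daemon-spawned job would see the world (POSIX only).

    start_new_session=True calls setsid() in the child, so it has no
    controlling terminal. stdin comes from /dev/null and stdout/stderr are
    pipes, so isatty() is false on all three standard streams.
    """
    return subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        start_new_session=True,
        timeout=timeout,
        text=True,
    )


if __name__ == "__main__":
    # Show what the child sees; compare with running the same line in a shell.
    check = "import os; print(os.isatty(0), os.isatty(1), os.isatty(2))"
    print(run_headless([sys.executable, "-c", check]).stdout.strip())
```

Note that subprocess.run kills the child with SIGKILL when the timeout expires, which discards anything still sitting in its stdio buffers.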

> >
> > 2: The websockets dependency is completely unnecessary and has been
> > disabled in the build I've been testing:
> > https://git.linaro.org/lava-team/arm-probe.git/  
> 
> 
> Yes. I do the same. aepd is only useful for the web interface.
> 
> 
> >
> > 3: We've added a *lot* of debug to the arm-probe code
> > (https://staging.validation.linaro.org/scheduler/job/174969 which
> > was run using
> > https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1)
> > but are not much closer to identifying the precise problem with the
> > code. However, I am satisfied that this is a problem in the
> > arm-probe software when being run in automation.  
> 
> 
> Can you give details about "this is a problem in arm probe software
> when being run in automation"? Do you mean workload automation?

No. Not workload automation - that is a specific test framework which
can use LAVA. I'm talking about the process of running tests on behalf
of users without the users being logged in or interacting with the
shell.

> >
> > 4: the arm-probe code is appallingly difficult to read and debug. It
> > also seems unnecessarily complex.
> >
> > 5: I plan to remove a lot of the debug from the cloned arm-probe
> > repository (which has also had a few fixes to compile with gcc6) but
> > I'm running out of time to work on the arm-probe software myself.
> >
> > Someone needs to update the arm-probe software:
> >
> > a) to remove websockets as a compile-time option as this only bloats
> > the build in automation, where a web-based UI is impossible anyway.
> > I've done this by brute force in my cloned repo, I just patched out
> > the dependency.
> >
> > b) improve the code with comments, and with verbose-mode output
> > explaining what is happening and why.
> >
> > c) Identify what is preventing the software from receiving data from
> > the probe when run in automation.
> >
> > d) the config file still needs fixes to allow for changes in the
> > device node name from one probe to another.
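For d), one option on Linux is to avoid hard-coding /dev/ttyACM0 and resolve the node at job start from the stable symlinks udev creates under /dev/serial/by-id/, which are named after the USB device's vendor, product and serial number. This is a minimal Python sketch; find_probe_node is hypothetical, and the match string would have to be taken from the real probe's by-id entry, which is not shown in this thread.

```python
import glob
import os
from typing import Optional


def find_probe_node(match: str,
                    by_id_dir: str = "/dev/serial/by-id",
                    fallback_glob: str = "/dev/ttyACM*") -> Optional[str]:
    """Return the real device node for the probe, or None if none is found.

    Prefers a udev by-id symlink whose name contains `match` (stable across
    reboots and re-plugging); otherwise falls back to the ttyACM glob.
    Raises if more than one candidate matches, rather than guessing.
    """
    by_id = sorted(p for p in glob.glob(os.path.join(by_id_dir, "*"))
                   if match in os.path.basename(p))
    candidates = by_id or sorted(glob.glob(fallback_glob))
    if not candidates:
        return None
    if len(candidates) > 1:
        raise RuntimeError(f"ambiguous probe device: {candidates}")
    return os.path.realpath(candidates[0])  # resolve symlink to /dev/ttyACMn
```

The resolved path could then be written into the AEP config file, or passed to lxc-device, before arm-probe starts; how that is templated depends on the config format, which this sketch does not touch.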
> >
> > --  
> 
> CC'ing Vincent, so he can read Neil's and Steve's comments above and
> respond (if he has anything to say) while I'm on holiday until early
> June.

Steve & I are also on annual leave next week.

-- 


Neil Williams
=============
http://www.linux.codehelp.co.uk/


_______________________________________________
linaro-validation mailing list
[email protected]
https://lists.linaro.org/mailman/listinfo/linaro-validation
