On 6 June 2017 at 16:24, Neil Williams <neil.willi...@linaro.org> wrote:
> On 6 June 2017 at 14:32, Vincent Guittot <vincent.guit...@linaro.org> wrote:
>> On 6 June 2017 at 14:25, Neil Williams <neil.willi...@linaro.org> wrote:
>>> On 6 June 2017 at 13:11, Vincent Guittot <vincent.guit...@linaro.org> wrote:
>>>> On 6 June 2017 at 14:03, Neil Williams <neil.willi...@linaro.org> wrote:
>>>>> On 6 June 2017 at 12:53, Vincent Guittot <vincent.guit...@linaro.org> wrote:
>>>>>> On 6 June 2017 at 13:38, Neil Williams <neil.willi...@linaro.org> wrote:
>>>>>>> This problem has been resolved inside the arm-probe configuration; it
>>>>>>> is not a fault within LAVA. There was a concern that the probe was not
>>>>>>> showing data output because of a theoretical problem of running
>>>>>>> daemonized instead of with a controlling terminal. The actual problem
>>>>>>> was that the probe software is running more slowly than expected and
>>>>>>> extending the runtime of the utility allows the probe to output data.
>>>>>>> https://staging.validation.linaro.org/scheduler/job/175033#L2038
>>>>>>> https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2c2e96da0b666a36ab3e8ffeb
>>>>>>
>>>>>> OK, so the 2-second timeout was your problem.
>>>>>
>>>>> That and the problem with the config file.
>>>>
>>>> ok
>>>>
>>>>>
>>>>>>> (The verbose option was later dropped to output only the interesting data.)
>>>>>>>
>>>>>>> The configuration file in the git repo needs to be modified.
>>>>>>>
>>>>>>> https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id=e08f0bed2c3561421bc2f430ab2e38f1b659e2fd
>>>>>>
>>>>>> Can you point out the modification that was needed? I can't see any
>>>>>> obvious difference except the use of /dev/ttyACM0 instead of
>>>>>> /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00.
>>>>>> Is that the difference?
>>>>>
>>>>> Yes, because inside the LXC, /dev/serial/by-id does not get created
>>>>> (there is no udev support for that inside containers).
>>>>>
>>>>>> What about using 2 AEPs ?
>>>>>
>>>>> That would have to be fixed either in the test shell definitions (e.g.
>>>>> using parameters passed through the test job) or within the arm-probe
>>>>> code itself. I have no idea at this stage whether the arm-probe
>>>>> software can cope with multiple probes - in LAVA that would likely
>>>>
>>>> arm-probe supports multiple AEPs, and we are using multiple AEPs with
>>>> the mt8173 evb.
>>>> arm-probe just relies on the config file to get the path of each AEP. I
>>>> have put the content of the config file below:
>>>>
>>>> # arm-probe configuration file
>>>> #
>>>> # setup name
>>>> mt8173-evb
>>>>
>>>> # <device path>
>>>> /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730001-if00
>>>>  VDD_CA57_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0
>>>> SoC/A57/Cache A57_CACHE #ff0000 SoC
>>>>  VDD_CA57_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0
>>>> SoC/A57/Core0 A57_CORE #ff0000 SoC
>>>>  VDD_CA57_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0
>>>> SoC/A57/Core1 A57_CORE #ff0000 SoC
>>>>
>>>> /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730000-if00
>>>>  VDD_CA53_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0
>>>> SoC/A53/Cache A53_CACHE #ff0000 SoC
>>>>  VDD_CA53_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0
>>>> SoC/A53/Core0 A53_CORE #ff0000 SoC
>>>>  VDD_CA53_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0
>>>> SoC/A53/Core1 A53_CORE #ff0000 SoC
>>>
>>> These configuration files may need to be generated within the test
>>> shell definition at runtime, based on parameters. The test shell will
>>> need to work out which device is which probe and this could be awkward
>>> without /dev/serial/by-id support. The enumeration order of ttyUSB0
>>> and ttyUSB1 cannot be guaranteed. dmesg remains available inside the
>>> LXC, so some automated parsing may be required. If the arm-probe
>>
>> To be honest, I don't like this way of proceeding; it is just error-prone.
>>
>>> software can be modified to use a more sane configuration file syntax,
>>> this could also be addressed there.
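As a rough illustration of the automated dmesg parsing mentioned above, the Python sketch below pairs each probe's USB serial number with the ttyACM node the kernel assigned to it. The two log line shapes it matches (a "SerialNumber:" line from the usb core and a "ttyACMn: USB ACM device" line from cdc_acm) are assumed to be typical kernel output. The sample text and the function name are hypothetical and were not captured from the lab, so the patterns may need adjusting against real dmesg output.

```python
import re

# Assumed shapes of the relevant kernel messages (not taken from this thread):
#   usb 1-1.2: SerialNumber: S_NO81730001
#   cdc_acm 1-1.2:1.0: ttyACM0: USB ACM device
SERIAL_RE = re.compile(r"usb (\S+): SerialNumber: (\S+)")
ACM_RE = re.compile(r"cdc_acm (\S+?):\d+\.\d+: (ttyACM\d+):")


def map_serial_to_tty(dmesg_text):
    """Return {serial number: '/dev/ttyACMn'} for every ACM device found."""
    serial_by_port = {}
    tty_by_port = {}
    for line in dmesg_text.splitlines():
        match = SERIAL_RE.search(line)
        if match:
            serial_by_port[match.group(1)] = match.group(2)
            continue
        match = ACM_RE.search(line)
        if match:
            tty_by_port[match.group(1)] = "/dev/" + match.group(2)
    # Later lines overwrite earlier ones, so a replugged probe keeps its
    # newest device node.
    return {
        serial: tty_by_port[port]
        for port, serial in serial_by_port.items()
        if port in tty_by_port
    }


# Hypothetical sample input, written by hand for this sketch.
SAMPLE = """\
[  10.1] usb 1-1.2: SerialNumber: S_NO81730001
[  10.2] cdc_acm 1-1.2:1.0: ttyACM1: USB ACM device
[  10.3] usb 1-1.3: SerialNumber: S_NO81730000
[  10.4] cdc_acm 1-1.3:1.0: ttyACM0: USB ACM device
"""

print(map_serial_to_tty(SAMPLE))
```

Inside the LXC the input could come from the output of the dmesg command. Whether the serial string reported there matches the one embedded in the by-id name is something that would need checking on real hardware.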
>>
>> I don't understand why the config file is insane or how changing it
>> would help with this problem.
>
> If the config file is to be generated for each test job, the syntax is
> awkward to handle as it would need a line inserted instead of
> supporting a parser or similar.
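To illustrate what generating the file per test job could look like with the current syntax, here is a minimal Python sketch that rewrites only the device-path lines of an existing arm-probe config and leaves everything else untouched. It assumes, based on the mt8173-evb file quoted above, that a device line is any line starting with "/dev/". The function name, the shortened template and the ttyACM paths are hypothetical, and the channel lines in the template are joined onto one line on the assumption that the mail client wrapped them.

```python
def set_device_paths(config_text, device_paths):
    """Replace each '/dev/...' line, in order, with the next supplied path."""
    remaining = list(device_paths)
    out = []
    for line in config_text.splitlines():
        if line.startswith("/dev/"):
            if not remaining:
                raise ValueError("more device lines than paths supplied")
            line = remaining.pop(0)
        out.append(line)
    if remaining:
        raise ValueError("fewer device lines than paths supplied")
    return "\n".join(out) + "\n"


# Shortened template: one channel per probe, taken from the quoted config.
TEMPLATE = """\
# arm-probe configuration file
# setup name
mt8173-evb

/dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730001-if00
 VDD_CA57_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Cache A57_CACHE #ff0000 SoC

/dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730000-if00
 VDD_CA53_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Cache A53_CACHE #ff0000 SoC
"""

# The order of the paths must follow the order of the probes in the template.
print(set_device_paths(TEMPLATE, ["/dev/ttyACM1", "/dev/ttyACM0"]))
```

The paths would have to come from the test job parameters or from whatever step works out which node belongs to which probe. The sketch only shows that the per-job change can be limited to the device lines.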
>
>>>>> need secondary connections and MultiNode to separate the output.
>>>>
>>>> Is it something that Lisa can do by herself or does it need some
>>>> changes from your side ?
>>>
>>> Secondary connections and MultiNode can be adopted by test writers
>>> without any changes in LAVA.
>>>
>>> https://validation.linaro.org/static/docs/v2/dispatcher-design.html#index-4
>>> https://validation.linaro.org/static/docs/v2/pipeline-writer-secondary.html#index-0
>>>
>>> Any testjob using MultiNode has a certain level of complexity, so the
>>> change is non-trivial.
>>
>> Does it also mean that the data from the 2 probes will not be in the
>> same file, whereas arm-probe already merges data from the multiple AEPs
>> in its config file into one single output?
>
> OK, then if that is what is desired then this can be done without
> using secondary connections and therefore without MultiNode. I was

Great

> expecting that the two would run simultaneously, causing issues with
> interleaving.

I haven't used more than 2 AEPs simultaneously, but I remember Andy
Green using 3 AEPs.

>
>
>>> Note also that physically fitting more AEPs will involve work by the
>>> LAB team - especially for devices like the panda, because the power
>>> connector which comes with the AEP does not fit the panda and a
>>> one-off daughter board is required.
>>
>> This is something that has been already handled and in the case of the
>> mt8173evb everything is already done and working on our server with
>> current arm-probe, AEPs and workload automation
>
>
>
>> Regards,
>> Vincent
>>>
>>>
>>>> Regards,
>>>> Vincent
>>>>
>>>>>
>>>>> The syntax of the arm-probe configuration file does not make this easy
>>>>> but that section could be patched to use a more sane structure. That
>>>>> isn't related to the LAVA support though.
>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 29 May 2017 at 16:45, Vincent Guittot <vincent.guit...@linaro.org> wrote:
>>>>>>>> On 25 May 2017 at 10:03, Neil Williams <codeh...@debian.org> wrote:
>>>>>>>>> On Wed, 24 May 2017 21:07:45 +0200
>>>>>>>>> Vincent Guittot <vincent.guit...@linaro.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Neil,
>>>>>>>>>>
>>>>>>>>>> On 24 May 2017 at 7:42 PM, "Lisa Nguyen" <lisa.ngu...@linaro.org> wrote:
>>>>>>>>>>
>>>>>>>>>> On 24 May 2017 at 17:02, Neil Williams <codeh...@debian.org> wrote:
>>>>>>>>>> > On Fri, 19 May 2017 17:02:14 +0100
>>>>>>>>>> > Neil Williams <codeh...@debian.org> wrote:
>>>>>>>>>> >
>>>>>>>>>> >> On Fri, 19 May 2017 16:48:11 +0100
>>>>>>>>>> >> Steve McIntyre <steve.mcint...@linaro.org> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> > Hi folks!
>>>>>>>>>> >> >
>>>>>>>>>> >> > On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
>>>>>>>>>> >> > >On Thu, 27 Apr 2017 08:19:19 +0100
>>>>>>>>>> >> > >Neil Williams <codeh...@debian.org> wrote:
>>>>>>>>>> >> > >
>>>>>>>>>> >> >
>>>>>>>>>> >> > I've just run a local test with an AEP inside lxc on my local
>>>>>>>>>> >> > machine. As far as I can see, there's nothing particularly magic
>>>>>>>>>> >> > going on here. The only problem I see is Lisa's config file
>>>>>>>>>> >> > pointing at the wrong device file. arm-probe needs a ttyACM-style
>>>>>>>>>> >> > device to talk to. Using:
>>>>>>>>>> >> >
>>>>>>>>>> >> > # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
>>>>>>>>>> >> >
>>>>>>>>>> >> > I create that device in my container. I build libwebsockets and
>>>>>>>>>> >> > the arm-probe software in the container, then
>>>>>>>>>> >> > specify /dev/ttyACM0 in the AEP config file. I can run it just
>>>>>>>>>> >> > fine:
>>>>>>>>>> >> >
>>>>>>>>>> >> > root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
>>>>>>>>>> >> > # configuration: panda-aep.cfg
>>>>>>>>>> >> > # config_name: pandaboard
>>>>>>>>>> >> > # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
>>>>>>>>>> >> > Configuration: pandaboard
>>>>>>>>>> >> > # date: Fri, 19 May 2017 16:29:50 +0100
>>>>>>>>>> >> > # host: lxc-aep-test-174524
>>>>>>>>>> >> > #
>>>>>>>>>> >> > + /dev/ttyACM0
>>>>>>>>>> >> > Starting...
>>>>>>>>>> >> > sending start to 0
>>>>>>>>>> >> > # VDD_ALL       VDD     ROOT    #ff0000 SoC
>>>>>>>>>> >> > #
>>>>>>>>>> >> > #
>>>>>>>>>> >> > time  VDD(V) VDD(A) VDD(W)
>>>>>>>>>> >> > 0.000500  5.11 0.0474 0.24196
>>>>>>>>>> >> > 0.000600  5.11 0.0364 0.18572
>>>>>>>>>> >> > 0.000700  5.11 0.0314 0.16012
>>>>>>>>>> >> > 0.000800  5.10 0.0544 0.27734
>>>>>>>>>> >> > 0.000900  5.10 0.0234 0.11923
>>>>>>>>>> >> > 0.001000  5.11 0.0304 0.15505
>>>>>>>>>> >> > ...
>>>>>>>>>> >> >
>>>>>>>>>> >> > I don't have any problems running things and getting output here.
>>>>>>>>>> >> >
>>>>>>>>>> >> > I *have* seen two real bugs here while trying to get things
>>>>>>>>>> >> > running, though:
>>>>>>>>>> >> >
>>>>>>>>>> >> >  1. If the device specified in the config file doesn't exist, or
>>>>>>>>>> >> >     is the wrong type of device, or (maybe) there is any other kind
>>>>>>>>>> >> >     of problem with it, you get *no* useful feedback to say there's
>>>>>>>>>> >> >     a problem. Running things under strace will show the background
>>>>>>>>>> >> >     libarmep process attempt to use the device specified, but
>>>>>>>>>> >> >     there's no error handling. :-(
>>>>>>>>>> >> >
>>>>>>>>>> >> >  2. The "-x" option says that the arm-probe program is meant to
>>>>>>>>>> >> >     exit when you've done capturing, but it just sits there forever
>>>>>>>>>> >> >     when I'm testing. I've wrapped it using the "timeout" command to
>>>>>>>>>> >> >     work around that for now.
>>>>>>>>>> >> >
>>>>>>>>>> >> > If I knew where to file those bugs, I would, but it's really not
>>>>>>>>>> >> > obvious. They're really easy to reproduce, I hope...
>>>>>>>>>> >> >
>>>>>>>>>> >> > In terms of the /dev/ttyACM0 creation, the lxc-device man page
>>>>>>>>>> >> > says that it creates devices based on their existing entries on
>>>>>>>>>> >> > the host. Double-check that the host (dispatcher) has an
>>>>>>>>>> >> > appropriate /dev/ttyACM0 if you're still seeing problems?
>>>>>>>>>> >>
>>>>>>>>>> >> Steve was using staging-panda03 with the ARM Energy Probe which I'd
>>>>>>>>>> >> been using for the tests of the new code to ensure
>>>>>>>>>> >> that /dev/ttyACM0 can be attached to the LXC.
>>>>>>>>>> >>
>>>>>>>>>> >> That panda and AEP will shortly return to staging and then the
>>>>>>>>>> >> changes to LAVA and the required changes to the test definition
>>>>>>>>>> >> can be available for the 2017.6 release.
>>>>>>>>>> >
>>>>>>>>>> > OK. staging-panda03 is back and has been running tests. This is what
>>>>>>>>>> > we've learnt so far:
>>>>>>>>>> >
>>>>>>>>>> > 0: This does not appear to be an LXC issue. Running the commands
>>>>>>>>>> > manually on the worker with the same LXC on the same worker does
>>>>>>>>>> > return data from the probe.
>>>>>>>>>> >
>>>>>>>>>> > 1: Running the same commands in "headless" mode shows that the probe
>>>>>>>>>> > software starts successfully but something within the protocol
>>>>>>>>>> > parser or sampler fails to retrieve data.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> What do you mean by headless mode?
>>>>>>>>>
>>>>>>>>> With no controlling terminal.
>>>>>>>>>
>>>>>>>>> LAVA runs as a daemon and forks processes to run the tests. This does
>>>>>>>>> not usually cause issues and is fundamental to automation. When I run
>>>>>>>>> the same commands in an LXC as a user logged into the machine, I get
>>>>>>>>> output. When I run the commands from a daemon, the output is not seen.
>>>>>>>>
>>>>>>>> Even when you redirect the output to a file?
>>>>>>>>
>>>>>>>> In workload automation, arm_probe is called in a dedicated process
>>>>>>>> with subprocess.Popen and we are able to get data in the file.
>>>>>>>> I just wonder what the difference could be in the LAVA case.
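For reference, the two workarounds mentioned in this thread (stopping arm-probe after a fixed time, as the "timeout" wrapper does, and launching it through subprocess.Popen with its output sent to a file) can be combined in a few lines of Python. This is only a sketch under stated assumptions: the function name, the time limit and the log file name are hypothetical, and the demo uses the Python interpreter as a stand-in command so that it runs without a probe attached.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds, log_path):
    """Run cmd with stdout and stderr written to log_path.

    Returns True if the command had to be stopped after `seconds`,
    False if it exited by itself before the limit.
    """
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            proc.wait(timeout=seconds)
            return False
        except subprocess.TimeoutExpired:
            proc.terminate()  # ask the process to stop, as timeout(1) does
            proc.wait()       # reap it so no zombie is left behind
            return True


if __name__ == "__main__":
    # Stand-in for the probe: prints one line, then never exits on its own.
    stand_in = [sys.executable, "-u", "-c",
                "import time; print('sample'); time.sleep(60)"]
    stopped = run_with_timeout(stand_in, 3, "aep-demo.log")
    print("stopped by timeout:", stopped)
```

For the real probe, cmd would be the command line shown earlier in the thread, for example ["./arm-probe/arm-probe", "-C", "panda-aep.cfg", "-l10", "-x"], with a limit long enough for the probe to start producing data.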
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> >
>>>>>>>>>> > 2: The websockets dependency is completely unnecessary and has been
>>>>>>>>>> > disabled in the build I've been testing:
>>>>>>>>>> > https://git.linaro.org/lava-team/arm-probe.git/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yes. I do the same. aepd is only useful for the web interface.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> >
>>>>>>>>>> > 3: We've added a *lot* of debug to the arm-probe code
>>>>>>>>>> > (https://staging.validation.linaro.org/scheduler/job/174969 which
>>>>>>>>>> > was run using
>>>>>>>>>> > https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1)
>>>>>>>>>> > but are not much closer to identifying the precise problem with the
>>>>>>>>>> > code. However, I am satisfied that this is a problem in the
>>>>>>>>>> > arm-probe software when being run in automation.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Can you give details about "this is a problem in arm probe software
>>>>>>>>>> when being run in automation"? Do you mean workload automation?
>>>>>>>>>
>>>>>>>>> No. Not workload automation - that is a specific test framework which
>>>>>>>>> can use LAVA. I'm talking about the process of running tests on behalf
>>>>>>>>> of users without the users being logged in or interacting with the
>>>>>>>>> shell.
>>>>>>>>
>>>>>>>> ok. Just to be sure about the context
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> >
>>>>>>>>>> > 4: the arm-probe code is appallingly difficult to read and debug. It
>>>>>>>>>> > also seems unnecessarily complex.
>>>>>>>>>> >
>>>>>>>>>> > 5: I plan to remove a lot of the debug from the cloned arm-probe
>>>>>>>>>> > repository (which has also had a few fixes to compile with gcc6) but
>>>>>>>>>> > I'm running out of time to work on the arm-probe software myself.
>>>>>>>>>> >
>>>>>>>>>> > Someone needs to update the arm-probe software:
>>>>>>>>>> >
>>>>>>>>>> > a) to remove websockets as a compile-time option as this only bloats
>>>>>>>>>> > the build in automation where a web based UI is impossible anyway.
>>>>>>>>>> > I've done this by brute force in my cloned repo, I just patched out
>>>>>>>>>> > the dependency.
>>>>>>>>>> >
>>>>>>>>>> > b) improve the code to have comments and output about what is
>>>>>>>>>> > happening and why when verbose mode is used.
>>>>>>>>>> >
>>>>>>>>>> > c) Identify what is preventing the software from receiving data from
>>>>>>>>>> > the probe when run in automation.
>>>>>>>>>> >
>>>>>>>>>> > d) the config file still needs fixes to allow for changes in the
>>>>>>>>>> > device node name from one probe to another.
>>>>>>>>>> >
>>>>>>>>>> > --
>>>>>>>>>>
>>>>>>>>>> CC'ing Vincent, so he can read Neil's and Steve's comments above and
>>>>>>>>>> respond (if he has anything to say) while I'm on holiday until early
>>>>>>>>>> June.
>>>>>>>>>
>>>>>>>>> Steve & I are also on annual leave next week.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Neil Williams
>>>>>>>>> =============
>>>>>>>>> http://www.linux.codehelp.co.uk/
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Neil Williams
>>>>>>> =============
>>>>>>> neil.willi...@linaro.org
>>>>>>> http://www.linux.codehelp.co.uk/
_______________________________________________
linaro-validation mailing list
linaro-validation@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-validation
