On 6 June 2017 at 16:24, Neil Williams <neil.willi...@linaro.org> wrote:
> On 6 June 2017 at 14:32, Vincent Guittot <vincent.guit...@linaro.org> wrote:
>> On 6 June 2017 at 14:25, Neil Williams <neil.willi...@linaro.org> wrote:
>>> On 6 June 2017 at 13:11, Vincent Guittot <vincent.guit...@linaro.org> wrote:
>>>> On 6 June 2017 at 14:03, Neil Williams <neil.willi...@linaro.org> wrote:
>>>>> On 6 June 2017 at 12:53, Vincent Guittot <vincent.guit...@linaro.org> wrote:
>>>>>> On 6 June 2017 at 13:38, Neil Williams <neil.willi...@linaro.org> wrote:
>>>>>>> This problem has been resolved inside the arm-probe configuration; it
>>>>>>> is not a fault within LAVA. There was a concern that the probe was not
>>>>>>> showing data output because of a theoretical problem of running
>>>>>>> daemonized instead of with a controlling terminal. The actual problem
>>>>>>> was that the probe software is running more slowly than expected and
>>>>>>> extending the runtime of the utility allows the probe to output data.
>>>>>>> https://staging.validation.linaro.org/scheduler/job/175033#L2038
>>>>>>> https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2c2e96da0b666a36ab3e8ffeb
>>>>>>
>>>>>> OK, so the 2-second timeout was your problem.
>>>>>
>>>>> That and the problem with the config file.
>>>>
>>>> OK
>>>>
>>>>>
>>>>>>> (The verbose option was later dropped to output only the interesting
>>>>>>> data.)
>>>>>>>
>>>>>>> The configuration file in the git repo needs to be modified.
>>>>>>>
>>>>>>> https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id=e08f0bed2c3561421bc2f430ab2e38f1b659e2fd
>>>>>>
>>>>>> Can you point out the modification that was needed? I can't see any
>>>>>> obvious difference except the use of /dev/ttyACM0 instead of
>>>>>> /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00.
>>>>>> Is that the difference?
>>>>>
>>>>> Yes, because inside the LXC, /dev/serial/by-id does not get created
>>>>> (there is no udev support for that inside containers).
>>>>>
>>>>>> What about using 2 AEPs?
>>>>>
>>>>> That would have to be fixed either in the test shell definitions (e.g.
>>>>> using parameters passed through the test job) or within the arm-probe
>>>>> code itself. I have no idea at this stage whether the arm-probe
>>>>> software can cope with multiple probes - in LAVA that would likely
>>>>
>>>> arm-probe supports multiple AEPs and we are using it with multiple AEPs
>>>> on the mt8173 evb.
>>>> arm-probe just relies on the config file to get the path of each AEP. I
>>>> have put the content of the config file below:
>>>>
>>>> # arm-probe configuration file
>>>> #
>>>> # setup name
>>>> mt8173-evb
>>>>
>>>> # <device path>
>>>> /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730001-if00
>>>> VDD_CA57_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0
>>>> SoC/A57/Cache A57_CACHE #ff0000 SoC
>>>> VDD_CA57_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0
>>>> SoC/A57/Core0 A57_CORE #ff0000 SoC
>>>> VDD_CA57_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0
>>>> SoC/A57/Core1 A57_CORE #ff0000 SoC
>>>>
>>>> /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730000-if00
>>>> VDD_CA53_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0
>>>> SoC/A53/Cache A53_CACHE #ff0000 SoC
>>>> VDD_CA53_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0
>>>> SoC/A53/Core0 A53_CORE #ff0000 SoC
>>>> VDD_CA53_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0
>>>> SoC/A53/Core1 A53_CORE #ff0000 SoC
>>>
>>> These configuration files may need to be generated within the test
>>> shell definition at runtime, based on parameters. The test shell will
>>> need to work out which device is which probe and this could be awkward
>>> without /dev/serial/by-id support. The enumeration order of ttyUSB0
>>> and ttyUSB1 cannot be guaranteed.
>>> dmesg remains available inside the
>>> LXC, so some automated parsing may be required. If the arm-probe
>>
>> To be honest, I don't like this way of proceeding; it is just error-prone.
>>
>>> software can be modified to use a more sane configuration file syntax,
>>> this could also be addressed there.
>>
>> I don't understand why the config file is insane and how this would help
>> with this problem.
>
> If the config file is to be generated for each test job, the syntax is
> awkward to handle as it would need a line inserted instead of
> supporting a parser or similar.
>
>>>>> need secondary connections and MultiNode to separate the output.
>>>>
>>>> Is it something that Lisa can do by herself or does it need some
>>>> changes from your side?
>>>
>>> Secondary connections and MultiNode can be adopted by test writers
>>> without any changes in LAVA.
>>>
>>> https://validation.linaro.org/static/docs/v2/dispatcher-design.html#index-4
>>> https://validation.linaro.org/static/docs/v2/pipeline-writer-secondary.html#index-0
>>>
>>> Any testjob using MultiNode has a certain level of complexity, so the
>>> change is non-trivial.
>>
>> Does it also mean that the data from the 2 probes will not be in the
>> same file, whereas arm-probe already merges the data from the multiple
>> AEPs in its config file into one single output?
>
> OK, then if that is what is desired then this can be done without
> using secondary connections and therefore without MultiNode. I was

Great

> expecting that the two would run simultaneously, causing issues with
> interleaving.

I haven't used more than 2 AEPs simultaneously, but I remember Andy Green
using 3 AEPs.

>
>
>>> Note also that physically fitting more AEPs will involve work by the
>>> LAB team - especially for devices like the panda, because the power
>>> connector which comes with the AEP does not fit the panda and a
>>> one-off daughter board is required.
>>
>> This is something that has already been handled, and in the case of the
>> mt8173evb everything is already done and working on our server with the
>> current arm-probe, AEPs and workload automation.
>
>
>
>> Regards,
>> Vincent
>>>
>>>
>>>> Regards,
>>>> Vincent
>>>>
>>>>>
>>>>> The syntax of the arm-probe configuration file does not make this easy
>>>>> but that section could be patched to use a more sane structure. That
>>>>> isn't related to the LAVA support though.
>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 29 May 2017 at 16:45, Vincent Guittot <vincent.guit...@linaro.org> wrote:
>>>>>>>> On 25 May 2017 at 10:03, Neil Williams <codeh...@debian.org> wrote:
>>>>>>>>> On Wed, 24 May 2017 21:07:45 +0200
>>>>>>>>> Vincent Guittot <vincent.guit...@linaro.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Neil,
>>>>>>>>>>
>>>>>>>>>> On 24 May 2017 at 7:42 PM, "Lisa Nguyen" <lisa.ngu...@linaro.org> wrote:
>>>>>>>>>>
>>>>>>>>>> On 24 May 2017 at 17:02, Neil Williams <codeh...@debian.org> wrote:
>>>>>>>>>> > On Fri, 19 May 2017 17:02:14 +0100
>>>>>>>>>> > Neil Williams <codeh...@debian.org> wrote:
>>>>>>>>>> >
>>>>>>>>>> >> On Fri, 19 May 2017 16:48:11 +0100
>>>>>>>>>> >> Steve McIntyre <steve.mcint...@linaro.org> wrote:
>>>>>>>>>> >>
>>>>>>>>>> >> > Hi folks!
>>>>>>>>>> >> >
>>>>>>>>>> >> > On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
>>>>>>>>>> >> > >On Thu, 27 Apr 2017 08:19:19 +0100
>>>>>>>>>> >> > >Neil Williams <codeh...@debian.org> wrote:
>>>>>>>>>> >> > >
>>>>>>>>>> >> >
>>>>>>>>>> >> > I've just run a local test with an AEP inside lxc on my local
>>>>>>>>>> >> > machine. As far as I can see, there's nothing particularly magic
>>>>>>>>>> >> > going on here. The only problem I see is Lisa's config file
>>>>>>>>>> >> > pointing at the wrong device file. arm-probe needs a ttyACM-style
>>>>>>>>>> >> > device to talk to. Using:
>>>>>>>>>> >> >
>>>>>>>>>> >> > # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
>>>>>>>>>> >> >
>>>>>>>>>> >> > I create that device in my container. I build libwebsockets and
>>>>>>>>>> >> > the arm-probe software in the container, then
>>>>>>>>>> >> > specify /dev/ttyACM0 in the AEP config file. I can run it just
>>>>>>>>>> >> > fine:
>>>>>>>>>> >> >
>>>>>>>>>> >> > root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
>>>>>>>>>> >> > # configuration: panda-aep.cfg
>>>>>>>>>> >> > # config_name: pandaboard
>>>>>>>>>> >> > # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
>>>>>>>>>> >> > Configuration: pandaboard
>>>>>>>>>> >> > # date: Fri, 19 May 2017 16:29:50 +0100
>>>>>>>>>> >> > # host: lxc-aep-test-174524
>>>>>>>>>> >> > #
>>>>>>>>>> >> > + /dev/ttyACM0
>>>>>>>>>> >> > Starting...
>>>>>>>>>> >> > sending start to 0
>>>>>>>>>> >> > # VDD_ALL VDD ROOT #ff0000 SoC
>>>>>>>>>> >> > #
>>>>>>>>>> >> > #
>>>>>>>>>> >> > time VDD(V) VDD(A) VDD(W)
>>>>>>>>>> >> > 0.000500 5.11 0.0474 0.24196
>>>>>>>>>> >> > 0.000600 5.11 0.0364 0.18572
>>>>>>>>>> >> > 0.000700 5.11 0.0314 0.16012
>>>>>>>>>> >> > 0.000800 5.10 0.0544 0.27734
>>>>>>>>>> >> > 0.000900 5.10 0.0234 0.11923
>>>>>>>>>> >> > 0.001000 5.11 0.0304 0.15505
>>>>>>>>>> >> > ...
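
The sample output quoted above has a simple shape: comment lines starting with "#", a few status lines, a column header, and then rows of time, volts, amps and watts. A minimal Python sketch of reading such output in a test shell follows. It assumes the four-column layout shown in the sample; the function names parse_samples and mean_power are hypothetical and are not part of arm-probe or LAVA.

```python
def parse_samples(text):
    """Return (time, volts, amps, watts) tuples from arm-probe style text output."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Comments, status lines and the column header are skipped:
        # only rows made of exactly four numeric fields count as samples.
        if len(fields) != 4:
            continue
        try:
            samples.append(tuple(float(field) for field in fields))
        except ValueError:
            continue
    return samples


def mean_power(samples):
    """Average of the power column in watts, or None when there are no samples."""
    if not samples:
        return None
    return sum(sample[3] for sample in samples) / len(samples)


# A shortened copy of the output quoted in the thread.
EXAMPLE = """\
# configuration: panda-aep.cfg
+ /dev/ttyACM0
Starting...
sending start to 0
time VDD(V) VDD(A) VDD(W)
0.000500 5.11 0.0474 0.24196
0.000600 5.11 0.0364 0.18572
0.000700 5.11 0.0314 0.16012
"""

if __name__ == "__main__":
    rows = parse_samples(EXAMPLE)
    print(len(rows), "samples, mean power", mean_power(rows), "W")
```

An empty result from parse_samples is also a cheap way for a test job to notice that the probe started but produced no data.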
>>>>>>>>>> >> >
>>>>>>>>>> >> > I don't have any problems running things and getting output
>>>>>>>>>> >> > here.
>>>>>>>>>> >> >
>>>>>>>>>> >> > I *have* seen two real bugs here while trying to get things
>>>>>>>>>> >> > running, though:
>>>>>>>>>> >> >
>>>>>>>>>> >> > 1. If the device specified in the config file doesn't exist, or
>>>>>>>>>> >> > is the wrong type of device, or (maybe) there is any other kind
>>>>>>>>>> >> > of problem with it, you get *no* useful feedback to say there's
>>>>>>>>>> >> > a problem. Running things under strace will show the background
>>>>>>>>>> >> > libarmep process attempt to use the device specified, but
>>>>>>>>>> >> > there's no error handling. :-(
>>>>>>>>>> >> >
>>>>>>>>>> >> > 2. The "-x" option says that the arm-probe program is meant to
>>>>>>>>>> >> > exit when you've done capturing, but it just sits there forever
>>>>>>>>>> >> > when I'm testing. I've wrapped it using the "timeout" command to
>>>>>>>>>> >> > work around that for now.
>>>>>>>>>> >> >
>>>>>>>>>> >> > If I knew where to file those bugs, I would, but it's really not
>>>>>>>>>> >> > obvious. They're really easy to reproduce, I hope...
>>>>>>>>>> >> >
>>>>>>>>>> >> > In terms of the /dev/ttyACM0 creation, the lxc-device man page
>>>>>>>>>> >> > says that it creates devices based on their existing entries on
>>>>>>>>>> >> > the host. Double-check that the host (dispatcher) has an
>>>>>>>>>> >> > appropriate /dev/ttyACM0 if you're still seeing problems?
>>>>>>>>>> >>
>>>>>>>>>> >> Steve was using staging-panda03 with the ARM Energy Probe which
>>>>>>>>>> >> I'd been using for the tests of the new code to ensure
>>>>>>>>>> >> that /dev/ttyACM0 can be attached to the LXC.
>>>>>>>>>> >>
>>>>>>>>>> >> That panda and AEP will shortly return to staging and then the
>>>>>>>>>> >> changes to LAVA and the required changes to the test definition
>>>>>>>>>> >> can be available for the 2017.6 release.
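
Both bugs described above can be worked around from outside arm-probe: check the device node before starting, and enforce a time limit in the same spirit as the "timeout" wrapper. The following is a rough Python sketch of that idea, not a fix to arm-probe itself. The helper names check_device and capture, the log file name and the 30 second limit are all hypothetical; the command line is the one quoted in the thread.

```python
import os
import stat
import subprocess


def check_device(path):
    """Raise a descriptive error unless path is an existing character device."""
    try:
        mode = os.stat(path).st_mode
    except OSError as exc:
        raise RuntimeError(f"probe device {path}: {exc.strerror}") from exc
    if not stat.S_ISCHR(mode):
        raise RuntimeError(f"probe device {path} is not a character device")


def capture(cmd, device, output_path, limit_seconds):
    """Run cmd with stdout written to output_path and no terminal on stdin.

    Returns True if the command exited by itself and False if it had to be
    stopped after limit_seconds (the workaround for "-x" never exiting).
    """
    check_device(device)
    with open(output_path, "wb") as out:
        try:
            subprocess.run(cmd, stdout=out, stdin=subprocess.DEVNULL,
                           timeout=limit_seconds, check=False)
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising this exception.
            return False
    return True


if __name__ == "__main__":
    # Assumed values for illustration only.
    finished = capture(
        ["./arm-probe/arm-probe", "-C", "panda-aep.cfg", "-l10", "-x"],
        "/dev/ttyACM0", "aep.log", 30)
    print("exited by itself" if finished else "stopped at the time limit")
```

As noted at the top of the thread, a limit that is too short (2 seconds in the original test definition) can cut the capture off before the probe has produced any data, so the limit needs some headroom.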
>>>>>>>>>> >
>>>>>>>>>> > OK. staging-panda03 is back and has been running tests. This is
>>>>>>>>>> > what we've learnt so far:
>>>>>>>>>> >
>>>>>>>>>> > 0: This does not appear to be an LXC issue. Running the commands
>>>>>>>>>> > manually on the worker with the same LXC on the same worker does
>>>>>>>>>> > return data from the probe.
>>>>>>>>>> >
>>>>>>>>>> > 1: Running the same commands in "headless" mode shows that the
>>>>>>>>>> > probe software starts successfully but something within the
>>>>>>>>>> > protocol parser or sampler fails to retrieve data.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> What do you mean by headless mode?
>>>>>>>>>
>>>>>>>>> With no controlling terminal.
>>>>>>>>>
>>>>>>>>> LAVA runs as a daemon and forks processes to run the tests. This does
>>>>>>>>> not usually cause issues and is fundamental to automation. When I run
>>>>>>>>> the same commands in an LXC as a user logged into the machine, I get
>>>>>>>>> output. When I run the commands from a daemon, the output is not seen.
>>>>>>>>
>>>>>>>> Even when you redirect the output to a file?
>>>>>>>>
>>>>>>>> On workload automation, arm_probe is called in a dedicated process
>>>>>>>> with subprocess.Popen and we are able to get data in the file.
>>>>>>>> I just wonder what could be different in the LAVA case.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> >
>>>>>>>>>> > 2: The websockets dependency is completely unnecessary and has been
>>>>>>>>>> > disabled in the build I've been testing:
>>>>>>>>>> > https://git.linaro.org/lava-team/arm-probe.git/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Yes. I do the same. aepd is only useful for the web interface.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> >
>>>>>>>>>> > 3: We've added a *lot* of debug to the arm-probe code
>>>>>>>>>> > (https://staging.validation.linaro.org/scheduler/job/174969 which
>>>>>>>>>> > was run using
>>>>>>>>>> > https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1)
>>>>>>>>>> > but are not much closer to identifying the precise problem with the
>>>>>>>>>> > code. However, I am satisfied that this is a problem in the
>>>>>>>>>> > arm-probe software when being run in automation.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Can you give details about "this is a problem in arm probe software
>>>>>>>>>> when being run in automation"? Do you mean workload automation?
>>>>>>>>>
>>>>>>>>> No. Not workload automation - that is a specific test framework which
>>>>>>>>> can use LAVA. I'm talking about the process of running tests on behalf
>>>>>>>>> of users without the users being logged in or interacting with the
>>>>>>>>> shell.
>>>>>>>>
>>>>>>>> ok. Just to be sure about the context
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> >
>>>>>>>>>> > 4: the arm-probe code is appallingly difficult to read and debug.
>>>>>>>>>> > It also seems unnecessarily complex.
>>>>>>>>>> >
>>>>>>>>>> > 5: I plan to remove a lot of the debug from the cloned arm-probe
>>>>>>>>>> > repository (which has also had a few fixes to compile with gcc6)
>>>>>>>>>> > but I'm running out of time to work on the arm-probe software
>>>>>>>>>> > myself.
>>>>>>>>>> >
>>>>>>>>>> > Someone needs to update the arm-probe software:
>>>>>>>>>> >
>>>>>>>>>> > a) to remove websockets as a compile-time option as this only
>>>>>>>>>> > bloats the build in automation where a web based UI is impossible
>>>>>>>>>> > anyway. I've done this by brute force in my cloned repo, I just
>>>>>>>>>> > patched out the dependency.
>>>>>>>>>> >
>>>>>>>>>> > b) improve the code to have comments and output about what is
>>>>>>>>>> > happening and why when verbose mode is used.
>>>>>>>>>> >
>>>>>>>>>> > c) Identify what is preventing the software from receiving data
>>>>>>>>>> > from the probe when run in automation.
>>>>>>>>>> >
>>>>>>>>>> > d) the config file still needs fixes to allow for changes in the
>>>>>>>>>> > device node name from one probe to another.
>>>>>>>>>> >
>>>>>>>>>> > --
>>>>>>>>>>
>>>>>>>>>> CC'ing Vincent, so he can read Neil's and Steve's comments above and
>>>>>>>>>> respond (if he has anything to say) while I'm on holiday until early
>>>>>>>>>> June.
>>>>>>>>>
>>>>>>>>> Steve & I are also on annual leave next week.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Neil Williams
>>>>>>>>> =============
>>>>>>>>> http://www.linux.codehelp.co.uk/
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Neil Williams
>>>>>>> =============
>>>>>>> neil.willi...@linaro.org
>>>>>>> http://www.linux.codehelp.co.uk/

_______________________________________________
linaro-validation mailing list
linaro-validation@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-validation
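
Point (d) and the earlier suggestion of generating the configuration file at runtime could be approached roughly as follows. This is only a sketch in Python: the file layout is inferred from the mt8173-evb example quoted in the thread rather than from the arm-probe sources, the channel lines are passed through as opaque strings, and render_config is a hypothetical helper. The /dev/ttyACM0 and /dev/ttyACM1 paths stand in for values that a test job would pass as parameters.

```python
def render_config(setup_name, probes):
    """Build arm-probe style configuration text.

    probes is a list of (device_path, channel_lines) pairs, where
    channel_lines are copied into the file unchanged.
    """
    lines = [
        "# arm-probe configuration file",
        "#",
        "# setup name",
        setup_name,
        "",
        "# <device path>",
    ]
    for device_path, channel_lines in probes:
        lines.append(device_path)
        lines.extend(channel_lines)
        lines.append("")  # blank line between probes, as in the quoted example
    return "\n".join(lines)


if __name__ == "__main__":
    # Channel text copied from the thread. The mail wrapped each channel over
    # two lines, so whether arm-probe expects one line or two is an assumption.
    a57 = ["VDD_CA57_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 "
           "SoC/A57/Cache A57_CACHE #ff0000 SoC"]
    a53 = ["VDD_CA53_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 "
           "SoC/A53/Cache A53_CACHE #ff0000 SoC"]
    print(render_config("mt8173-evb",
                        [("/dev/ttyACM0", a57), ("/dev/ttyACM1", a53)]))
```

This does not solve the question of which device node belongs to which probe inside the LXC; it only keeps the device path out of the checked-in file so that it can change from one job to another.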