On Sun, Jun 21, 2015 at 3:04 PM, Rohini Kulkarni <krohini1...@gmail.com> wrote:
> I missed mentioning the number of dhrystones in the previous mail.
>
> Originally it was 1 million.
> The new number of dhrystones I executed is 100 million.
>
The next thing to do is to figure out what changes are contributing to
the performance improvement, and then prepare patches. :) Great work
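As a sanity check on the figures quoted below, here is a minimal sketch of the
arithmetic. It assumes the reported times (12 and 0.4) are microseconds per
Dhrystone run, which is how the benchmark usually reports them; the thread
itself does not state the unit. The function name is only for illustration.

```python
# Assumption: the "time" values in the thread (12 and 0.4) are microseconds
# per Dhrystone run; the thread gives no unit, so treat this as a sketch.
US_PER_SECOND = 1_000_000


def dhrystones_per_second(us_per_run: float) -> float:
    """Convert a per-run time in microseconds to runs per second."""
    return US_PER_SECOND / us_per_run


before = dhrystones_per_second(12.0)  # before the configuration changes
after = dhrystones_per_second(0.4)    # after the configuration changes

print(int(before))            # 83333
print(round(after))           # 2500000
print(round(after / before))  # speedup factor: 30
```

Under that assumption the two rates match the numbers reported in the quoted
mail, and the change works out to roughly a 30x speedup.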
> On Mon, Jun 22, 2015 at 12:29 AM, Rohini Kulkarni <krohini1...@gmail.com>
> wrote:
>>
>> Hi all,
>>
>> I have managed to get a significant performance improvement with some
>> changes in configurations.
>>
>> The measured time was for dhrystones reduced from 12 to "too small to be
>> measured"
>>
>> For dhrystones the time was 0.4.
>>
>> The number of dhrystones per second increased from approximately 83333 to
>> 2500000 :)
>>
>> Thanks!
>>
>> On Sun, Jun 21, 2015 at 1:32 AM, Rohini Kulkarni <krohini1...@gmail.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> I have added an SMP related post to my blog to define where exactly in
>>> the code I need to work. Some feedback to indicate if I am identifying the
>>> work area correctly would be very helpful!
>>>
>>> Thanks!
>>>
>>> On 18 Jun 2015 03:37, "Rohini Kulkarni" <krohini1...@gmail.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I have updated my blog to reflect my understanding and attempts for
>>>> cache performance issue.
>>>>
>>>> Lately I have been trying around memory attributes for the
>>>> mm_config_table. One set of configurations for cacheable memory (inner and
>>>> outer levels) ended up reducing performance further (which I really thought
>>>> would improve). So this table set up certainly controls performance.
>>>>
>>>> The results are not improving after turning on cache. So memory sections
>>>> are perhaps not even getting cached.
>>>> I get a feeling it has got to do with this mm_config_table.
>>>>
>>>> Updates from the github code and blog might help in further discussion.
>>>>
>>>> Link to github code: https://github.com/krohini1593/rtems/tree/rohini
>>>>
>>>> Link to Blog
>>>>
>>>> Thanks!
>>>>
>>>> On Mon, Jun 15, 2015 at 8:29 PM, Alan Cudmore <alan.cudm...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>> Some of the code examples may give you some clues.
>>>>> Like this one:
>>>>> https://github.com/mrvn/test/blob/master/smp.cc
>>>>>
>>>>> Or this:
>>>>> https://github.com/PeterLemon/RaspberryPi/tree/master/SMP/SMPINIT
>>>>>
>>>>> If you still can't figure it out, you can always join the
>>>>> raspberrypi.org forums and ask on this thread:
>>>>> https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904
>>>>>
>>>>> When it comes to the Pi 2 and SMP, you are our RTEMS expert :)
>>>>>
>>>>> Thanks,
>>>>> Alan
>>>>>
>>>>> On Sat, Jun 13, 2015 at 2:29 PM, Rohini Kulkarni
>>>>> <krohini1...@gmail.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> This is regarding Pi 2 SMP support. After powering on, the secondary
>>>>>> cores read one of their four mailbox registers and wait for non-zero
>>>>>> content to be written. This content is to be the physical address of the
>>>>>> location from where the cores are expected to start execution.
>>>>>>
>>>>>> I am stuck at figuring out this address. How should I go about
>>>>>> understanding this?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> On 3 Jun 2015 19:44, "Gedare Bloom" <ged...@gwu.edu> wrote:
>>>>>>>
>>>>>>> On Wed, Jun 3, 2015 at 2:39 AM, Rohini Kulkarni
>>>>>>> <krohini1...@gmail.com> wrote:
>>>>>>> > But, I can't say cache configurations have a role here.
>>>>>>> >
>>>>>>> > I'll push my code to my github project soon.
>>>>>>> >
>>>>>>> > P.S. The Pi2 board I possess seems to have broken down. It just isn't
>>>>>>> > turning on. Unable to test further. Will order one immediately.
>>>>>>> >
>>>>>>> Ouch. Make sure you put it in a safe space for development, clear of
>>>>>>> threats like moisture, static shock, and cats.
>>>>>>>
>>>>>>> > On 3 Jun 2015 09:03, "Rohini Kulkarni" <krohini1...@gmail.com>
>>>>>>> > wrote:
>>>>>>> >>
>>>>>>> >> Hi,
>>>>>>> >>
>>>>>>> >> Alan, your suggestion has resulted in much improvement
>>>>>>> >>
>>>>>>> >> arm_control=0x1000
>>>>>>> >>
>>>>>>> >> This has simply worked!
>>>>>>> >> Looks like the other cores were taking up plenty of time.
>>>>>>> >> I was aware from references that the other cores run a WFI, but ya, did
>>>>>>> >> not get its impact.
>>>>>>> >> Time for each dhrystone has reduced to 7 from 13 and the no of dhrystones
>>>>>>> >> per second also increased.
>>>>>>> >>
>>>>>>> >> But this is a change only in the config.txt not actually in the
>>>>>>> >> boot code.
>>>>>>> >>
>>>>>>> >> Thanks
>>>>>>> >>
>>>>>>> >> Rohini
>>>>>>> >>
>>>>>>> >> On Wed, Jun 3, 2015 at 7:12 AM, Alan Cudmore <alan.cudm...@gmail.com>
>>>>>>> >> wrote:
>>>>>>> >>>
>>>>>>> >>> The caches are being enabled on the RPI 1 BSP. The same code is being
>>>>>>> >>> executed by the RPI 2 BSP, but obviously it’s not sufficient for the
>>>>>>> >>> cache setup.
>>>>>>> >>> I have been reading through this long thread, and it is very
>>>>>>> >>> informative:
>>>>>>> >>> https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904
>>>>>>> >>>
>>>>>>> >>> I am starting to understand the setup that is required to enable caches
>>>>>>> >>> on the RPI 2. For example this message near the bottom of page 3 gives a
>>>>>>> >>> good indication of the speedup available by configuring the MMU and
>>>>>>> >>> caches correctly:
>>>>>>> >>> Quote from above thread
>>>>>>> >>> ------------------------------
>>>>>>> >>> Enabling I/D caches and branch prediction, just like the julia demo uses,
>>>>>>> >>> it takes ~12 seconds, or ~21 fps. It's just one core but also a much
>>>>>>> >>> smaller loop than the julia demo has.
>>>>>>> >>>
>>>>>>> >>> Enabling the MMU and mapping memory inner/outer write-back, write
>>>>>>> >>> allocate and the framebuffer inner write-through, no write allocate +
>>>>>>> >>> outer write-back, write-allocate it takes ~8 seconds, or 32 fps.
>>>>>>> >>>
>>>>>>> >>> PS: 640x480x32 with MMU gets me ~256 fps. Must have a greater L2 cache
>>>>>>> >>> effect.
>>>>>>> >>> -------------------------
>>>>>>> >>> End of quote
>>>>>>> >>>
>>>>>>> >>> The person who posted the above comment (mrvn) posted the code here:
>>>>>>> >>> https://github.com/mrvn/test/blob/master/mmu.cc
>>>>>>> >>>
>>>>>>> >>> Also, it seems that when the Pi 2 starts up, cores 1-3 are put in a wait
>>>>>>> >>> loop always accessing the bus. By putting this option in the config.txt
>>>>>>> >>> file you can put the other cores to sleep, speeding up the code on
>>>>>>> >>> core 1.
>>>>>>> >>> arm_control=0x1000
>>>>>>> >>> It would be worth trying that option to see if the benchmark speeds up.
>>>>>>> >>>
>>>>>>> >>> Alan
>>>>>>> >>>
>>>>>>> >>> On Jun 2, 2015, at 8:05 AM, Hesham ALMatary <heshamelmat...@gmail.com>
>>>>>>> >>> wrote:
>>>>>>> >>>
>>>>>>> >>> On Tue, Jun 2, 2015 at 12:41 PM, Rohini Kulkarni <krohini1...@gmail.com>
>>>>>>> >>> wrote:
>>>>>>> >>>
>>>>>>> >>> From what I saw, they have to be enabled separately. Cache/mmu are
>>>>>>> >>> disabled upon reset.
>>>>>>> >>>
>>>>>>> >>> For the existing Raspberry BSP [1] there's a code for MMU/Cache init,
>>>>>>> >>> however I don't know about Pi2 and where its code is.
>>>>>>> >>>
>>>>>>> >>> [1]
>>>>>>> >>> https://github.com/RTEMS/rtems/tree/master/c/src/lib/libbsp/arm/raspberrypi
>>>>>>> >>>
>>>>>>> >>> On 2 Jun 2015 16:59, "Hesham ALMatary" <heshamelmat...@gmail.com>
>>>>>>> >>> wrote:
>>>>>>> >>>
>>>>>>> >>> Hi,
>>>>>>> >>>
>>>>>>> >>> Aren't the MMU/Caches enabled by default for RPi [1]?
>>>>>>> >>>
>>>>>>> >>> [1]
>>>>>>> >>> https://github.com/RTEMS/rtems/blob/master/c/src/lib/libbsp/arm/shared/mminit.c
>>>>>>> >>>
>>>>>>> >>> On Tue, Jun 2, 2015 at 12:18 PM, Joel Sherrill
>>>>>>> >>> <joel.sherr...@oarcorp.com> wrote:
>>>>>>> >>>
>>>>>>> >>> On June 2, 2015 7:01:21 AM EDT, Rohini Kulkarni <krohini1...@gmail.com>
>>>>>>> >>> wrote:
>>>>>>> >>>
>>>>>>> >>> Dr. Joel,
>>>>>>> >>>
>>>>>>> >>> So we can't say something solely on the basis of this result?
>>>>>>> >>>
>>>>>>> >>> I don't think so. If Linux performs the same, then what you did is as
>>>>>>> >>> good as it gets.
>>>>>>> >>>
>>>>>>> >>> However, if Linux is faster then some setting still isn't right.
>>>>>>> >>>
>>>>>>> >>> You need a reference measurement to have any confidence. It is possible
>>>>>>> >>> you did something but didn't actually turn the cache (or all the cache)
>>>>>>> >>> on.
>>>>>>> >>>
>>>>>>> >>> On 2 Jun 2015 16:28, "Rohini Kulkarni" <krohini1...@gmail.com> wrote:
>>>>>>> >>>
>>>>>>> >>> I have not run it under linux on pi2 yet. Will have to run and check
>>>>>>> >>> the result.
>>>>>>> >>>
>>>>>>> >>> On 2 Jun 2015 16:16, "Joel Sherrill" <joel.sherr...@oarcorp.com> wrote:
>>>>>>> >>>
>>>>>>> >>> On June 2, 2015 5:58:33 AM EDT, Rohini Kulkarni <krohini1...@gmail.com>
>>>>>>> >>> wrote:
>>>>>>> >>>
>>>>>>> >>> Hi,
>>>>>>> >>>
>>>>>>> >>> I tried running the dhrystone benchmark with some changes for cache/mmu
>>>>>>> >>> set up.
>>>>>>> >>>
>>>>>>> >>> However, the output shows a reduction in performance.
>>>>>>> >>> The time to run through the dhrystone has increased from 12 to 13 and
>>>>>>> >>> dhrystones run per second decreased.
>>>>>>> >>>
>>>>>>> >>> According to this result, things were better with caches disabled.
>>>>>>> >>>
>>>>>>> >>> I have been working on this for two days and could not figure out an
>>>>>>> >>> improvement. Any pointers?
>>>>>>> >>>
>>>>>>> >>> How did it do under Linux on the Pi2?
>>>>>>> >>>
>>>>>>> >>> Thanks.
>>>>>>> >>>
>>>>>>> >>> On Thu, May 28, 2015 at 8:41 PM, Rohini Kulkarni
>>>>>>> >>> <krohini1...@gmail.com> wrote:
>>>>>>> >>>
>>>>>>> >>> Hi All,
>>>>>>> >>>
>>>>>>> >>> I have to implement the cache coherency support for Cortex A7. But for
>>>>>>> >>> A7 MPCore, unlike for A9, I am not able to find any register
>>>>>>> >>> description for the Snoop Control Unit from the TRM.
>>>>>>> >>>
>>>>>>> >>> I need help here on how to proceed.
>>>>>>> >>>
>>>>>>> >>> Additionally for A9 there is a single bit for A9 in the Auxiliary
>>>>>>> >>> Control Register which enables cache broadcast operations. The register
>>>>>>> >>> format is different for A7 and again I am unable to find how to achieve
>>>>>>> >>> the same for A7.
>>>>>>> >>>
>>>>>>> >>> Thanks!
>>>>>>> >>>
>>>>>>> >>> On Tue, May 5, 2015 at 10:42 PM, Joel Sherrill
>>>>>>> >>> <joel.sherr...@oarcorp.com> wrote:
>>>>>>> >>>
>>>>>>> >>> On 5/5/2015 11:11 AM, Rohini Kulkarni wrote:
>>>>>>> >>>
>>>>>>> >>> Hi,
>>>>>>> >>>
>>>>>>> >>> I am working with the code for bsp hooks. I am referring to existing
>>>>>>> >>> ARM multicore bsp codes, Zynq mainly.
>>>>>>> >>>
>>>>>>> >>> 1. There are existing hooks for the raspberry pi. Where should the code
>>>>>>> >>> for the Pi2 hooks be added?
>>>>>>> >>>
>>>>>>> >>> The Pi and Pi2 are remarkably similar so Pi2 should be placed inside
>>>>>>> >>> the Pi BSP directory.
>>>>>>> >>> There is already a Pi2 variant of that code built.
>>>>>>> >>> But we know specific places where there are variances. Depending on the
>>>>>>> >>> scope of what is different, it can be as simple as a cpp conditional in
>>>>>>> >>> a .h to select a value or two implementations of a single method and
>>>>>>> >>> the Makefile.am picking the right file to build based on the board
>>>>>>> >>> variant.
>>>>>>> >>>
>>>>>>> >>> The big question to always ask is: Is this specific to the Pi2 and
>>>>>>> >>> incompatible with the Pi?
>>>>>>> >>>
>>>>>>> >>> Since the Pi BSP is still missing capabilities, it is likely code
>>>>>>> >>> common to both will be added this summer. For example, did the mailbox
>>>>>>> >>> interface change? I don't know but would guess that it didn't. Each new
>>>>>>> >>> capability added needs that added.
>>>>>>> >>>
>>>>>>> >>> And any differences need to be analyzed to pick the least intrusive way
>>>>>>> >>> to provide alternate implementations. Or enable special code like the
>>>>>>> >>> Pi2 SMP support which is dependent on --enable-smp and being a Pi2.
>>>>>>> >>>
>>>>>>> >>> 2. Am I right in understanding that I will have to implement A7
>>>>>>> >>> specific functions as have been for A9? I am referring specifically to
>>>>>>> >>> the arm-a9mpcore-start.h
>>>>>>> >>>
>>>>>>> >>> Yes.
>>>>>>> >>>
>>>>>>> >>> If the code is very similar between the a7 and a9, then a discussion
>>>>>>> >>> on devel@ should occur to decide the best way to minimize duplication.
>>>>>>> >>>
>>>>>>> >>> If you end up with a7 specific code, you should follow the location and
>>>>>>> >>> naming patterns already established.
>>>>>>> >>> That places it in libbsp/arm/shared/...
>>>>>>> >>> so it can be used by any BSP with the right SMP core.
>>>>>>> >>>
>>>>>>> >>> I am referring to existing codes to locate and get hold of what needs
>>>>>>> >>> to be done in the hooks. However, being new to such implementations, I
>>>>>>> >>> am taking longer to understand the details. Any suggestions that might
>>>>>>> >>> help here are welcome
>>>>>>> >>>
>>>>>>> >>> The answer will depend on the factors listed above. When code can
>>>>>>> >>> be shared, we want to share it across as many BSPs as makes sense.
>>>>>>> >>> When it is unique to a specific BSP **variant** (e.g. Pi vs Pi2), then
>>>>>>> >>> you want to find the way to account for the variation in the least
>>>>>>> >>> intrusive code way possible.
>>>>>>> >>>
>>>>>>> >>> Thanks!
>>>>>>> >>>
>>>>>>> >>> On 1 May 2015 12:45, "Rohini Kulkarni" <krohini1...@gmail.com> wrote:
>>>>>>> >>>
>>>>>>> >>> Hi,
>>>>>>> >>>
>>>>>>> >>> Excited to be a part of this edition of GSoC! Thanks to everybody for
>>>>>>> >>> helping me get here and congratulations to all the participating
>>>>>>> >>> students!
>>>>>>> >>>
>>>>>>> >>> So, now getting to work, firstly I wish to know, specifically from my
>>>>>>> >>> mentors, any changes that must be made to my proposed project or
>>>>>>> >>> schedule.
>>>>>>> >>>
>>>>>>> >>> Secondly, are there any specifics for the development blog that we need
>>>>>>> >>> to create for the project? Over time what is the blog expected to
>>>>>>> >>> convey.
>>>>>>> >>>
>>>>>>> >>> Also, I have to create a new wiki page for my project as none exists. I
>>>>>>> >>> want to know how to add one.
>>>>>>> >>>
>>>>>>> >>> --
>>>>>>> >>> Rohini Kulkarni
>>>>>>> >>>
>>>>>>> >>> --
>>>>>>> >>> Joel Sherrill, Ph.D.             Director of Research & Development
>>>>>>> >>> joel.sherr...@oarcorp.com        On-Line Applications Research
>>>>>>> >>> Ask me about RTEMS: a free RTOS  Huntsville AL 35805
>>>>>>> >>> Support Available                (256) 722-9985
>>>>>>> >>>
>>>>>>> >>> --joel
>>>>>>> >>> _______________________________________________
>>>>>>> >>> devel mailing list
>>>>>>> >>> devel@rtems.org
>>>>>>> >>> http://lists.rtems.org/mailman/listinfo/devel
>>>>>>> >>>
>>>>>>> >>> --
>>>>>>> >>> Hesham
>>>>>>> >>>
_______________________________________________
devel mailing list
devel@rtems.org
http://lists.rtems.org/mailman/listinfo/devel