Hi all,

I have managed to get a significant performance improvement with some configuration changes.

The measured Dhrystone time was reduced from 12 to "too small to be measured" in one run; in another run the time was 0.4. The number of dhrystones per second increased from approximately 83333 to 2500000 :)

Thanks!

On Sun, Jun 21, 2015 at 1:32 AM, Rohini Kulkarni <krohini1...@gmail.com> wrote: > Hi, > > I have added an SMP related post to my blog to define where exactly in the > code I need to work. Some feedback to indicate if I am identifying the work > area correctly would be very helpful! > > Thanks! > On 18 Jun 2015 03:37, "Rohini Kulkarni" <krohini1...@gmail.com> wrote: > >> Hi all, >> >> I have updated my blog to reflect my understanding and attempts for cache >> performance issue. >> >> Lately I have been trying around memory attributes for the >> mm_config_table. One set of configurations for cacheable memory (inner and >> outer levels)ended up reducing performance further ( which I really thought >> would improve). So this table set up certainly controls performance. >> >> The results are not improving after turning on cache. So memory sections >> are perhaps not even getting cached. >> I get a feeling it has got to do with this mm_config_table. >> >> Updates from the github code and blog might help in further discussion. >> >> Link to github code:https://github.com/krohini1593/rtems/tree/rohini >> >> Link to Blog <http://rohiniwithrpi2.blogspot.in/p/blog-page_3.html> >> >> Thanks! >> >> On Mon, Jun 15, 2015 at 8:29 PM, Alan Cudmore <alan.cudm...@gmail.com> >> wrote: >> >>> Hi, >>> Some of the code examples may give you some clues. 
Like this one: >>> https://github.com/mrvn/test/blob/master/smp.cc >>> >>> Or this: >>> https://github.com/PeterLemon/RaspberryPi/tree/master/SMP/SMPINIT >>> >>> If you still can't figure it out, you can always join the >>> raspberrypi.org forums and ask on this thread: >>> https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904 >>> >>> When it comes to the Pi 2 and SMP, you are our RTEMS expert :) >>> >>> Thanks, >>> Alan >>> >>> >>> On Sat, Jun 13, 2015 at 2:29 PM, Rohini Kulkarni <krohini1...@gmail.com> >>> wrote: >>> >>>> Hi, >>>> >>>> This is regarding Pi 2 SMP support. After powering on, the secondary >>>> mailboxes read one of their four mailbox registers and wait for a non-zero >>>> content to be written. This content is to be the physical address of the >>>> location from where the cores are expected to start execution. >>>> >>>> I am stuck at figuring out this address. How should I go about >>>> understanding this? >>>> >>>> Thanks! >>>> On 3 Jun 2015 19:44, "Gedare Bloom" <ged...@gwu.edu> wrote: >>>> >>>>> On Wed, Jun 3, 2015 at 2:39 AM, Rohini Kulkarni <krohini1...@gmail.com> >>>>> wrote: >>>>> > But, I can't say cache configurations have a role here. >>>>> > >>>>> > I'll push my code to my github project soon. >>>>> > >>>>> > P.S. The Pi2 board I possess seems to have broken down. It just isn't >>>>> > turning on. Unable to test further. Will order one immediately. >>>>> > >>>>> Ouch. Make sure you put it in a safe space for development, clear of >>>>> threats like moisture, static shock, and cats. >>>>> >>>>> > On 3 Jun 2015 09:03, "Rohini Kulkarni" <krohini1...@gmail.com> >>>>> wrote: >>>>> >> >>>>> >> Hi, >>>>> >> >>>>> >> Alan, your suggestion has resulted in much improvement >>>>> >> >>>>> >> arm_control=0x1000 >>>>> >> >>>>> >> This has simply worked! Looks like the other cores were taking up >>>>> plenty >>>>> >> of time. >>>>> >> I was aware from references that the other cores run a WFI, but ya, >>>>> did >>>>> >> not get its impact. 
>>>>> >> Time for each dhrystone has reduced to 7 from 13 and the no of >>>>> dhrystones >>>>> >> per second also increased. >>>>> >> >>>>> >> But this is a change only in the config.txt not actually in the >>>>> boot code. >>>>> >> >>>>> >> Thanks >>>>> >> >>>>> >> Rohini >>>>> >> >>>>> >> >>>>> >> >>>>> >> On Wed, Jun 3, 2015 at 7:12 AM, Alan Cudmore < >>>>> alan.cudm...@gmail.com> >>>>> >> wrote: >>>>> >>> >>>>> >>> The caches are being enabled on the RPI 1 BSP. The same code is >>>>> being >>>>> >>> executed by the RPI 2 BSP, but obviously it’s not sufficient for >>>>> the cache >>>>> >>> setup. >>>>> >>> I have been reading through this long thread, and it is very >>>>> informative: >>>>> >>> https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904 >>>>> >>> >>>>> >>> I am starting to understand the setup that is required to enable >>>>> caches >>>>> >>> on the RPI 2. For example this message near the bottom of page 3 >>>>> gives a >>>>> >>> good indication of the speedup available by configuring the MMU >>>>> and caches >>>>> >>> correctly: >>>>> >>> Quote from above thread >>>>> >>> ------------------------------ >>>>> >>> Enabling I/D caches and branch prediction, just like the julia >>>>> demo uses, >>>>> >>> it takes ~12 seconds, or ~21 fps. It's just one core but also a >>>>> much smaller >>>>> >>> loop than the julia demo has. >>>>> >>> >>>>> >>> Enabling the MMU and mapping memory inner/outer write-back, write >>>>> >>> allocate and the framebuffer inner write-through, no write >>>>> allocate + outer >>>>> >>> write-back, write-allocate it takes ~8 seconds, of 32 fps. >>>>> >>> >>>>> >>> PS: 640x480x32 with MMU gets me ~256 fps. Must have a greater L2 >>>>> cache >>>>> >>> effect. 
>>>>> >>> ------------------------- >>>>> >>> End of quote >>>>> >>> >>>>> >>> The person who posted the above comment (mrvn) posted the code >>>>> here: >>>>> >>> https://github.com/mrvn/test/blob/master/mmu.cc >>>>> >>> >>>>> >>> >>>>> >>> Also, it seems that when the Pi 2 starts up, cores 1-3 are put in >>>>> a wait >>>>> >>> loop always accessing the bus. By putting this option in the >>>>> config.txt file >>>>> >>> you can put the other cores to sleep, speeding up the code on core >>>>> 1. >>>>> >>> arm_control=0x1000 >>>>> >>> It would be worth trying that option to see if the benchmark >>>>> speeds up. >>>>> >>> >>>>> >>> >>>>> >>> Alan >>>>> >>> >>>>> >>> On Jun 2, 2015, at 8:05 AM, Hesham ALMatary < >>>>> heshamelmat...@gmail.com> >>>>> >>> wrote: >>>>> >>> >>>>> >>> On Tue, Jun 2, 2015 at 12:41 PM, Rohini Kulkarni < >>>>> krohini1...@gmail.com> >>>>> >>> wrote: >>>>> >>> >>>>> >>> From what I saw, they have to be enabled separately. Cache/mmu are >>>>> >>> disabled >>>>> >>> upon reset. >>>>> >>> >>>>> >>> For the existing Raspberry BSP [1] there's a code for MMU/Cache >>>>> init, >>>>> >>> however I don't know about Pi2 and where its code is. >>>>> >>> >>>>> >>> [1] >>>>> >>> >>>>> https://github.com/RTEMS/rtems/tree/master/c/src/lib/libbsp/arm/raspberrypi >>>>> >>> >>>>> >>> On 2 Jun 2015 16:59, "Hesham ALMatary" <heshamelmat...@gmail.com> >>>>> wrote: >>>>> >>> >>>>> >>> >>>>> >>> Hi, >>>>> >>> >>>>> >>> Aren't the MMU/Caches enabled by default for RPi [1]? >>>>> >>> >>>>> >>> [1] >>>>> >>> >>>>> >>> >>>>> https://github.com/RTEMS/rtems/blob/master/c/src/lib/libbsp/arm/shared/mminit.c >>>>> >>> >>>>> >>> On Tue, Jun 2, 2015 at 12:18 PM, Joel Sherrill >>>>> >>> <joel.sherr...@oarcorp.com> wrote: >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> On June 2, 2015 7:01:21 AM EDT, Rohini Kulkarni < >>>>> krohini1...@gmail.com> >>>>> >>> wrote: >>>>> >>> >>>>> >>> Dr. Joel, >>>>> >>> >>>>> >>> So we can't say something solely on the basis of this result? 
>>>>> >>> >>>>> >>> >>>>> >>> I don't think so. If Linux performs the same, then what you did is >>>>> as >>>>> >>> good as it gets. >>>>> >>> >>>>> >>> However, if Linux is faster then some setting still isn't right. >>>>> >>> >>>>> >>> You need a reference measurement to have any confidence. It is >>>>> possible >>>>> >>> you did something but didn't actually turn the cache (or all the >>>>> cache) >>>>> >>> on. >>>>> >>> >>>>> >>> On 2 Jun 2015 16:28, "Rohini Kulkarni" <krohini1...@gmail.com> >>>>> wrote: >>>>> >>> >>>>> >>> I have not run it under linux on pi2 yet. Will have to run and >>>>> check >>>>> >>> the result. >>>>> >>> >>>>> >>> On 2 Jun 2015 16:16, "Joel Sherrill" <joel.sherr...@oarcorp.com> >>>>> wrote: >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> On June 2, 2015 5:58:33 AM EDT, Rohini Kulkarni < >>>>> krohini1...@gmail.com> >>>>> >>> wrote: >>>>> >>> >>>>> >>> HI, >>>>> >>> >>>>> >>> I tried running the dhrystone benchmark with some changes for >>>>> >>> >>>>> >>> cache/mmu >>>>> >>> >>>>> >>> set up. >>>>> >>> >>>>> >>> However, the output shows a reduction in performance. >>>>> >>> The time to run through the dhrystone has increased from 12 to 13 >>>>> and >>>>> >>> dhrystones run per second decreased. >>>>> >>> >>>>> >>> According to this result, things were better with caches disabled. >>>>> >>> >>>>> >>> >>>>> >>> I have been working on this since two days and could not figure >>>>> out an >>>>> >>> improvement. Any pointers? >>>>> >>> >>>>> >>> >>>>> >>> How did it do under Linux on the Pi2? >>>>> >>> >>>>> >>> >>>>> >>> Thanks. >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> On Thu, May 28, 2015 at 8:41 PM, Rohini Kulkarni >>>>> >>> <krohini1...@gmail.com> wrote: >>>>> >>> >>>>> >>> Hi All, >>>>> >>> >>>>> >>> I have to implement the cache coherency support for Cortex A7. But >>>>> for >>>>> >>> A7 MPCore, unlike for A9, I am not able to find any register >>>>> >>> description for the Snoop Control Unit from the TRM. 
>>>>> >>> >>>>> >>> I need help here on how to proceed. >>>>> >>> >>>>> >>> Additionally for A9 there is a single bit for A9 in the Auxiliary >>>>> >>> Control Register which enables cache broadcast operations. The >>>>> >>> >>>>> >>> register >>>>> >>> >>>>> >>> format is different for A7 and again I am unable to find how to >>>>> >>> >>>>> >>> achieve >>>>> >>> >>>>> >>> the same for A7. >>>>> >>> >>>>> >>> Thanks! >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> On Tue, May 5, 2015 at 10:42 PM, Joel Sherrill >>>>> >>> <joel.sherr...@oarcorp.com> wrote: >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> On 5/5/2015 11:11 AM, Rohini Kulkarni wrote: >>>>> >>> >>>>> >>> Hi, >>>>> >>> >>>>> >>> I am working with the code for bsp hooks. I am referring to >>>>> existing >>>>> >>> ARM multicore bsp codes, zync mainly. >>>>> >>> >>>>> >>> 1. There are existing hooks for the raspberry pi. Where should the >>>>> >>> >>>>> >>> code >>>>> >>> >>>>> >>> for the Pi2 hooks be added? >>>>> >>> >>>>> >>> The Pi and Pi2 are remarkably similar so Pi2 should be placed >>>>> inside >>>>> >>> the Pi BSP directory. >>>>> >>> There is already a Pi2 variant of that code built. But we know >>>>> >>> >>>>> >>> specific >>>>> >>> >>>>> >>> places where there >>>>> >>> are variances. Depending on the scope of what is different, it can >>>>> be >>>>> >>> as simple as >>>>> >>> a cpp conditional in a .h to select a value or two implementations >>>>> of >>>>> >>> >>>>> >>> a >>>>> >>> >>>>> >>> single method >>>>> >>> and the Makefile.am picking the right file to build based on the >>>>> board >>>>> >>> variant. >>>>> >>> >>>>> >>> The big question to always ask is: Is this specific to the Pi2 and >>>>> >>> incompatible with the Pi? >>>>> >>> >>>>> >>> Since the Pi BSP is still missing capabilities, it is likely code >>>>> >>> common to both will >>>>> >>> be added this summer. For example, did the mailbox interface >>>>> change? I >>>>> >>> don't know >>>>> >>> but would guess that it didn't. 
Each new capability added needs >>>>> that >>>>> >>> added. >>>>> >>> >>>>> >>> And any differences need to be analyzed to pick the least intrusive >>>>> >>> >>>>> >>> way >>>>> >>> >>>>> >>> to provide >>>>> >>> alternate implementations. Or enable special code like the Pi2 SMP >>>>> >>> support which >>>>> >>> is dependent on --enable-smp and being a Pi2. >>>>> >>> >>>>> >>> 2. Am I right in understanding that I will have to implement A7 >>>>> >>> specific functions as have been for A9? I am referring >>>>> specifically to >>>>> >>> the arm-a9mpcore-start.h >>>>> >>> >>>>> >>> Yes. >>>>> >>> >>>>> >>> If the code is very similar between the a7 and a9, then a >>>>> discussion >>>>> >>> on devel@ should occur to decide the best way to minimize >>>>> duplication. >>>>> >>> >>>>> >>> If you end up with a7 specific code, you should follow the location >>>>> >>> >>>>> >>> and >>>>> >>> >>>>> >>> >>>>> >>> naming patterns already established. That places it in >>>>> >>> libbsp/arm/shared/... >>>>> >>> so it can be used by any BSP with the right SMP core. >>>>> >>> >>>>> >>> >>>>> >>> I am referring to existing codes to locate and get hold of what >>>>> needs >>>>> >>> to be done in the hooks. However, being new to such >>>>> implementations, I >>>>> >>> am taking longer to understand the details. Any suggestions that >>>>> might >>>>> >>> help here are welcome >>>>> >>> >>>>> >>> The answer will depend on the factors listed above. When code can >>>>> >>> be shared, we want to share it across as many BSPs as makes sense. >>>>> >>> When it is unique to a specific BSP **variant** (e.g. Pi vs Pi2), >>>>> then >>>>> >>> you want to find the way to account for the variation in the least >>>>> >>> intrusive code way possible. >>>>> >>> >>>>> >>> Thanks! >>>>> >>> >>>>> >>> On 1 May 2015 12:45, "Rohini Kulkarni" <krohini1...@gmail.com> >>>>> wrote: >>>>> >>> >>>>> >>> >>>>> >>> Hi, >>>>> >>> >>>>> >>> Excited to be a part of this edition of GSoC! 
Thanks to everybody >>>>> for >>>>> >>> helping me get here and congratulations to all the participating >>>>> >>> students! >>>>> >>> >>>>> >>> So, now getting to work, firstly I wish to know, specifically from >>>>> my >>>>> >>> mentors, any changes that must be made to my proposed project or >>>>> >>> schedule. >>>>> >>> >>>>> >>> Secondly, are there any specifics for the development blog that we >>>>> >>> >>>>> >>> need >>>>> >>> >>>>> >>> to create for the project? Over time what is the blog expected to >>>>> >>> convey. >>>>> >>> >>>>> >>> Also, I have to create a new wiki page for my project as none >>>>> exists. >>>>> >>> >>>>> >>> I >>>>> >>> >>>>> >>> want to know how to add one. >>>>> >>> >>>>> >>> -- >>>>> >>> >>>>> >>> Rohini Kulkarni >>>>> >>> >>>>> >>> >>>>> >>> -- Joel Sherrill, Ph.D. Director of Research & Development >>>>> >>> joel.sherr...@oarcorp.com On-Line Applications Research Ask me >>>>> about >>>>> >>> RTEMS: a free RTOS Huntsville AL 35805 Support Available (256) >>>>> >>> >>>>> >>> 722-9985 >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> -- >>>>> >>> >>>>> >>> Rohini Kulkarni >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> -- >>>>> >>> >>>>> >>> Rohini Kulkarni >>>>> >>> >>>>> >>> >>>>> >>> --joel >>>>> >>> >>>>> >>> >>>>> >>> --joel >>>>> >>> _______________________________________________ >>>>> >>> devel mailing list >>>>> >>> devel@rtems.org >>>>> >>> http://lists.rtems.org/mailman/listinfo/devel >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> -- >>>>> >>> Hesham >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> >>>>> >>> -- >>>>> >>> Hesham >>>>> >>> _______________________________________________ >>>>> >>> devel mailing list >>>>> >>> devel@rtems.org >>>>> >>> http://lists.rtems.org/mailman/listinfo/devel >>>>> >>> >>>>> >>> >>>>> >> >>>>> >> >>>>> >> >>>>> >> -- >>>>> >> Rohini Kulkarni >>>>> > >>>>> > >>>>> > _______________________________________________ >>>>> > devel mailing list >>>>> > devel@rtems.org >>>>> > 
http://lists.rtems.org/mailman/listinfo/devel >>>>> >>>> >>>> _______________________________________________ >>>> devel mailing list >>>> devel@rtems.org >>>> http://lists.rtems.org/mailman/listinfo/devel >>>> >>> >>> >> >> >> -- >> Rohini Kulkarni >> > -- Rohini Kulkarni
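P.S. As a sanity check on the figures at the top of this message: they are consistent with each other if the reported "time" is the benchmark's microseconds-per-run value. That reading is an assumption, since the unit is not stated in the message. The helper name `dhrystones_per_second` below is made up for illustration and is not part of the benchmark or of RTEMS. A minimal sketch of the arithmetic:

```python
def dhrystones_per_second(us_per_run: float) -> float:
    """Convert a time per Dhrystone run (assumed to be in microseconds)
    into the number of runs completed per second."""
    if us_per_run <= 0:
        raise ValueError("time per run must be positive")
    return 1_000_000 / us_per_run


# Times reported before and after the configuration changes.
before = dhrystones_per_second(12)
after = dhrystones_per_second(0.4)

print(f"before: {before:.0f} per second")
print(f"after:  {after:.0f} per second")
print(f"speedup: {after / before:.0f}x")
```

Under that assumption, 12 maps to roughly 83333 per second and 0.4 to 2500000 per second, which matches the reported numbers.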