On Wed, Mar 29, 2017 at 5:47 AM, Ola Liljedahl <[email protected]> wrote:
> On 29 March 2017 at 10:43, Francois Ozog <[email protected]> wrote:
>
>> If there is a cost to get the virtual address, then I assume translation
>> is NOT just casting: correct?
>>
> Correct. linux-generic has a number of dereferences in the code that
> returns e.g. the buffer address from a buffer handle. This is not optimised
> for performance. The design does provide the ability to check buffer
> handles for correctness/validity, but I cannot see any code that actually
> does this, so an invalid buffer handle might crash the code (some out of
> bounds memory access).
>
>

I suspect that the hot spots are due to the fact that in many cases we are
only using a 32-bit value and wrapping it in a 64-bit handle. This was
originally done to make the strong typing 32/64-bit agnostic, but this can
change if we widen the linux-generic handles to use the full pointer width.

Ola: you should no longer be seeing those hot spots in the packet code. With
the more recent changes Petri introduced, odp_packet_t is now simply a
pointer to an odp_packet_hdr_t, similar to how in odp-dpdk it is a pointer
to an rte_mbuf. In odp-cloud we should certainly do a similar mapping for
other key handle types.

Another approach, which requires a bit more config/tools technology, would
be to support multiple type definitions as a performance tuning option.
When developing, you'd compile using an include structure that defines
handles as pointers to structs to get strong typing; a compile option for
production use would then redefine the handles as uint32_t. That would
reduce their footprint to 32 bits but lose strong type checking, a
trade-off an application writer could decide is worthwhile. Currently we
support a -DDEBUG option that includes additional runtime checking. We could
do this via a similar -DTYPE_CHECK option (the default) and support a
-DNO_TYPE_CHECK option for the "compact" handles.
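A minimal sketch of the TYPE_CHECK / NO_TYPE_CHECK idea, assuming hypothetical names (pkt_t, pkt_hdr_t, pkt_table, pkt_from_index, pkt_to_hdr) that are not the actual ODP or linux-generic definitions. In the default build the handle is a pointer to an opaque struct, so handle-to-address translation is a plain cast, much like odp_packet_t pointing at odp_packet_hdr_t. Compiling with -DNO_TYPE_CHECK swaps in a 32-bit index handle: half the size on LP64, but translation becomes a table lookup and the compiler can no longer tell handle types apart.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packet header; not the real odp_packet_hdr_t layout. */
typedef struct pkt_hdr_s {
	uint32_t len;
	uint8_t data[60];
} pkt_hdr_t;

#define NUM_PKTS 16

/* Hypothetical statically allocated pool of packet headers. */
static pkt_hdr_t pkt_table[NUM_PKTS];

#ifdef NO_TYPE_CHECK
/* "Compact" handle: a 32-bit index into pkt_table. Any uint32_t is
 * accepted here, including a handle for some other object type. */
typedef uint32_t pkt_t;

static inline pkt_t pkt_from_index(uint32_t idx)
{
	return idx;
}

static inline pkt_hdr_t *pkt_to_hdr(pkt_t pkt)
{
	/* Debug-only validity check (compiled out with -DNDEBUG), guarding
	 * against the out-of-bounds access an invalid handle would cause. */
	assert(pkt < NUM_PKTS);
	/* Translation needs the table base address plus an index computation. */
	return &pkt_table[pkt];
}
#else
/* Default (TYPE_CHECK): a pointer to an incomplete struct type. It is
 * distinct from every other handle type, so mixing them is diagnosed. */
typedef struct pkt_handle_s *pkt_t;

static inline pkt_t pkt_from_index(uint32_t idx)
{
	assert(idx < NUM_PKTS);
	return (pkt_t)&pkt_table[idx];
}

static inline pkt_hdr_t *pkt_to_hdr(pkt_t pkt)
{
	/* Translation is just a cast back to the header type. */
	return (pkt_hdr_t *)pkt;
}
#endif
```

In either build, pkt_to_hdr(pkt_from_index(3))->len = 60 writes to pkt_table[3]; what changes is sizeof(pkt_t) (sizeof(void *) by default, 4 with -DNO_TYPE_CHECK) and whether the conversion touches memory. Whether the smaller handles are worth the lost type checking is left to the application writer.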
>
>> FF
>>
>> On 29 March 2017 at 10:00, Ola Liljedahl <[email protected]> wrote:
>>
>>> So there is a choice between
>>> A) enabling static type checking in the compiler through strong typing
>>> => requires (syntactical) pointers in C => handles are 64-bit on 64-bit
>>> systems
>>> B) optimising for size and cache efficiency by using 32-bit (scalar)
>>> handles
>>>
>>> Currently this choice is hard-wired into the ODP linux-generic
>>> implementation.
>>>
>>> When profiling some ODP examples, I can see hot spots in the functions
>>> that convert "pointer"-handles into the actual object pointers
>>> (virtual addresses). So we are paying a double price here: handles are
>>> large (which increases cache pressure), and we have to translate handles
>>> to addresses before we can reference the objects in the ODP calls.
>>>
>>> On 29 March 2017 at 06:10, Bill Fischofer <[email protected]> wrote:
>>> >
>>> > On Tue, Mar 28, 2017 at 10:47 PM Honnappa Nagarahalli <[email protected]> wrote:
>>> >>
>>> >> On 28 March 2017 at 22:27, Bill Fischofer <[email protected]> wrote:
>>> >> >
>>> >> >
>>> >> > On Mon, Mar 27, 2017 at 10:11 PM, Honnappa Nagarahalli <[email protected]> wrote:
>>> >> >>
>>> >> >> On 27 March 2017 at 08:36, Ola Liljedahl <[email protected]> wrote:
>>> >> >> > On 27 March 2017 at 07:58, Honnappa Nagarahalli <[email protected]> wrote:
>>> >> >> >> My answers inline. I was confused as hell just a month back :)
>>> >> >> >>
>>> >> >> >> On 23 March 2017 at 06:28, Francois Ozog <[email protected]> wrote:
>>> >> >> >>
>>> >> >> >>> The more I dig the less I understand ;-)
>>> >> >> >>>
>>> >> >> >>> Let me ask a few questions:
>>> >> >> >>>
>>> >> >> >>> - In the future, when selling 32-bit silicon, which architecture
>>> >> >> >>> version will it be: ARMv7 or ARMv8?
>>> >> >> > AFAIK, future 32-bit ARM cores (from ARM) will be ARMv8.
>>> >> >> > But people are still building SoCs with e.g. ARM920, which is
>>> >> >> > ARMv4T or something.
>>> >> >> >
>>> >> >> >>
>>> >> >> >> What you are referring to is the ISA version, not the architecture.
>>> >> >> >> AArch32 and AArch64 are architectures. ARMv8 also supports AArch32
>>> >> >> >> (i.e. AArch32 with the ARMv8 ISA).
>>> >> >> > ARMv8 has two architectural states, AArch32 and AArch64. An ARMv8
>>> >> >> > implementation can implement either or both. There are already
>>> >> >> > examples out there of all these different combinations.
>>> >> >> >
>>> >> >> > AArch32 supports the A32 and T32 ISAs; these are closely related to
>>> >> >> > (basically extensions of) the corresponding ARMv7-A ARM and
>>> >> >> > Thumb(-2) ISAs.
>>> >> >> > The A32 (and T32?) ISAs have some of the ARMv8 extensions, e.g.
>>> >> >> > load-acquire, store-release, crypto instructions, simplified WFE
>>> >> >> > support, etc.
>>> >> >> > A user-space ARMv7-A image should run unmodified on ARMv8/AArch32.
>>> >> >> > I don't know about other privilege levels, but I can imagine an
>>> >> >> > ARMv7-A kernel running in AArch32 with an AArch64 hypervisor.
>>> >> >> >
>>> >> >> > AArch64 supports the A64 ISA. This ISA actually supports both 32-bit
>>> >> >> > and 64-bit operations (although all addresses are 64-bit AFAIK).
>>> >> >> > 32-bit operations use Wn registers and 64-bit operations use Xn
>>> >> >> > registers. It's the same register set; Wn just denotes the lower
>>> >> >> > 32 bits.
>>> >> >> >
>>> >> >> >>> - Will the target solution be running ALL in 32 bits (boot in
>>> >> >> >>> 32 bits, Linux 32 bits, 32-bit apps)?
>>> >> >> >>> - Or will the target solution be hybrid (64-bit Linux and some
>>> >> >> >>> 32-bit apps)?
>>> >> >> > I think this is the more likely path.
>>> >> >> > If you have 4GB or more of RAM (and also other stuff that needs
>>> >> >> > physical addressing), you want a 64-bit kernel.
>>> >> >> >
>>> >> >> >>
>>> >> >> >> The target solution could be hybrid. Linux could be 64b, the
>>> >> >> >> applications could be 32b. It is my understanding that everything
>>> >> >> >> 32b is also possible using AArch32.
>>> >> >> >>
>>> >> >> >>> When I read "AArch64 was designed to remove known implementation
>>> >> >> >>> challenges of AArch32 cores" on
>>> >> >> >>> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0490a/ar01s01.html
>>> >> >> >>> I wonder if stating we support AArch32 is a good idea...
>>> >> >> >>>
>>> >> >> >>> So what is the best way to describe what we want?
>>> >> >> >>> - ARMv8 LP64 or ILP32?
>>> >> >> >>> - AArch64 LP64 or ILP32?
>>> >> >> >>> - LP64 or ILP32?
>>> >> >> >>>
>>> >> >> >>> I think the best way to say it is 'we support AArch64 and AArch32'.
>>> >> >> > Re AArch64, LP64 or ILP32 applications?
>>> >> >> >
>>> >> >> > AArch32 or ARMv7-A?
>>> >> >> >
>>> >> >> >>
>>> >> >> >>> FF
>>> >> >> >>>
>>> >> >> >>> On 23 March 2017 at 04:57, Honnappa Nagarahalli <[email protected]> wrote:
>>> >> >> >>>
>>> >> >> >>>> Hi Bill / Matt and others,
>>> >> >> >>>> What I was trying to say in our discussion is that the
>>> >> >> >>>> ODP-Cloud code should not be pointer heavy.
>>> >> >> >>>>
>>> >> >> >>>> Please take a look at this video from BUD17:
>>> >> >> >>>> http://connect.linaro.org/resource/bud17/bud17-101/ (unfortunately
>>> >> >> >>>> there are no slides; I am trying to get them). This talks about
>>> >> >> >>>> the performance of 32b applications on AArch64.
>>> >> >> >>>> One of the applications shows a huge performance improvement
>>> >> >> >>>> when running in 32b mode (ILP32 in this particular case) on
>>> >> >> >>>> AArch64, compared to the same application compiled for 64b mode
>>> >> >> >>>> on AArch64 (i.e. in the 64b compilation it performed very
>>> >> >> >>>> poorly). My understanding is that this particular application is
>>> >> >> >>>> a pointer-chasing application. Other applications, which are not
>>> >> >> >>>> pointer heavy, do not show this behavior.
>>> >> >> > Isn't the problem with LP64 that if you have a lot of pointers
>>> >> >> > stored in data structures, these take 2x the space of ILP32
>>> >> >> > pointers and thus increase the cache pressure?
>>> >> >> >
>>> >> >> > I don't think it is the pointer chasing itself that is penalised
>>> >> >> > by 64-bit pointers. Pointer-chasing apps are penalised by long
>>> >> >> > load-to-use latencies (L1 cache hit latency, L2/L3 latencies, DRAM
>>> >> >> > latency).
>>> >> >> >
>>> >> >> >>>>
>>> >> >> >>>> So we need to make sure ODP-Cloud is not pointer heavy and does
>>> >> >> >>>> not force the application to be pointer heavy, in order to get
>>> >> >> >>>> good performance out of 64b systems.
>>> >> >> > Even with LP64, ODP could use 32-bit handles for ODP objects. The
>>> >> >> > address lookup of the handle needs to be efficient (from a cache
>>> >> >> > perspective), though; already now I can see hot spots in the
>>> >> >> > function that returns an address from a handle.
>>> >> >> >
>>> >> >>
>>> >> >> Yes, this is what I am trying to convey. If we have 32-bit handles,
>>> >> >> it does not matter whether it is AArch32 or AArch64; the performance
>>> >> >> will be optimized.
>>> >> >
>>> >> >
>>> >> > The only way we've been able to achieve strong typing with ODP is if
>>> >> > the handles are of size sizeof(void *). On AArch64 (LP64) that is 64
>>> >> > bits, not 32, so I don't think this will hold. Obviously, when ODP is
>>> >> > compiled for AArch32, pointers (and hence handles) are 32 bits.
>>> >> >
>>> >> I did not understand your comment on strong typing. Can you elaborate
>>> >> or provide an example?
>>> >> If the handles need to be 64b (i.e. even on a 32b system they are
>>> >> 64b), then we should keep them as 64b. Otherwise, performance should
>>> >> be given higher priority.
>>> >
>>> >
>>> > Look at the ODP strong type files in the plat directory. We achieve
>>> > strong typing by defining handles to be pointers to structs, which C
>>> > treats as different types. There doesn't appear to be any other way to
>>> > achieve this, since C typedefs are weakly typed.
>>> >>
>>> >> >>
>>> >> >> >>>>
>>> >> >> >>>> Thank you,
>>> >> >> >>>> Honnappa
>>> >> >> >>>>
>>> >> >> >>>
>>> >> >> >>> --
>>> >> >> >>> [image: Linaro] <http://www.linaro.org/>
>>> >> >> >>> François-Frédéric Ozog | *Director Linaro Networking Group*
>>> >> >> >>> T: +33.67221.6485
>>> >> >> >>> [email protected] | Skype: ffozog
>>> >> >> >>>
>>> >> >
>>> >> >
>>>
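To illustrate the strong-typing point in the thread (handles as pointers to distinct struct types, because C typedefs are weakly typed), here is a small sketch. The names (weak_pool_t, weak_queue_t, strong_pool_t, strong_queue_t and the helper functions) are hypothetical and are not the definitions in ODP's plat strong type files. Two typedefs of uint32_t are the same type to the compiler, whereas pointers to two differently tagged structs are not; the price is that a pointer handle is sizeof(void *) wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Weak typing: both typedefs are aliases of uint32_t, so the compiler
 * treats them as the same type and cannot catch a mix-up. */
typedef uint32_t weak_pool_t;
typedef uint32_t weak_queue_t;

/* Strong typing: each handle is a pointer to a distinct, never-defined
 * struct type, so the two handle types are incompatible. */
typedef struct strong_pool_s *strong_pool_t;
typedef struct strong_queue_s *strong_queue_t;

/* Compact handles stay 32 bits on any ABI; pointer handles follow the
 * pointer width (8 bytes on LP64, 4 bytes on ILP32). */
static_assert(sizeof(weak_pool_t) == 4, "compact handles are 32 bits");
static_assert(sizeof(strong_pool_t) == sizeof(void *),
	      "strong handles are pointer-sized");

static inline uint32_t weak_queue_id(weak_queue_t q)
{
	return q;
}

static inline int strong_queue_is_valid(strong_queue_t q)
{
	return q != NULL;
}

/*
 * weak_pool_t pool = 7;
 * weak_queue_id(pool);          // accepted silently: same underlying type
 *
 * strong_pool_t spool = NULL;
 * strong_queue_is_valid(spool); // incompatible pointer types: the compiler
 *                               // must issue a diagnostic (warning or
 *                               // error, depending on compiler and flags)
 */
```

Assuming a 64-byte cache line (typical, but an assumption here), that works out to 64 / 4 = 16 compact handles per line versus 64 / 8 = 8 pointer handles on LP64, which is the cache-pressure side of the trade-off discussed above.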
