On Tue, Oct 15, 2013 at 02:03:48PM +0400, Kirill Yukhin wrote:
> Let me somewhat summarize current understanding of
> host binary linking as well as target binary building/linking.
>
> We put code which supposed to be offloaded to dedicated sections,
> with name starting with gnu.target_lto_
>
> At link time (I mean, link time of host app):
>   1. Generate dedicated data section in each binary (executable or DSO),
>      which'll be a placeholder for offloading stuff.
>
>   2. Generate __OPENMP_TARGET__ (weak, hidden) symbol,
>      which'll point to start of the section mentioned in previous item.
>
> This section should contain at least:
>   1. Number of targets
>   2. Size of offl. symbols table
>
>   [ Repeat `number of targets']
>   2. Name of target
>   3. Offset to beginning of image to offload to that target
>   4. Size of image
>
>   5. Offl. symbols table
>
> Offloading symbols table will contain information about addresses
> of offloadable symbols in order to create mapping of host<->target
> addresses at runtime.
>
> To get list of target addresses we need to have dedicated interface call
> to libgomp plugin, something like getTargetAddresses () which will
> query target for the list of addresses (accompanied with symbol names).
> To get this information target DSO should contain similar table of
> mapping symbols to address.

No, IMHO it is enough if the linker plugin finds the array of the target
addresses in the shared library it is going to embed (e.g. using some magic
symbol lookup, or named section) and just put a pointer to that place
in the payload into the __OPENMP_TARGET__ header structure, or whatever
other way will be best to provide that info to libgomp.
Say, if the pairs host_address, size are put into .gnu.target_addr section
in the host code and we arrange for the address to be put into vars in
.gnu.target_addr section in the .gnu.target_lto* IL for target, in the end
there will be a table of the target addresses in .gnu.target_addr section
in the target shared library.  So, either the __OPENMP_TARGET__ header entry
for the corresponding target (MIC in your case) would contain both the
host .gnu.target_addr table and a pointer to the .gnu.target_addr in the
payload, or the plugin could copy it over and create a table with
{ host_addr, size, target_addr_nonrelocated } and libgomp would just add
a load bias of the target shared library to the target address.

> Application is going to have single instance of libgomp, which
> in turn means that we'll have single splay tree holding information
> about mapping (host -> target) for all DSO and executable.

One splay tree per device without shared address space in particular.

> We have at least 2 approaches of host->target mapping solving.
>
> I. Preserve order of symbols appearance.
>    Table row: [ address, size ]
>    For routines, size to be 1
>
>    In order to initialize the table we need to get two arrays:
>    of host and target addresses. The order of appearance of objects in
>    these arrays must be the same. Having this makes mapping easy.
>    We just need to find index if given address in array of host addrs and
>    then dereference array of target addresses with index found.
>
>    The problem is that it unlikely will work when LTO of host is ON.
>    I am also not sure, that order of handling objects on target is the same
>    as on host.
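
For illustration only, here is a minimal sketch of how libgomp could consume
a merged, order-preserving table of { host_addr, size,
target_addr_nonrelocated } rows and add the load bias of the target shared
library.  Everything named below (struct omp_target_row, omp_host_to_target,
the load_bias argument) is made up for the example and is not an existing
libgomp interface.  It also uses a linear scan for brevity, where a real
implementation would presumably go through the per-device splay tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical row layout: one entry per offloadable object, emitted in
   the same order for host and target.  */
struct omp_target_row
{
  uintptr_t host_addr;                /* Address in the host image.  */
  size_t size;                        /* Object size; 1 for routines.  */
  uintptr_t target_addr_nonrelocated; /* Address inside the target DSO
                                         before it is loaded.  */
};

/* Translate HOST_ADDR into a target address using TABLE (NROWS entries)
   and LOAD_BIAS, the address the target shared library was loaded at.
   Returns 0 if HOST_ADDR is not covered by any row.  */
static uintptr_t
omp_host_to_target (const struct omp_target_row *table, size_t nrows,
                    uintptr_t load_bias, uintptr_t host_addr)
{
  assert (table != NULL || nrows == 0);
  for (size_t i = 0; i < nrows; i++)
    {
      const struct omp_target_row *r = &table[i];
      /* The row matches if HOST_ADDR lies in [host_addr, host_addr + size).  */
      if (host_addr >= r->host_addr && host_addr - r->host_addr < r->size)
        return load_bias + r->target_addr_nonrelocated
               + (host_addr - r->host_addr);
    }
  return 0;
}
```

With a load bias of 0x7f0000 and a row { 0x2000, 16, 0x400 }, host address
0x2008 would map to 0x7f0408 under this sketch.
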

I don't see why it wouldn't work, it will be the duty of the linker plugin
not to reorder the objects.

> II. Store symbol identifier along with address.
>    Table row: [ symbol_name, address, size]
>    For routines, size to be 1
>
>    To construct the table of host addresses, at link
>    time we put all symbol (marked at compile time with dedicated
>    attribute) addresses to the table, accompanied with symbol names (they'll
>    serve as keys)
>
>    During initialization of the table we create host->target address mapping
>    using symbol names as keys.

No, this is not going to work, as I said earlier, names aren't necessarily
unique for static functions.

>
> The last thing I wanted to summarize: compiling target code.
>
> We have 2 approaches here:
>
>   1. Perform WPA and extract sections, marked as target, into separate object
>      file. Then call target compiler on that object file to produce the
>      binary.
>
>      As mentioned by Jakub, this approach will complicate debugging.
>
>   2. Pass fat object files directly to the target compiler (one CU at a
>      time).
>      So, for every object file we are going to call GCC twice:
>      - Host GCC, which will compile all host code for every CU
>      - Target GCC, which will compile all target code for every CU
>
> I vote for option #2 as far as WPA-based approach complicates debugging.
> What do you guys think?

One needs to think about ld -r, the linker plugin might actually see
multiple CUs in one object file, so perhaps the target compiler will
need to be run on the same *.o file several times, with different
offsets or whatever other way to identify the CU in the sections
(if .gnu.target_lto* has section headers, it will be easier).

	Jakub