Justin,

Thank you for your comments. Please take a look at the patch that
addresses them:
http://reviews.llvm.org/D11273

On Mon, Jul 13, 2015 at 11:39 PM, Justin Bogner <[email protected]> wrote:

> Artem Belevich <[email protected]> writes:
> > Author: tra
> > Date: Mon Jul 13 18:27:56 2015
> > New Revision: 242085
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=242085&view=rev
> > Log:
> > [cuda] Driver changes to compile and stitch together host and
> > device-side CUDA code.
> >
> > NOTE: reverts r242077 to reinstate r242058, r242065, r242067
> > and includes a fix for OS X test failures.
> >
> > - Changed the driver pipeline to compile the host and device sides of
> >   CUDA files and incorporate the results of device-side compilation
> >   into the host object file.
> >
> > - Added a test for CUDA pipeline creation in the clang driver.
> >
> > New clang options:
> > --cuda-host-only   - Do host-side compilation only.
> > --cuda-device-only - Do device-side compilation only.
> >
> > --cuda-gpu-arch=<ARCH> - Specify the GPU architecture for device-side
> >   compilation, e.g. sm_35 or sm_30. The default is sm_20. May be given
> >   more than once, in which case one device compilation is done per
> >   unique GPU architecture specified.
> >
> > Differential Revision: http://reviews.llvm.org/D9509

> > +
> > +CudaHostAction::CudaHostAction(std::unique_ptr<Action> Input,
> > +                               const ActionList &_DeviceActions)
>
> Identifiers starting with an underscore then a capital are reserved.
> Please just call this DeviceActions.

Done.

> > +    : Action(CudaHostClass, std::move(Input)),
> > +      DeviceActions(_DeviceActions) {}
> > +
> > +CudaHostAction::~CudaHostAction() {
> > +  for (iterator it = DeviceActions.begin(), ie = DeviceActions.end();
> > +       it != ie;
>
> We usually name variables starting with a capital letter, so these
> should probably be `I` and `E` rather than `it` and `ie`. Might as well
> use auto here as well, since they're just iterators.
> This applies in several places later in this patch too. Please clean
> them up.

Fixed here and in other places.
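For the destructor above, the cleaned-up loop looks roughly like this
(illustrative sketch only, assuming the loop body simply deletes each
device action as in the original; the exact code is in D11273):

    CudaHostAction::~CudaHostAction() {
      // LLVM-style iterator names, with auto since the type is obvious.
      for (auto I = DeviceActions.begin(), E = DeviceActions.end(); I != E; ++I)
        delete *I;
    }

A range-based for over DeviceActions would be equally fine here.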
> > +  // Check whether any of device actions stopped before they could
> > +  // generate PTX.
> > +  bool PartialCompilation = false;
> > +  bool DeviceOnlyCompilation = Args.hasArg(options::OPT_cuda_device_only);
>
> The ordering here is confusing - it looks like DeviceOnlyCompilation is
> related to the loop that sets PartialCompilation.

Moved DeviceOnlyCompilation down to where it's used.

> > +    for (unsigned i = 0, e = GpuArchList.size(); i != e; ++i)
> > +      Actions.push_back(
> > +          new CudaDeviceAction(std::unique_ptr<Action>(CudaDeviceActions[i]),
> > +                               GpuArchList[i], /* AtTopLevel */ true));
> > +    // Kill host action in case of device-only compilation.
> > +    if (DeviceOnlyCompilation)
> > +      Current.reset(nullptr);
> > +    return Current;
> > +  } else {
>
> Since the `if` returns unconditionally, an early exit is better than an
> else.

Done.

> > +    // Outputs of device actions during complete CUDA compilation get
> > +    // created with AtTopLevel=false and become inputs for the host action.
> > +    ActionList DeviceActions;
> > +    for (unsigned i = 0, e = GpuArchList.size(); i != e; ++i)
> > +      DeviceActions.push_back(
> > +          new CudaDeviceAction(std::unique_ptr<Action>(CudaDeviceActions[i]),
> > +                               GpuArchList[i], /* AtTopLevel */ false));
> > +    // Return a new host action that incorporates original host action
> > +    // and all device actions.
> > +    return std::unique_ptr<Action>(
> > +        new CudaHostAction(std::move(Current), DeviceActions));
> > +  }
> > +}
> > +
> >  void Driver::BuildActions(const ToolChain &TC, DerivedArgList &Args,
> >                            const InputList &Inputs, ActionList &Actions) const {
> >    llvm::PrettyStackTraceString CrashInfo("Building compilation actions");
> > @@ -1312,6 +1412,25 @@ void Driver::BuildActions(const ToolChai
> >        continue;
> >      }
> >
> > +    phases::ID CudaInjectionPhase;
> > +    if (isSaveTempsEnabled()) {
> > +      // All phases are done independently, inject GPU blobs during compilation
> > +      // phase as that's where we generate glue code to init them.
> > +      CudaInjectionPhase = phases::Compile;
> > +    } else {
> > +      // Assumes that clang does everything up until linking phase, so we inject
> > +      // cuda device actions at the last step before linking. Otherwise CUDA
> > +      // host action forces preprocessor into a separate invocation.
> > +      if (FinalPhase == phases::Link) {
>
> This is `else if`. Nesting the if inside the else is confusing.
>
> > +        for (auto i = PL.begin(), e = PL.end(); i != e; ++i) {
> > +          auto next = i + 1;
> > +          if (next != e && *next == phases::Link)
>
> What if i == e?

Then CudaInjectionPhase would be left uninitialized, which is a bug. Fixed,
which also made the 'else if' point you made above moot.

> > +            CudaInjectionPhase = *i;
> > +        }
> > +      } else
> > +        CudaInjectionPhase = FinalPhase;
> > +    }
> > +
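One way to guarantee the variable is always set, regardless of what the
phase list contains, is to seed it with FinalPhase and only override it
when a phase immediately precedes linking. A rough sketch of the idea
(not necessarily the exact code in the updated patch):

    phases::ID CudaInjectionPhase = FinalPhase;
    if (isSaveTempsEnabled()) {
      // With -save-temps every phase runs separately; inject the GPU blobs
      // during compilation, where the glue code to init them is generated.
      CudaInjectionPhase = phases::Compile;
    } else if (FinalPhase == phases::Link) {
      // Inject CUDA device actions at the last step before linking.
      for (auto I = PL.begin(), E = PL.end(); I != E; ++I)
        if (I + 1 != E && *(I + 1) == phases::Link)
          CudaInjectionPhase = *I;
    }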
> >      // Build the pipeline for this file.
> >      std::unique_ptr<Action> Current(new InputAction(*InputArg, InputType));
> >      for (SmallVectorImpl<phases::ID>::iterator i = PL.begin(), e = PL.end();
> > @@ -1337,6 +1456,15 @@ void Driver::BuildActions(const ToolChai
> >
> >        // Otherwise construct the appropriate action.
> >        Current = ConstructPhaseAction(TC, Args, Phase, std::move(Current));
> > +
> > +      if (InputType == types::TY_CUDA && Phase == CudaInjectionPhase &&
> > +          !Args.hasArg(options::OPT_cuda_host_only)) {
> > +        Current = buildCudaActions(*this, TC, Args, InputArg, InputType,
> > +                                   std::move(Current), Actions);
> > +        if (!Current)
> > +          break;
> > +      }
>
> This whole block of code seems out of place. In fact, why doesn't CUDA
> just have its own phases::ID? It seems like that would simplify all of
> this.

It does not quite fit the 'phase' concept. Phases implicitly assume that
things are done linearly. CUDA, on the other hand, effectively builds a
tree, and the weird code above is a somewhat broken way to glue the results
of device-side builds into just the right action of the host build
pipeline, so that those results can be used to modify compiler arguments.
I've got a better way of dealing with this, which I'll send as a separate
patch for review soon.

> > +
> >        if (Current->getType() == types::TY_Nothing)
> >          break;
> >      }
> > @@ -1576,7 +1704,13 @@ static const Tool *SelectToolForJob(Comp
> >    if (isa<BackendJobAction>(JA)) {
> >      // Check if the compiler supports emitting LLVM IR.
> >      assert(Inputs->size() == 1);
> > -    JobAction *CompileJA = cast<CompileJobAction>(*Inputs->begin());
> > +    JobAction *CompileJA;
> > +    // Extract real host action, if it's a CudaHostAction.
> > +    if (CudaHostAction *CudaHA = dyn_cast<CudaHostAction>(*Inputs->begin()))
> > +      CompileJA = cast<CompileJobAction>(*CudaHA->begin());
> > +    else
> > +      CompileJA = cast<CompileJobAction>(*Inputs->begin());
>
> This seems kind of awkward. Not sure I have a better suggestion though.

This will also be changed in the upcoming patch.

> >    if (const InputAction *IA = dyn_cast<InputAction>(A)) {
> >      // FIXME: It would be nice to not claim this here; maybe the old scheme of
> >      // just using Args was better?
> > @@ -1635,11 +1783,24 @@ void Driver::BuildJobsForAction(Compilat
> >    else
> >      TC = &C.getDefaultToolChain();
> >
> > -  BuildJobsForAction(C, *BAA->begin(), TC, BAA->getArchName(), AtTopLevel,
> > +  BuildJobsForAction(C, *BAA->begin(), TC, ArchName, AtTopLevel,
>
> I guess this is just an unrelated cleanup? Better to commit those
> separately in the future.

Noted.

> >                       MultipleArchs, LinkingOutput, Result);
> >     return;
> >   }
> >
> > +  if (const CudaDeviceAction *CDA = dyn_cast<CudaDeviceAction>(A)) {
> > +    // Figure out which NVPTX triple to use for device-side compilation
> > +    // based on whether host is 64-bit.
> > +    llvm::Triple DeviceTriple(C.getDefaultToolChain().getTriple().isArch64Bit()
> > +                                  ? "nvptx64-nvidia-cuda"
> > +                                  : "nvptx-nvidia-cuda");
>
> It seems like a pretty bad layering violation to need to know these
> particular triples right here...

Well, it's up to the driver to figure out the device-side triple. While I
agree that it's way too much target knowledge for the driver, I'm short on
practical ideas. Do you have any suggestions?

> Is getDefaultToolChain() even the right thing to do here? Isn't `TC`
> more appropriate?

Good point. TC indeed should do.
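In other words, something along these lines (untested sketch):

    if (const CudaDeviceAction *CDA = dyn_cast<CudaDeviceAction>(A)) {
      // Use the toolchain we were handed (TC) rather than the default
      // toolchain to choose between 32-bit and 64-bit NVPTX.
      llvm::Triple DeviceTriple(TC->getTriple().isArch64Bit()
                                    ? "nvptx64-nvidia-cuda"
                                    : "nvptx-nvidia-cuda");
      // ... device-side job construction continues as before ...
    }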
--
--Artem Belevich

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
