jfb added a comment. Can you add a link to bug 40605 in the commit message?
> I'm not quite sure how to show the resulting difference in code. Do you mean > we need Clang to support both modes and to compare the resulting assembly? I only meant tests that show codegen, as you've now added auto-var-init.cpp. There might be interesting cases which I wasn't testing when I wrote it, so if you think of any please do make sure you add some. You mentioned a Linux struct that used to crash? That would be a useful test (I imagine it's one with the `Loc` adjustment). ================ Comment at: tools/clang/lib/CodeGen/CGDecl.cpp:1143 + const llvm::StructLayout *Layout = + CGM.getDataLayout().getStructLayout(cast<llvm::StructType>(Ty)); + for (unsigned i = 0; i != constant->getNumOperands(); i++) { ---------------- glider wrote: > jfb wrote: > > I think you need a heuristic here, where you don't emit multiple stores if > > `getNumOperands` is greater than some number. In that case you'd keep doing > > memcpy instead. Note that a struct containing sub-structs (or arrays) > > should also count the number of sub-structs (so just checking > > `getNumOperands` isn't sufficient). > > > > Other parts of this code chose 6 stores... > Do we ever need to not split these stores? Can't the MemCpyOpt pass take > care of them later on? You're assuming the optimizer is running at all :-) In general your approach seems to care about `-O2`, and I agree that's usually what we want to tune. However, clang also support `-O0` which won't touch these stores at all, and in `-O0` it's usually nice to just see a copy from a global while debugging. Further, configurations such as `-Os` and `-Oz` would be hampered by excessive codegen. You also need to consider the compile-time hit the expansion you're making would add. More code means more time to compile, and I'd expect little to no win from emitting more than a handful of stores. If someone has a large thing on the stack, and it can be optimized, then we should teach the optimizer to deal with memcpy better. I don't think e.g. a 400+ byte struct makes sense to initialize with a bunch of store. So I think you want to check what the size of the struct is (check out when x86-64 and ARM64 inline memcpy / memset, I assume those are decent guesses for size to inline). Further, your current code sill do something silly for code like this: ``` struct S { int j; char arr[8]; float f; }; ``` Before you'd have a single `memcpy`, but now it'll be a store, a `memset`, and a store. I think you really want to have the same heuristic for arrays (so you'd have just stores), and you want to make sure that if you're going to split up a struct you do so for all the fields inside that struct. ================ Comment at: tools/clang/lib/CodeGen/CGDecl.cpp:1765 + emitStoresForConstant(CGM, D, Loc, isVolatile, Builder, constant, + /*forInit*/ false); } ---------------- glider wrote: > jfb wrote: > > Can you explain why this case needs to be different? IIUC it's because the > > types can differ which makes the struct's elements not match up the ones > > from the constant in the loop above? I think the code above should > > automatically handle that case instead of passing in a flag here. > > > > You also definitely need to test the case where that happens :) > I just thought we shouldn't break the constants that we didn't create with > -ftrivial-var-auto-init. > I'll take a closer look though (indeed, this crashes on one of the structs > from the Linux kernel) Yeah I think it's from the `Loc` adjustment above. I think your change is worth doing for all cases, not just trivial var auto-init. It'll lead us to optimize all of it much better IMO (and support things like `-O0` `-Os` and `-Oz` uniformly well too). ================ Comment at: tools/clang/test/CodeGenCXX/auto-var-init.cpp:37 // PATTERN: @__const.test_small_custom.custom = private unnamed_addr constant %struct.small { i8 42 }, align 1 // ZERO: @__const.test_small_custom.custom = private unnamed_addr constant %struct.small { i8 42 }, align 1 struct small { char c; }; ---------------- How come `@__const.test_small_custom.custom` still gets emitted? I guess custom initialization goes through a different code path? ================ Comment at: tools/clang/test/CodeGenCXX/auto-var-init.cpp:496 // ZERO-LABEL: @test_smallpartinit_uninit() // ZERO: call void @llvm.memset{{.*}}, i8 0, ---------------- I wonder if (maybe in a separate patch) you also want to avoid `memset` when something is pretty small. ================ Comment at: tools/clang/test/CodeGenCXX/auto-var-init.cpp:664 +// PATTERN: store i32 -1431655766, {{.*}} align 4 +// PATTERN: call void @llvm.memset.{{.*}}(i8* align 4 {{.*}}, i8 0, i64 0, i1 false) // ZERO-LABEL: @test_arraytail_uninit() ---------------- This one is particularly silly since it's a zero-sized `memset` :) CHANGES SINCE LAST ACTION https://reviews.llvm.org/D57898/new/ https://reviews.llvm.org/D57898 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits