I remembered that GCC has a similar feature with the same name, LTO (Link Time Optimization): http://gcc.gnu.org/wiki/LinkTimeOptimization
I have just tried it with a fairly recent toolchain (arm-none-eabi-gcc (GNU Tools for ARM Embedded Processors) 4.8.3 20131129 (release)). And it saved me 8 bytes!

Without -flto:
FVMAIN_SEC [9%Full] 524288 total, 47648 used, 476640 free
FVMAIN_COMPACT [28%Full] 2621440 total, 748992 used, 1872448 free
FVMAIN [99%Full] 1941376 total, 1941344 used, 32 free

With -flto:
FVMAIN_SEC [9%Full] 524288 total, 47648 used, 476640 free
FVMAIN_COMPACT [28%Full] 2621440 total, 748984 used, 1872456 free
FVMAIN [99%Full] 1941376 total, 1941344 used, 32 free

I am now wondering if the 8 bytes come from the change of date from Thursday to Friday...

________________________________________
From: Andrew Fish [[email protected]]
Sent: 23 January 2014 19:15
To: [email protected]
Subject: Re: [edk2] Need for FixedFeaturePcdGet() ?

On Jan 23, 2014, at 11:02 AM, Andrew Fish <[email protected]> wrote:

On Jan 23, 2014, at 10:27 AM, Olivier Martin <[email protected]> wrote:

I have only tried with GCC. In the use case from my previous email, _gPcd_FixedAtBuild_PcdRelocateVectorTable is initialized by another C file. I am not a compiler expert, but I guess your compiler would need at least two passes to optimise this specific code, once it knows the value of ‘const BOOLEAN _gPcd_FixedAtBuild_PcdRelocateVectorTable’ from another compilation unit.

Olivier,

The 2nd pass is link time code generation. Visual Studio does it via linker flags, and you can turn it on in clang with -flto. For clang this tells the compiler to emit LLVM bitcode objects (a kind of machine-independent assembly language). The linker combines all the bitcode together in the link stage, and only at that point is native code generated. Thus whole-program optimization is possible, and dead stripping works.

This is what bitcode looks like, in case you are interested.
It would work the same way for IA32, X64, AArch64, etc., since the code gen happens in the linker. Basically, the linker links the bitcode objects together and dead strips unneeded functions/globals, and then it passes the linked bitcode object into a shared library produced by clang for the code generation to happen.

~/work/Compiler>cat main.c
#include <stdio.h>

int foo ()
{
  return 0;
}

int main (int argc, char **argv)
{
  int Test[100];

  for (;;) {
    printf ("[0x%02x]", getchar());
  }

  return argc;
}
~/work/Compiler>clang -flto -S main.c
~/work/Compiler>cat main.S
; ModuleID = 'main.c'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.9.0"

@.str = private unnamed_addr constant [9 x i8] c"[0x%02x]\00", align 1

define i32 @foo() nounwind ssp uwtable {
  ret i32 0
}

define i32 @main(i32 %argc, i8** %argv) nounwind ssp uwtable {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  %3 = alloca i8**, align 8
  %Test = alloca [100 x i32], align 16
  store i32 0, i32* %1
  store i32 %argc, i32* %2, align 4
  store i8** %argv, i8*** %3, align 8
  br label %4

; <label>:4                                       ; preds = %4, %0
  %5 = call i32 @getchar()
  %6 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([9 x i8]* @.str, i32 0, i32 0), i32 %5)
  br label %4
                                                  ; No predecessors!
  %8 = load i32* %1
  ret i32 %8
}

declare i32 @printf(i8*, ...)

declare i32 @getchar()
~/work/Compiler>

Thanks,

Andrew Fish

As far as I know GCC does not support this.

Thanks,

Andrew Fish

We have recently done some investigation into adding two build passes in BaseTools to take advantage of the ARM linker feedback (see http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0474c/CHDFJGBE.html). But we realized that would not be easy to maintain in build_rules.txt.
From: Tim Lewis [mailto:[email protected]]
Sent: 23 January 2014 18:09
To: [email protected]
Subject: Re: [edk2] Need for FixedFeaturePcdGet() ?

Olivier –

I think this came about because VS always removes the dead code in the release build. However, I agree about the “doesn’t appear in the .DSC” problem. It has forced us on a number of occasions to add dummy .DSC entries that have the exact same default value as the .DEC.

Tim

From: Olivier Martin [mailto:[email protected]]
Sent: Thursday, January 23, 2014 10:00 AM
To: [email protected]
Subject: [edk2] Need for FixedFeaturePcdGet() ?

Feature PCDs are mainly (only?) used to disable features (i.e. to remove code) at build time. We found that, in practice, they never actually remove code.

Here is an example (ArmPkg/Drivers/CpuDxe/ArmV6/Exception.c):
--------------
ArmDisableFiq ();

if (FeaturePcdGet(PcdRelocateVectorTable) == TRUE) {
  (...)
} else {
  // The Vector table must be 32-byte aligned
  ASSERT(((UINT32)ExceptionHandlersStart & ((1 << 5)-1)) == 0);

  // We do not copy the Exception Table at PcdGet32(PcdCpuVectorBaseAddress). We just set Vector Base Address to point into CpuDxe code.
  ArmWriteVBar ((UINT32)ExceptionHandlersStart);
}
--------------

If we build it with the upstream UEFI code, the C code becomes, after preprocessing:
--------------
ArmDisableFiq ();

if (_gPcd_FixedAtBuild_PcdRelocateVectorTable == ((BOOLEAN)(1==1))) {
  (...)
--------------

And the disassembly:
--------------
    bl      ArmDisableFiq
.LVL19:
    .loc 1 147 0
    ldr     r3, .L44+4    ; Get the address of _gPcd_FixedAtBuild_PcdRelocateVectorTable
    ldrb    r3, [r3]      ; Load its value
    cmp     r3, #1        ; if (_gPcd_FixedAtBuild_PcdRelocateVectorTable == ((BOOLEAN)(1==1))) {
    bne     .L19
    .loc 1 151 0
    ldr     r3, .L44+8
--------------

Now, let's have a look at the AutoGen.h for this PCD:
--------------
#define _PCD_TOKEN_PcdRelocateVectorTable  104U
#define _PCD_VALUE_PcdRelocateVectorTable  ((BOOLEAN)0U)
extern const  BOOLEAN  _gPcd_FixedAtBuild_PcdRelocateVectorTable;
#define _PCD_GET_MODE_BOOL_PcdRelocateVectorTable  _gPcd_FixedAtBuild_PcdRelocateVectorTable
//#define _PCD_SET_MODE_BOOL_PcdRelocateVectorTable  ASSERT(FALSE)  // It is not allowed to set value for a FIXED_AT_BUILD PCD
--------------

And the definition of FeaturePcdGet() in PcdLib.h:
--------------
#define FeaturePcdGet(TokenName)  _PCD_GET_MODE_BOOL_##TokenName
--------------

Workaround: I hacked PcdLib to use the PCD value instead of accessing the global variable:
--------------
#define FeaturePcdGet(TokenName)  _PCD_VALUE_##TokenName
--------------

But this change was not enough, because PCDs that were not set in the DSC had no _PCD_VALUE_ definition in their AutoGen.h.
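For a PCD that does get a _PCD_VALUE_ define, such as PcdRelocateVectorTable, the difference between the two FeaturePcdGet() expansions can be reproduced in a minimal, self-contained C sketch. It copies the AutoGen.h and PcdLib.h lines quoted above, but assumes simplified stand-ins for BOOLEAN/TRUE/FALSE instead of the real MdePkg headers, and gives the workaround the hypothetical name FeaturePcdGetByValue() so both forms can live in one file. It is an illustration, not the actual PcdLib.

```c
#include <assert.h>

/* Simplified stand-ins for the EDK2 base types (assumption, not MdePkg). */
typedef unsigned char BOOLEAN;
#define TRUE   ((BOOLEAN)(1==1))
#define FALSE  ((BOOLEAN)(0==1))

/* AutoGen.h lines for PcdRelocateVectorTable, as quoted in the thread. */
#define _PCD_VALUE_PcdRelocateVectorTable  ((BOOLEAN)0U)
extern const  BOOLEAN  _gPcd_FixedAtBuild_PcdRelocateVectorTable;
#define _PCD_GET_MODE_BOOL_PcdRelocateVectorTable  _gPcd_FixedAtBuild_PcdRelocateVectorTable

/* In a real build this definition lives in AutoGen.c, a separate
   translation unit, so the compiler cannot see its value while compiling
   Exception.c. It is defined here only so the sketch links on its own. */
const BOOLEAN _gPcd_FixedAtBuild_PcdRelocateVectorTable = _PCD_VALUE_PcdRelocateVectorTable;

/* Upstream PcdLib.h: token pasting yields a read of the global variable. */
#define FeaturePcdGet(TokenName)         _PCD_GET_MODE_BOOL_##TokenName

/* The thread's workaround under a hypothetical name: token pasting
   yields the literal value from AutoGen.h. */
#define FeaturePcdGetByValue(TokenName)  _PCD_VALUE_##TokenName

/* The value form is an integer constant expression, so even a
   compile-time check accepts it; in standard C a const global is not. */
_Static_assert (FeaturePcdGetByValue (PcdRelocateVectorTable) == FALSE,
                "PCD value must be known at compile time");

/* Returns 1 if the relocation branch would be taken, reading the global. */
int
RelocateViaGlobal (void)
{
  return FeaturePcdGet (PcdRelocateVectorTable) == TRUE;
}

/* Same test through the value; the condition is the constant 0. */
int
RelocateViaValue (void)
{
  return FeaturePcdGetByValue (PcdRelocateVectorTable) == TRUE;
}
```

Both functions return 0, but only the second gives the compiler a constant condition it can fold away without link time optimization. Applying FeaturePcdGetByValue() to a PCD whose AutoGen.h lacks a _PCD_VALUE_ line would not compile, because the pasted identifier is undeclared.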
Example of PcdVerifyNodeInList:
--------------
#define _PCD_TOKEN_PcdVerifyNodeInList  5U
extern const  BOOLEAN  _gPcd_FixedAtBuild_PcdVerifyNodeInList;
#define _PCD_GET_MODE_BOOL_PcdVerifyNodeInList  _gPcd_FixedAtBuild_PcdVerifyNodeInList
//#define _PCD_SET_MODE_BOOL_PcdVerifyNodeInList  ASSERT(FALSE)  // It is not allowed to set value for a FIXED_AT_BUILD PCD
--------------

So, I hacked BaseTools to always emit the value of PCDs:
--------------
--- a/BaseTools/Source/Python/AutoGen/GenC.py
+++ b/BaseTools/Source/Python/AutoGen/GenC.py
@@ -1077,6 +1077,8 @@ def CreateLibraryPcdCode(Info, AutoGenC, AutoGenH, Pcd):
     if PcdItemType == TAB_PCDS_FIXED_AT_BUILD and key in Info.ConstPcd:
         AutoGenH.Append('#define _PCD_VALUE_%s %s\n' %(TokenCName, Pcd.DefaultValue))
+    else:
+        AutoGenH.Append('#define _PCD_VALUE_%s %s\n' %(TokenCName, Pcd.DefaultValue))
--------------

After rebuilding my platform (ArmPlatformPkg/ArmVExpressPkg/ArmVExpress-RTSM-A9x4.dsc) in a RELEASE build, here is the result:
--------------
    bl      ArmDisableFiq
.LVL19:
    .loc 1 215 0
    ldr     r0, .L25+4
    bl      ArmWriteVBar
--------------

We can now see that the dead code has been removed.

In terms of size:

Before:
FVMAIN_SEC [5%Full] 524288 total, 27232 used, 497056 free
FVMAIN_COMPACT [18%Full] 2621440 total, 481760 used, 2139680 free
FVMAIN [99%Full] 1161088 total, 1161056 used, 32 free

After:
FVMAIN_SEC [5%Full] 524288 total, 27232 used, 497056 free
FVMAIN_COMPACT [18%Full] 2621440 total, 478632 used, 2142808 free
FVMAIN [99%Full] 1154432 total, 1154400 used, 32 free

So, it saved 6656 bytes in the non-compressed FV.
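The per-volume savings in the RELEASE build report can be checked with a few lines of C. The figures are copied from the Before/After "used" columns; the FV_USAGE type and the FvSavedBytes() helper are hypothetical names introduced only for this sketch.

```c
#include <assert.h>

/* "used" byte counts for one firmware volume, before and after the hack. */
typedef struct {
  const char    *Name;
  unsigned long UsedBefore;
  unsigned long UsedAfter;
} FV_USAGE;

/* Figures copied from the Before/After build report in the thread. */
static const FV_USAGE mFvUsage[] = {
  { "FVMAIN_SEC",       27232,   27232 },
  { "FVMAIN_COMPACT",  481760,  478632 },
  { "FVMAIN",         1161056, 1154400 },
};

/* Bytes saved in one FV: positive means the volume's content shrank. */
long
FvSavedBytes (const FV_USAGE *Fv)
{
  return (long)Fv->UsedBefore - (long)Fv->UsedAfter;
}
```

FVMAIN_SEC is unchanged, FVMAIN_COMPACT shrinks by 3128 bytes, and FVMAIN shrinks by 6656 bytes, matching the saving reported for the non-compressed FV (the FVMAIN total, 1161088 - 1154432, drops by the same amount).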
_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/edk2-devel

-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590 ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782
