#3571: Bizzarely bloated binaries
-----------------------------+----------------------------------------------
Reporter:  guest             |          Owner:                  
    Type:  bug               |         Status:  new             
Priority:  normal            |      Component:  Compiler        
 Version:  6.10.4            |       Severity:  normal          
Keywords:                    |       Testcase:                  
      Os:  Unknown/Multiple  |   Architecture:  Unknown/Multiple
-----------------------------+----------------------------------------------
 Compiling a trivial test program:

 {{{
 module Main where

 main = print "Hello World"
 }}}

 Using GHC 6.10.4 produces a VERY suspicious PE file. (NB: this applies to
 DLL as well as EXE output).

 The two problems that I have observed are:

  1) The PE always contains a .stab and .stabstr section totalling 0x2A00
 of debug data. Looking at the contents of stabstr, this appears to
 originate from a libffi object file. Perhaps we could disable stabs when
 building libffi to remove this bloat from output binaries.

  2) The PE contains *A LOT* of trailing junk. My hello world program is
 691K, and the PE contains 0x4FAFC = 318K of data which doesn't live in any
 section! Trimming this data appears to have no effect on the correctness
 of the program! The amount of junk grows proportionally to the amount of
 real code and data - I have observed e.g. 18Mb DLLs of which 9Mb are
 trailing junk.

 To repeat: we could potentially *halve* GHC binary sizes by fixing this
 linker behaviour.

 I'm not sure where exactly the fault lies - whether it is a GHC problem or
 some bug in Ld.

 To test trimming your executables and DLLs, you can use this utility I
 whipped up. Usage is "trimpe <file1> ... <fileN>". It will trim useless
 data from the end of the files in place:

 {{{
 {-# LANGUAGE ScopedTypeVariables #-}
 module Main (main) where

 import Control.Monad

 import Data.Binary
 import Data.Binary.Get

 import qualified Data.ByteString.Lazy as ByteString
 import Data.Word

 import System.Environment

 import Debug.Trace


 assertM :: Monad m => Bool -> m ()
 assertM True  = return ()
 assertM False = fail "assertM"

 newtype PEImageLength = PEImageLength Word32

 -- http://www.microsoft.com/whdc/system/platform/firmware/PECOFF.mspx
 instance Binary PEImageLength where
     get = do
         -- Skip the MS DOS stub
         skip 0x3c
         pe_sig_offset <- getWord32le
         -- Skip to the PE signature
         skip (fromIntegral pe_sig_offset - (0x4 + 0x3c))
         -- Read the PE signature itself
         -- NB: this will always be the string "PE\0\0"
         _sig <- getWord32le
         assertM (_sig == 0x00004550)
         -- Read COFF file header
         _machine <- getWord16le
         _no_of_sects <- getWord16le
         _time_date_stamp <- getWord32le
         _ptr_to_sym_tab <- getWord32le
         _no_of_syms <- getWord32le
         _size_of_opt_header <- getWord16le
         assertM (_size_of_opt_header /= 0)
         _characteristics <- getWord16le
         -- Read the "optional" header
         magic <- getWord16le
         let pe32plus = magic == 0x20B
         _maj_link_ver :: Word8 <- get
         _min_link_ver :: Word8 <- get
         _size_of_code <- getWord32le
         _size_of_init_data <- getWord32le
         _size_of_uninit_data <- getWord32le
         _addr_of_entry_point <- getWord32le
         _base_of_code <- getWord32le
         when (not pe32plus) $ do _base_of_data <- getWord32le; return ()
         -- Read the optional header Windows fields
         if pe32plus
          then do _image_base <- getWord64le; return ()
          else do _image_base <- getWord32le; return ()
         _sect_alignment <- getWord32le
         _file_alignment <- getWord32le
         _maj_os_version <- getWord16le
         _min_os_version <- getWord16le
         _maj_image_version <- getWord16le
         _min_image_version <- getWord16le
         _maj_subsys_version <- getWord16le
         _min_subsys_version <- getWord16le
         _win32_version <- getWord32le
         size_of_image <- getWord32le
         -- There is more stuff later, but I simply don't care about it
         -- NB: we could trim a little more agressively if we interpreted
         -- the sections as well...
         return $ PEImageLength size_of_image

     put = error "Binary PEImageLength: put"


 main :: IO ()
 main = do
     files <- getArgs
     forM_ files trimPEToImageSize

 trimPEToImageSize :: FilePath -> IO ()
 trimPEToImageSize file = do
     putStrLn $ file
     pe_contents <- ByteString.readFile file
     let PEImageLength image_size = decode pe_contents

     -- Force the file to close so that the write may succeed
     (ByteString.last pe_contents) `seq` return ()

     when (ByteString.length pe_contents > fromIntegral image_size) $ do
         putStrLn $ "* Trimming to image size (" ++ show image_size ++ ")"
         let pe_contents' = ByteString.take (fromIntegral image_size)
 pe_contents
         ByteString.writeFile file pe_contents'
 }}}

-- 
Ticket URL: <http://hackage.haskell.org/trac/ghc/ticket/3571>
GHC <http://www.haskell.org/ghc/>
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs

Reply via email to