On Wed, Dec 14, 2016 at 8:21 AM, Reid Barton <rwbar...@gmail.com> wrote:
> On Tue, Dec 13, 2016 at 1:21 PM, Evan Laforge <qdun...@gmail.com> wrote:
> GHCi definitely needs to load some .hi files of your dependencies.
> Your .hi files contain the types of your functions, needed to type
> check expressions that use them. Let's say the type of one of your
> functions involves ByteString. Then GHCi has to read the interface
> file that defines ByteString, so that there is something in the
> compiler for the type of your function to refer to.

Right, that makes sense. When I enable verbose logging, I see that in
the upsweep phase it collects the imports of all of the transitively
loaded modules. I assume it loads all the local .hi files, and then it
also has to load the package dependency .hi files (--show-iface also
shows a "package dependencies" section). I can't tell if it does that
lazily, but it would make sense, because surely I'm not using every
single module exported from every single package. Certainly packages
themselves can be loaded lazily; I frequently see ghci wait until I
try to evaluate an expression before linking in a bunch of external
packages.
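To make the ByteString example concrete, I read it as a situation like
this (module and function names invented for illustration):

    -- Hypothetical module: a REPL session that never imports
    -- Data.ByteString itself still forces GHCi to read the interface
    -- file defining ByteString as soon as it type checks an
    -- expression involving 'serialize', because the signature
    -- mentions the type.
    module MyApp.Serialize (serialize) where

    import qualified Data.ByteString.Char8 as B

    serialize :: Int -> B.ByteString
    serialize = B.pack . show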
> I'm not sure how to predict what exact set of .hi files GHCi will need
> to load, but you could run your program under strace (or equivalent)
> to see which .hi files it is loading. Then I would guess the expansion
> factor when converting into the compiler's internal types is maybe
> around 10x. However there's also some kind of lazy loading of .hi
> files, and I'm not sure how that works or what granularity it has.

I guess it would be dtrace on OS X; I'll look into it and see what I
can learn. Then I can divide the increase in memory size by the size
of the loaded .hi files and see what the expansion ratio actually is.

> By the way, you can use `ghc --show-iface` to examine .hi files
> manually, which might be illuminating.

That is pretty interesting, thanks. There's quite a lot of stuff in
there, including some I didn't expect, like apparently lots of Show
instance implementations for concrete types:

    bac9698d086d969aebee0847bf123997 $s$fShow(,)_$s$fShow(,)_$cshowList :: [(Writable, SaveFile)] -> ShowS

In this case, both Writable and SaveFile are defined elsewhere, but I
do show a list of them in that module, so maybe the instances get
inlined in here? But it's not a crazy amount of stuff, and I wouldn't
expect it to be, since the .hi files themselves are not unreasonably
large.
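For reference, the code in question has essentially this shape (a
simplified sketch: in the real program Writable and SaveFile are
defined in other modules, and their definitions here are invented):

    -- Showing a list of a concrete pair type.  With optimization on,
    -- GHC apparently specialises the tuple's Show instance (including
    -- its showList method) to the concrete element types and records
    -- the specialisations in the .hi file, which is presumably where
    -- the $s$fShow(,) entries above come from.
    data Writable = ReadWrite | ReadOnly deriving (Show)
    data SaveFile = SaveFile FilePath deriving (Show)

    describeFiles :: [(Writable, SaveFile)] -> String
    describeFiles = show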
>> I do build dynamically, since it's the only option nowadays to load
>> .o files, but I guess what you mean is link the application as a
>> shared library, and then link it to the Main module for the app, and
>> pass it to GHC.parseDynamicFlags for the REPL? That's a good idea.
>> But I'd still be loading all those .hi files, and if the majority of
>> the memory use is actually from those, it might not help, right?

> I'm pretty sure the old way of linking your program statically, which
> will cause the RTS to use its own linker to load .o files, is still
> supposed to work. It has the same limitations it has always had, of
> course. The new thing is that you need to build dynamically in order
> to link object files into the ghc compiler itself; but that's just
> because the ghc binary shipped in the binary distribution was built
> dynamically; this isn't a constraint on your own GHC API use. (And you
> can choose to build ghc statically, too. Windows builds still work
> that way.)

I see from my darcs history that I added -dynamic to all builds except
profiling back in July 2014, I think after upgrading to 7.8. From the
comment, I did that because otherwise ghci wouldn't load the .o files.
And I remember lots of talk on trac around 7.8 about finally
abandoning the home-grown linker. This is on OS X, so maybe it's
platform dependent.

> I really just meant building your executable dynamically, i.e., with
> -dynamic. If the code size is a small proportion of the total memory
> use then it won't make a big difference, as you say. However, I'm not
> sure that is really the case considering that the GHC library itself
> is already about 74 MB on disk.

In that case, I must already be doing that. But how would that work
for my own application's binary? When I do otool -L I see that indeed
all the cabal libraries like libHSbase and libHSghc are dynamic
libraries, so presumably those will be shared across the whole OS. But
the application's binary is linked via 'ghc -dynamic -package=... A.o
B.o C.o etc'. The .o files are built with -dynamic, and I assume ghci
itself uses the OS's loader for them, but they seem to be linked into
the binary in the traditional static way. It's confusing to me because
traditionally -dynamic is a link-only flag, but ghc also uses it when
building .o files... I assume because of the ghci loading thing. I
always assumed it used the OS's low-level shared-object loading, but
not the whole dynamic library mechanism.

> I'm not sure why you are looking at the GHC.Stats.currentBytesUsed
> number; be aware that it only measures the size of the GCed heap. Many
> things that contribute to the total memory usage of your program (such
> as its code size, or anything allocated by malloc or mmap) will not
> show up there.

I just picked it out of GCStats as having a promising-looking name.
It's the one that goes up while the GHC API is loading its modules, so
I assumed it was the most useful one. currentBytesUsed reports 200 MB,
but the system process viewer shows 350 MB, so clearly some memory
isn't being counted. But that looks like two-space GC overhead, and
indeed if I run with +RTS -c, the system usage goes down to 240 MB
while currentBytesUsed stays around 200 MB (it goes up a bit,
actually). So perhaps most of the allocation is indeed in the GCed
heap, and the extra space is mostly GC overhead.
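For what it's worth, here's roughly how I'm sampling that number (a
sketch against the GHC 8.0 GHC.Stats API; the program has to run with
+RTS -T for the stats to be collected at all):

    -- getGCStats throws an exception unless the program was started
    -- with +RTS -T, so check getGCStatsEnabled first.
    import Control.Monad (when)
    import qualified GHC.Stats as Stats

    printHeapUse :: IO ()
    printHeapUse = do
        enabled <- Stats.getGCStatsEnabled
        when enabled $ do
            stats <- Stats.getGCStats
            putStrLn $ "currentBytesUsed: "
                ++ show (Stats.currentBytesUsed stats)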
Does the GHC API use malloc or mmap internally? I wouldn't be
surprised if .o files are loaded with mmap.

Another thing that occurred to me: if the GC heap is really mostly
loaded .hi files, then maybe I should increase the number of
generations, since most of the heap is immortal. Or maybe when the
compact regions stuff stabilizes, all the .hi data could go into a
compact region. I guess that might require non-trivial ghc hacking
though.

>> I don't fully understand the "have to load the entirety of your
>> dependencies" part. If I'm using the same code linked into the main
>> application, then isn't it a given that I'm loading everything in
>> the application in the first place?

> Let me explain what I meant with an example. If I build a hello world
> program statically, I get a 1.2M executable. Let's assume most of that
> size comes from the base package. If I build the same hello world
> program dynamically, I get an 18K executable dynamically linked
> against an 11M base shared library! At runtime, the dynamic loader
> will map that whole 11M file into my process's memory space. Whether
> you want to count that as part of the space usage of your program is
> up to you; the code segments will be shared between multiple
> simultaneous instances of your program (or other programs compiled by
> GHC), but if you only run one copy of your program at a time, that
> doesn't help you. It certainly won't be counted by currentBytesUsed.
>
> The base library is composed of many individual .o files. When I
> linked the hello world statically, the linker took only the .o files
> that were actually needed for my program, which is why it was only
> 1.2M when the base library is 11M. Your real program probably uses
> most of base, but may have other dependencies that you use only a
> small part of (lens?).

Oh ok, that makes sense. In that case, since I'm dynamically linking
cabal packages, I'm certainly already getting that sharing. I was
mostly concerned with the code from my program itself. The REPL is
only loading about 255 of the 401 local .o files, but presumably if I
linked those 401 modules into a local dynlib, linked that into the
Main module, and also had the GHC API load it, then I'd share the
local code as well. Since my program uses all of its own code kind of
by definition, it's loaded no matter what, even if the REPL doesn't
need all of it.
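If I try that dynlib experiment, I imagine the API side would look
roughly like this (a hypothetical sketch against the 8.0-era GHC API;
the library name and path are invented, and I haven't verified that -l
behaves the same through the API as it does with 'ghci -lfoo'):

    -- Sketch of the "local dynlib" idea: instead of letting the
    -- session load individual .o files, link the REPL against a
    -- shared library built from the same modules, so the code is
    -- shared with the main binary.
    import GHC
    import GHC.Paths (libdir)  -- from the ghc-paths package
    import SrcLoc (noLoc)

    main :: IO ()
    main = runGhc (Just libdir) $ do
        dflags <- getSessionDynFlags
        -- the -L path and -lmyapp name are invented for illustration
        (dflags', _leftover, _warns) <- parseDynamicFlags dflags
            (map noLoc ["-dynamic", "-L./build", "-lmyapp"])
        _ <- setSessionDynFlags dflags'
        return ()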