On 25/04/2012 08:57, Simon Marlow wrote:
On 25/04/2012 03:17, Mikhail Glushenkov wrote:
Hello Simon,
Sorry for the delay.
On Tue, Apr 10, 2012 at 1:03 PM, Simon Marlow <marlo...@gmail.com> wrote:
Questions:
Would implementing this optimisation be a worthwhile/realistic GSoC
project?
What are other potential ways to bring 'ghc -c' performance up to par
with 'ghc --make'?
My guess is that this won't have a significant impact on ghc -c compile
times. The advantage of squashing the .hi files for a package together
is that they could share a string table, which would save a bit of space
and time, but I think the time saved is small compared to the cost of
deserialising and typechecking the declarations from the interface, which
still has to be done. In fact it might make things worse, if the string
table for the whole base package is larger than the individual tables
that would be read from .hi files. I don't think mmap() will buy very
much over the current scheme of just reading the file into a ByteArray.
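
To illustrate what I mean by a shared string table (a toy model, not the
actual .hi format), a squashed interface could store each distinct string
once and have declarations refer to it by index:

    import qualified Data.Map.Strict as M
    import Data.List (foldl')

    -- A toy "squashed interface": every occurrence of a string is replaced
    -- by an index into a single shared string table.
    data Iface = Iface
      { ifaceStrings :: [String]  -- shared table, one entry per distinct string
      , ifaceDecls   :: [[Int]]   -- "declarations" referring to strings by index
      } deriving Show

    -- Intern each string the first time it is seen, preserving first-use order.
    squash :: [[String]] -> Iface
    squash declss = Iface (reverse tbl) (reverse decls)
      where
        (_, tbl, decls) = foldl' goDecl (M.empty, [], []) declss
        goDecl (m, t, ds) decl =
          let (m', t', ixs) = foldl' goStr (m, t, []) decl
          in  (m', t', reverse ixs : ds)
        goStr (m, t, ixs) s = case M.lookup s m of
          Just i  -> (m, t, i : ixs)
          Nothing -> let i = M.size m
                     in  (M.insert s i m, s : t, i : ixs)

    -- Resolve indices back to strings when reading the interface.
    unsquash :: Iface -> [[String]]
    unsquash (Iface tbl decls) = map (map (tbl !!)) decls

    main :: IO ()
    main = do
      let ifc = squash [["map", "GHC.Base", "foldr"], ["foldr", "GHC.Base"]]
      print ifc               -- "GHC.Base" and "foldr" are stored only once
      print (unsquash ifc)

The saving is exactly the strings that are shared between modules; the
deserialising and typechecking work per declaration stays the same.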
Thank you for the answer.
I'll be working on another project during the summer, but I'm still
interested in making interface files load faster.
The idea that I currently like the most is to make it possible to save
and load objects in the "GHC heap format". That way, deserialisation
could be done with a simple fread() and a fast pointer fixup pass,
which would hopefully make running many 'ghc -c' processes as fast as
a single 'ghc --make'. This trick is commonly employed in the games
industry to speed up load times [1]. Given that Haskell is a
garbage-collected language, the implementation will be trickier than
in C++ and will have to be done on the RTS level.
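
To make the shape of the idea concrete, here is a toy model of the fixup
pass (it has nothing to do with GHC's real heap layout): pointers are
saved as offsets from the old base address, together with a relocation
table, and loading is a single linear pass that adds the new base back in.

    import Data.Word (Word64)

    -- A saved "blob": flat words, some of which hold pointers.  Pointers are
    -- written out as offsets from the (old) base address, plus a relocation
    -- table saying which slots contain pointers.
    data Blob = Blob
      { blobWords  :: [Word64]  -- raw contents, as it would come back from fread()
      , blobRelocs :: [Int]     -- indices of slots that contain pointers
      } deriving Show

    -- Before saving: turn absolute pointers into base-relative offsets.
    unfix :: Word64 -> Blob -> Blob
    unfix oldBase b =
      b { blobWords = adjust (subtract oldBase) (blobRelocs b) (blobWords b) }

    -- After loading at a new address: one linear pass adds the new base back in.
    fixup :: Word64 -> Blob -> Blob
    fixup newBase b =
      b { blobWords = adjust (+ newBase) (blobRelocs b) (blobWords b) }

    adjust :: (Word64 -> Word64) -> [Int] -> [Word64] -> [Word64]
    adjust f relocs ws = [ if i `elem` relocs then f w else w
                         | (i, w) <- zip [0..] ws ]

    main :: IO ()
    main = do
      -- A structure saved from address 0x1000: slot 1 points at 0x1010.
      let saved  = unfix 0x1000 (Blob [42, 0x1010, 7] [1])
          loaded = fixup 0x7f00 saved   -- "mapped" at a different address
      print saved   -- the pointer is stored as offset 0x10
      print loaded  -- the pointer is rewritten to 0x7f10
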
Is this a good idea? How hard would it be to implement this optimisation?
I believe OCaml does something like this.
I think the main difficulty is that the data structures in the heap are
not the same every time, because we allocate unique identifiers
sequentially as each Name is created. So to make this work you would
have to make Names globally unique. Maybe using a 64-bit hash instead of
the sequentially allocated uniques would work, but that would entail
quite a performance hit on 32-bit platforms (GHC uses IntMap everywhere
with Unique as the key).
On top of this there will be a *lot* of other complications (e.g.
handling sharing well, mapping info pointers somehow). Personally I
think it's at best very ambitious, and at worst not at all practical.
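
To spell out the Unique problem with a toy example (the hash below is
just a stand-in, not something GHC would actually use): keys handed out
from a counter depend on the order in which Names happen to be created,
whereas content-derived keys are stable across runs but need more bits
than an IntMap key comfortably gives you on 32-bit platforms.

    import qualified Data.IntMap.Strict as IM
    import Data.Char (ord)
    import Data.List (foldl')

    -- Today: Uniques come from a counter, so the key a Name gets depends
    -- on the order in which it was created in this particular run.
    sequentialUniques :: [String] -> IM.IntMap String
    sequentialUniques names = IM.fromList (zip [0..] names)

    -- Alternative: derive the key from the name itself, so it is the same in
    -- every compilation.  (Toy djb2-style hash; a real scheme would want 64
    -- bits and a collision story, which is exactly the cost on 32-bit hosts.)
    hashUnique :: String -> Int
    hashUnique = foldl' (\h c -> h * 33 + ord c) 5381

    stableUniques :: [String] -> IM.IntMap String
    stableUniques names = IM.fromList [ (hashUnique n, n) | n <- names ]

    main :: IO ()
    main = do
      -- The same set of names, created in two different orders:
      print (sequentialUniques ["map", "foldr"] == sequentialUniques ["foldr", "map"])
      -- False: the counter-based keys differ between the two runs.
      print (stableUniques ["map", "foldr"] == stableUniques ["foldr", "map"])
      -- True: content-derived keys do not depend on creation order.
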
Oh, I also meant to add: the best thing we could do initially is to
profile GHC and see if there are improvements that could be made in the
.hi file deserialisation/typechecking.
Cheers,
Simon
Another idea (that I like less) is to implement a "build server" mode
for GHC. That way, instead of a single 'ghc --make' we could run
several ghc build servers in parallel. However, Evan Laforge's efforts
in this direction didn't bring the expected speedup. Perhaps it's
possible to improve on his work.
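
The rough shape I have in mind is a long-lived process that keeps
already-loaded interfaces in memory between requests, so each job only
pays for the interfaces it has not seen before. A very hand-wavy sketch
(compileOne is a stand-in for a call into the GHC API, not a real
function):

    import qualified Data.Map.Strict as M
    import System.IO (hFlush, stdout, isEOF)

    -- module name -> "loaded interface" (stand-in for the real thing)
    type IfaceCache = M.Map String String

    -- Stand-in for the real work: compile one module, reusing (and extending)
    -- the cache of interfaces loaded so far.  A real server would call into
    -- the GHC API here instead.
    compileOne :: IfaceCache -> FilePath -> IO IfaceCache
    compileOne cache src = do
      putStrLn ("compiled " ++ src ++ " with "
                ++ show (M.size cache) ++ " cached interfaces")
      return (M.insert src ("iface of " ++ src) cache)

    -- The server loop: read one source file per line and keep the cache
    -- alive across requests, instead of re-reading interfaces in every
    -- freshly started 'ghc -c' process.
    serve :: IfaceCache -> IO ()
    serve cache = do
      done <- isEOF
      if done then return () else do
        src <- getLine
        cache' <- compileOne cache src
        hFlush stdout
        serve cache'

    main :: IO ()
    main = serve M.empty
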
[1]
http://www.gamasutra.com/view/feature/132376/delicious_data_baking.php?print=1