Hey all,

We are happy to announce the 17 projects that have been accepted to participate in Google Summer of Code 2018 for the Haskell.org project.
We would like to thank Google for organizing the program, all students who applied for their quality proposals, and of course the mentors for volunteering to guide the projects!

Without further ado, here are the accepted projects:

- Visual Tools and Bindings for Debugging in Code World
- Help Hadrian
- Add support for deprecating exports
- Hi Haddock
- Improving the GHC code generator
- Crucible: A Library for In-Memory Data Analysis in Haskell
- Dependently Typed Core Replacement in GHC
- Benchmarking graph libraries and optimising algebraic graphs
- Improvements to GHC's compilation for conditional constructs.
- Support for Multiple Public Libraries in a .cabal package
- Functional test framework for the Haskell IDE Engine and Language Server Protocol Library
- Native-Metaprogramming Reloaded
- Format-Preserving YAML
- Enhancing the Haskell Image Processing Library with State of the Art Algorithms
- Making GHC Tooling friendly
- Helping cabal new-build become just cabal build
- Parallel Automatic Differentiation

# Visual Tools and Bindings for Debugging in Code World

Student: Krystal Maughan
Mentors: Chris Smith, Gabriel Gonzalez

Visual debugging tools that will allow learners of various ages to interact with and learn visually while tracing their bugs in Haskell.

# Help Hadrian

Student: Chitrak Raj Gupta
Mentors: Andrey Mokhov, Moritz Angermann

Current build systems such as `make` have a very complex structure and are difficult to understand or modify. Hadrian uses functional programming to implement abstractions that make the codebase much more comprehensible. Build rules are defined using the Shake library, and the resulting build system is much faster and more scalable than the current make-based system. However, the current implementation of Hadrian is still in the development phase and not completely ready to be deployed. I believe that Hadrian will be of great help in increasing the productivity of Haskell developers.
Therefore, the aim of my project will be to push Hadrian a few steps closer to deployment, so that the Haskell community can code with a bit more efficiency.

A recent pull request by Alp Mestanogullari has implemented a basic rule for binary distribution. Also, I have been able to figure out multiple sources of errors causing validation failures, and my pull request has brought the number of failures down significantly. Hence, the major goals of my project will be to:

1. Achieve the ghc-quake milestone that is listed in Hadrian.
2. Implement missing features in Hadrian.
3. Build more comprehensive documentation for Hadrian.

# Add support for deprecating exports

Student: alanas
Mentors: Matthew Pickering, Erik de Castro Lopo

Add support for deprecation pragmas within module exports. This would ease the transition between different versions of the software by warning developers that the functions/types/classes/constructors/modules that they are using are deprecated.

# Hi Haddock

Student: Simon Jakobi
Mentors: Herbert Valerio Riedel, Alex Biehl

A long-standing issue with Haskell's documentation tool Haddock is that it needs to effectively re-perform a large part of the parse/template-haskell/typecheck compilation pipeline in order to extract the necessary information from Haskell source for generating rendered Haddock documentation. This makes Haddock generation a costly operation, and makes for a poor developer experience.

An equally long-standing suggestion to address this issue is to have GHC include enough information in the generated `.hi` interface files in order to avoid Haddock having to duplicate that work. This would pave the way for the following use cases and/or have the following benefits:

1. Significantly speed up Haddock generation by avoiding redundant work.
2. On-the-fly/lazy after-the-fact Haddock generation in cabal new-haddock and stack haddock for already built/installed Cabal library packages.
3. Add native support for a :doc command in GHCi's REPL and editor tooling (ghc-mod/HIE) similar to the one available in other languages (cf. the Idris REPL or the Python REPL).
4. Allow downstream tooling like Hoogle or Hayoo! to index documentation right from interface files.
5. Simplify Haddock's code base.

# Improving the GHC code generator

Student: Abhiroop Sarkar
Mentors: Carter Schonwald, Ben Gamari

This project attempts to improve the native code generator of GHC by adding support for Intel AVX and SSE SIMD instructions. This support would enable GHC to expose a bunch of vector primitive operations, which can be utilized by various high performance and scientific computing libraries of the Haskell ecosystem to parallelize their code for free.

# Crucible: A Library for In-Memory Data Analysis in Haskell

Student: Gagandeep Bhatia
Mentors: Marco Zocca, Andika D. Riyandi

Note: this project was slightly adjusted from its proposed form after some discussion with the mentors, and it will have a stronger focus on improving existing libraries.

A typical workflow in interactive data analysis consists of:

- Loading data (e.g. a CSV on disk)
- Transforming the data
- Various data processing stages
- Storing the result in some form (e.g. in a database).

The goal of this project is to provide a unified and idiomatic Haskell way of carrying out these tasks. Informally, you can think of "dplyr"/"tidyr" from the R ecosystem, but type safe. This project aims to provide a library with the following features:

- An efficient data structure for possibly larger-than-memory tabular data. The Frames library is notable prior work, and this project may build on top of it (namely, by extending its functionality for generating types from stored data).
- A set of functions to "tidy"/clean the data to bring it to a form fit for further analysis, e.g. splitting one column to multiple columns ("spread") or vice versa ("gather").
- A DSL for performing a representative set of relational operations, e.g. filtering/aggregation.

# Dependently Typed Core Replacement in GHC

Student: Ningning Xie
Mentors: Richard Eisenberg

In recent years, several works (Weirich et al., 2017; Eisenberg, 2016; Gundry, 2013) have proposed to integrate dependent types into Haskell. However, compatibility with existing GHC features makes adding full-fledged dependent types into GHC very difficult. Fortunately, GHC has many phases underneath, so it is possible to change one intermediate language without affecting the user experience, as a step towards dependent Haskell. The goal of this proposal is the replacement of GHC's core language with a dependently-typed variant.

# Benchmarking graph libraries and optimising algebraic graphs

Student: Alexandre Moine
Mentors: Andrey Mokhov, Alois Cochard

Graphs are a key structure in computer science, and they are known to be difficult to work with in functional programming languages. Several libraries are being implemented to create and process graphs in Haskell, each of them using a different graph representation: Data.Graph from containers, fgl, hash-graph and alga. Due to their differences and the lack of a common benchmark, it is not easy for a new user to select the one that will best fit their project. The new approach of alga seems particularly interesting since the way it deals with graphs is based on tangible mathematical results. Still, it is not very user friendly and it lacks some important features like widely-used algorithms or edge labels. Therefore, I propose to develop a benchmarking suite that will establish a reference benchmark for these libraries, as well as to enhance alga's capabilities.

# Improvements to GHC's compilation for conditional constructs.

Student: Andreas Klebinger
Mentors: José Calderón, Joachim Breitner, Ben Gamari

While GHC is state of the art in many respects, compilation of conditional constructs has fallen behind projects like Clang/GCC. I intend to rectify this by working on the following tasks:

- Implement cmov support for Cmm.
- Use cmov to improve simple branching code.
- Use lookup tables over jump tables for value selection when useful.
- Enable expression of three-way branching on values in Cmm code.
- Improve placement of stack adjustments and checks.

# Support for Multiple Public Libraries in a .cabal package

Student: Francesco Gazzetta (@fgaz)
Mentors: Mikhail Glushenkov, Edward Yang

Large-scale Haskell projects tend to have a problem with lockstep distribution of packages (especially Backpack projects, being extremely granular). The unit of distribution (package) coincides with the buildable unit of code (library), and consequently each library of such an ecosystem (e.g. amazonka) requires duplicate package metadata (and tests, benchmarks...).

This project aims to separate these two units by introducing multiple libraries in a single cabal package.

This proposal is based on <https://github.com/haskell/cabal/issues/4206> by ezyang.

# Functional test framework for the Haskell IDE Engine and Language Server Protocol Library

Student: Luke Lau
Mentors: Alan Zimmerman

The Haskell IDE Engine is a Haskell backend for IDEs, which utilises the Language Server Protocol to communicate between clients and servers. This project aims to create a test framework that can describe a scenario between an LSP client and server from start to finish, so that functional tests may be written for the IDE engine. If time permits, this may be expanded to be language agnostic or provide a set of compliance tests against the LSP specification.
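
To make the idea of a scripted client/server scenario a bit more concrete, here is a minimal sketch. It is not the API of the Haskell IDE Engine or of any LSP library: the `Message`, `Step`, `Server`, `runScenario` and `toyServer` names are all hypothetical, and messages are plain strings rather than the JSON-RPC payloads that real LSP traffic uses.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical, simplified message type (real LSP messages are JSON-RPC).
type Message = String

-- One step of a scripted session between a client and a server.
data Step
  = Send Message              -- the client sends a message to the server
  | Expect (Message -> Bool)  -- the next server message must satisfy this

-- A toy server: maps each incoming message to zero or more replies.
type Server = Message -> [Message]

-- Run a scenario from start to finish. Returns Right () on success,
-- or Left with a description of the first step that failed.
runScenario :: Server -> [Step] -> Either String ()
runScenario server = go []
  where
    go _ [] = Right ()
    go inbox (Send msg : rest) = go (inbox ++ server msg) rest
    go [] (Expect _ : _) = Left "expected a message, but the server sent none"
    go (m : inbox) (Expect ok : rest)
      | ok m      = go inbox rest
      | otherwise = Left ("unexpected message: " ++ m)

-- Hypothetical stand-in for a language server, for illustration only.
toyServer :: Server
toyServer "initialize" = ["initialized"]
toyServer msg
  | "hover " `isPrefixOf` msg = ["hover-result for " ++ drop 6 msg]
  | otherwise                 = ["error: unknown request"]

main :: IO ()
main = print (runScenario toyServer
  [ Send "initialize"
  , Expect (== "initialized")
  , Send "hover Foo.hs:3:5"
  , Expect ("hover-result" `isPrefixOf`)
  ])
```

Describing the scenario as plain data keeps the test script separate from the transport, which is one way such a framework could later be pointed at other servers; whether the project takes this route is not stated in the proposal.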

# Native-Metaprogramming Reloaded

Student: Shayan Najd
Mentors: Ben Gamari, Alan Zimmerman

The goal is to continue ongoing work, utilising the Trees that Grow technique, to introduce native-metaprogramming in GHC. Native-metaprogramming is a form of metaprogramming where a metalanguage's own infrastructure is directly employed to generate and manipulate object programs. It begins by creating a single abstract syntax tree (AST) which can serve a purpose similar to what is currently served by Template Haskell (TH) and the front-end AST inside GHC (HsSyn). Meta-programs could then leverage, much more directly, the machinery implemented in GHC to process Haskell programs. This work can also possibly integrate with Alan Zimmerman's work on compiler annotations in GHC, and enable better IDE support.

# Format-Preserving YAML

Student: Wisnu Adi Nurcahyo
Mentors: Tom Sydney Kerckhove, Jasper Van der Jeugt

Sometimes Stack (the Haskell Tool Stack) asks us to add an extra dependency manually. Suppose that we use the latest Hakyll, which needs a `pandoc-citeproc-0.13` that is missing in the latest stable Stack LTS. Stack asks us to add the extra dependency to solve this problem. Wouldn't it be nice if Stack could add the extra dependency by itself?

# Enhancing the Haskell Image Processing Library with State of the Art Algorithms

Student: khilanravani
Mentors: Alp Mestanogullari

The project proposed here aims to implement different classes of image processing algorithms in Haskell and incorporate them into the existing code base of the Haskell Image Processing (HIP) package. The algorithms that I plan to incorporate in the HIP package have vast applications in actual problems in image processing.
Including these algorithms in the existing code base would help more users to really use Haskell while working on computer vision problems, and this would put Haskell (kind of) ahead in the race with other functional programming languages such as Elm or Clojure (since their image processing libraries are pretty naive). In this way, this project can substantially benefit the Haskell organization as well as the open source community. Some of the algorithms proposed here include the famous Canny edge detection and Floyd-Steinberg dithering, along with other popular tools used in computer vision problems.

# Making GHC Tooling friendly

Student: Zubin Duggal
Mentors: Ben Gamari, Gershom Bazerman, Joachim Breitner

GHC builds up a wealth of information about Haskell source as it compiles it, but throws all of it away when it's done. Any external tools that need to work with Haskell source need to parse, typecheck and rename files all over again. This means Haskell tooling is slow and has to rely on hacks to extract information from GHC. Allowing GHC to dump this information to disk would simplify and speed up tooling significantly, leading to a much richer and more productive Haskell developer experience.

# Helping cabal new-build become just cabal build

Student: typedrat
Mentors: Herbert Valerio Riedel, Mikhail Glushenkov

While much of the functionality required to use the `new-*` commands has already been implemented, there are not-insignificant parts of the design created last year that remain unrealized. By completing more of this design, I plan to help the `new-` prefix go away and to allow this safer, cleaner system to replace old-style cabal usage fully by rounding off the unfinished edges of the current proposal.

# Parallel Automatic Differentiation

Student: Andrew Knapp
Mentors: Trevor L. McDonell, Edward Kmett, Alois Cochard

Automatic Differentiation (AD) is a technique for computing derivatives of numerical functions that does not use symbolic differentiation or finite-difference approximation. AD is used in a wide variety of fields, such as machine learning, optimization, quantitative finance, and physics, and the productivity boost generated by parallel AD has played a large role in recent advances in deep learning.

The goal of this project is to implement parallel AD in Haskell using the `accelerate` library. If successful, the project will provide an asymptotic speedup over current implementations for many functions of practical interest, stress-test a key foundation of the Haskell numerical infrastructure, and provide a greatly improved key piece of infrastructure for three of the remaining areas where Haskell's ecosystem is immature.
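
For readers unfamiliar with the automatic differentiation technique mentioned in the Parallel Automatic Differentiation project, the following is a minimal sketch of sequential forward-mode AD using dual numbers. It is only an illustration of the general idea: it is not parallel, it does not use `accelerate`, and it is not the project's design. The `Dual` type and `diff` function are names made up for this example.

```haskell
-- A dual number carries a value together with its derivative.
data Dual = Dual { primal :: Double, tangent :: Double }
  deriving (Show, Eq)

-- Arithmetic on dual numbers applies the usual differentiation rules.
instance Num Dual where
  Dual x dx + Dual y dy = Dual (x + y) (dx + dy)
  Dual x dx - Dual y dy = Dual (x - y) (dx - dy)
  Dual x dx * Dual y dy = Dual (x * y) (x * dy + dx * y)  -- product rule
  abs (Dual x dx)       = Dual (abs x) (dx * signum x)
  signum (Dual x _)     = Dual (signum x) 0
  fromInteger n         = Dual (fromInteger n) 0           -- constants

instance Fractional Dual where
  Dual x dx / Dual y dy = Dual (x / y) ((dx * y - x * dy) / (y * y))  -- quotient rule
  fromRational r        = Dual (fromRational r) 0

-- Derivative of f at x: seed the tangent with 1 and read it back.
diff :: (Dual -> Dual) -> Double -> Double
diff f x = tangent (f (Dual x 1))

main :: IO ()
main = do
  let f x = x * x * x + 2 * x  -- f'(x) = 3x^2 + 2
  print (diff f 2)             -- prints 14.0
```

The derivative is computed exactly (up to floating-point rounding) alongside the function value, with no symbolic manipulation and no finite-difference step size to choose.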