Hello Here is the latest OCaml Weekly News, for the week of May 19 to 26, 2020.
Table of Contents ───────────────── Hannes Mehnert interview about MirageOS and OCaml by Evrone A dynamic checker for detecting naked pointers ANN: Releases of ringo Solidity parser in OCaml with Menhir Browsing source with merlin (and tuareg) the right way? New release of Cucumber ML 1.0.3 New OCaml books? Integer division behaviour New release of tablecloth Language abstractions and scheduling techniques for efficient execution of parallel algorithms on multicore hardware Release 1.2 of HoCL Other OCaml News Old CWN Hannes Mehnert interview about MirageOS and OCaml by Evrone ═══════════════════════════════════════════════════════════ Archive: [https://discuss.ocaml.org/t/hannes-mehnert-interview-about-mirageos-and-ocaml-by-evrone/5784/1] Elizabeth Lvova announced ───────────────────────── [https://evrone.com/hannes-mehnert-interview] Guillaume Munch-Maccagnoni then asked ───────────────────────────────────── Thank you @elizabethlvova for this link. Hi @hannes, and other MirageOS developers if they know the answer. I am curious about the soft real-time applications. • What kind of latency do you target, and what kind of latency does OCaml allows you to achieve? Are there concrete evaluations about it in the context of MirageOS? (Bonus internet points if they are public so as to be referenced in a paper, that would be very helpful to me!) • I have learnt on this discuss that low-latency can be obtained in OCaml by writing in a special style where you promote very little. Do you sometimes have to pay attention to your allocation patterns when you program for MirageOS? Have you ever had to profile an application for latency, and fix it by changing allocation patterns? Hannes Mehnert replied ────────────────────── I am not sure whether there's anyone focussing on low-latency MirageOS unikernels. My goal is to first get robust and sustainable infrastructure. I have been playing a bit with an old version of statmemprof to figure out allocation profiles (and landmarks for profiling code), but I am not aware of any in-depth allocation analysis. The closest I am aware of is httpaf's motivational benchmarks [https://github.com/inhabitedtype/httpaf#performance]. Also [https://github.com/mirage/mirage/pull/968] in respect to our IP stack, but I still feel there's room for improvement (such as using String/Byte instead of Bigarray; avoid allocation of small structures when sending data). Following several questions, Anil Madhavapeddy replied ────────────────────────────────────────────────────── Bikal Lem said ╌╌╌╌╌╌╌╌╌╌╌╌╌╌ It is interesting you mentioned this. Isn’t the usage of bigarray more efficient than String/bytes? I think httpaf uses bigstringaf and faraday which seems to pervasively use bigarray as its primary buffer data structure. Isn’t this a performant choice? This is a good question, and it's helpful to understand what each datastructure is backed by, and what operations are inefficient. • `Bigarray' is a pointer to externally allocated memory of arbitrary length. It supports creating smaller views of the same memory without copying it, which is implemented at the [OCaml runtime level]. Accessing data within bigarrays is fast thanks to some compiler primitives which allow for endian-neutral parsing and serialisation, implemented by [ocplib-endian]. • Bigarrays are extremely convenient for network IO, since they support everything needed for minimal copying of data from the OS. You can exchange memory pages directly from the OS into the OCaml heap, and process them. Unfortunately, one operation is critically slow here – creating a substring. Bigarray's provenance was originally to interop better with Fortran-style HPC code, where the size and dimensionality of arrays is generally large. For IO, we just want really speedy 1-dimension arrays, and in this usecase Bigarray substring creation is very slow due to the underlying reference counting. Thus [cstruct] was born, which keep a single underlying Bigarray structure and allocates [small OCaml records] on the minor heap for subviews. These are cheap to create and GC, and the underlying data is not copied unless requested. • Strings are immutable and sit in the OCaml heap, and require a data copy from the outside world into them. Under some circumstances (usually small allocations) they can be more performant. • Buffers are a resizable String, and efficient if you need to concatenate lots of data of unknown size. So the final answer, as with many systems performance problems about what is "efficient" depends on your allocation patterns. For transmitting data, there is often a number of small pieces of data that are combined onto a set of pages for the write path. In this case, a hybrid of "in-heap" assembly using small strings followed by blitting into a Bigarray is reasonable. For reading, parsing directly from a Bigarray into a cstruct works well. [OCaml runtime level] https://github.com/ocaml/ocaml/blob/trunk/runtime/caml/bigarray.h#L78 [ocplib-endian] https://github.com/OCamlPro/ocplib-endian [cstruct] https://github.com/mirage/ocaml-cstruct [small OCaml records] https://github.com/mirage/ocaml-cstruct/blob/master/lib/cstruct.mli#L143 Yotam Barnoy said ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌ Compared to string/byte, Bigarray doesn’t have much of an advantage. Allocation is always relatively expensive, and accessing it requires going through the C API, This is basically all incorrect; please see above. Accessing Bigarrays can be done via builtin compiler primitives that make it fast. And the point of using them is to avoid multiple small allocations, especially on the read path. Guillaume Munch-Maccagnoni asked ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌ • What kind of latency do you target, and what kind of latency does OCaml allows you to achieve? Are there concrete evaluations about it in the context of MirageOS? (Bonus internet points if they are public so as to be referenced in a paper, that would be very helpful to me!) The basic approach to low latency OCaml hasn't really changed much in the last few decades. You just need to minimise allocation to maximise GC throughput, and OCaml makes it fairly easy to write that sort of low level code. Two papers that might be helpful: • ["Melange: Towards a functional internet"], EuroSys 2007. Contains a latency analysis of an SSH and DNS server _vs_ C equivalents, and some techniques on writing low-latency protocol parsers. These days, we do roughly the same thing with ppx's and cstructs, without the DSL in the way. • ["Jitsu: Just-in-Time Summoning of Unikernel;s"], NSDI 2015. This shows the benefits of whole-system latency control – you can mask latency by doing some operations concurrently, which is easy to do in unikernels and hard in a conventional OS. We've never really built systems in the "soft realtime" sense so far – for example no video transmission system or isochronous Bluetooth implementations. Internet protocols are very resilient to variable latency, although of course we want to keep things as low as possible. I've been looking into multipath multicast video transmission in Mirage recently due to the current work-at-home situation, so that might change soon depending on how it goes :slight_smile: One thing that has changed in the past decade is the [steadily improving latency profile] of the OCaml GC, which has only been improving thanks to @damiendoligez's steady work. That has let us get away with not directly addressing latency much in Mirage itself, as every upgrade of the compiler is a pleasant improvement. ["Melange: Towards a functional internet"] https://www.tjd.phlegethon.org/words/eurosys07-melange.pdf ["Jitsu: Just-in-Time Summoning of Unikernel;s"] https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-madhavapeddy.pdf [steadily improving latency profile] https://blog.janestreet.com/building-a-lower-latency-gc/ Calascibetta Romain said ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌ I just would like to add a _pro_ about `bigarray', due to the fact that a `bigarray' can not move in your heap, we have the ability to release the runtime lock for some computations such as _hash algorithms_ as `digestif' does: [https://github.com/mirage/digestif/pull/70] About MirageOS, we currently mostly use [`cstruct'] which has an other difference with `bigarray', the underlying record. Such design is to be more efficient when we do a `sub' operation as @ivg said here: [https://discuss.ocaml.org/t/working-with-a-huge-data-chunks/3955/10?u=dinosaure] However, the question to choose `Bytes.t' or `Cstruct.t' (or `Bigstring.t' ) is a bit hard and it really depends on your context - and, as @xavierleroy said :slight_smile: : > Mirage people don’t seem to care, as they allocate small bigarrays like crazy. And indeed, @xavierleroy is right that we allocate like crazy, with the caveat that this only really happens on the transmission path of most protocols. Reads tend to go through a more minimal copy discipline. We certainly do care about this, but it has to be fixed upstream in OCaml as we have reached the limits of what we can practically do with Bigarray – I am hoping that multicore OCaml is the perfect time to [unify all these IO approaches] in that direction as part of that effort. Mirage will benefit from whatever happens there eventually. [`cstruct'] https://github.com/mirage/ocaml-cstruct [unify all these IO approaches] https://discuss.ocaml.org/t/ann-a-dynamic-checker-for-detecting-naked-pointers/5805/15?u=avsm A dynamic checker for detecting naked pointers ══════════════════════════════════════════════ Archive: [https://discuss.ocaml.org/t/ann-a-dynamic-checker-for-detecting-naked-pointers/5805/1] KC Sivaramakrishnan announced ───────────────────────────── We're happy to release an OCaml compiler switch for dynamically detecting naked pointers in the code. Naked pointers in OCaml ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌ A naked pointer is a pointer outside the OCaml heap without a valid header. A header outside the heap is said to be valid if it is colored black. OCaml does [permit naked pointers] to word-aligned addresses. However, the presence of naked pointers incurs overhead in the garbage collector (GC). Whenever the GC intends to follow a pointer, it must check that the pointer is indeed in the OCaml heap. The GC consults a page table that maintains the list of pages currently used by the heap and only follows the pointer if it belongs to one of the pages. As you can imagine, this adds some overhead in the GC. For the multicore GC, maintaining a page table that remains consistent when multiple domains are allocating and running GC in parallel would necessitate some synchronization around the page table for reading and writing to it. It is quite likely that this cost will be prohibitive. Luckily, OCaml already has a `no-naked-pointer' mode where the compiler *assumes* that the code does not have naked pointers, and hence, does not consult the page table for following pointers during GC ([except `Closure_tag' objects]). The `no-naked-pointer' mode is a configure-time option, enabled by configuring the compiler with `--disable-naked-pointers'. Multicore OCaml compiler does not use a page table in its implementation currently. [permit naked pointers] https://caml.inria.fr/pub/docs/manual-ocaml/intfc.html#ss:c-outside-head [except `Closure_tag' objects] https://github.com/ocaml/ocaml/pull/8984 Dynamic Check for naked pointers ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌ With the aim of migrating to `no-naked-pointer' mode as the default in future releases of OCaml, eventually paving the way for upstreaming multicore support, we're happy to release a variant of OCaml 4.10.0 with a dynamic checker for the presence of naked pointers in the code. [OCaml PR#9534] has the discussion around this checker. This variant can be installed with: ┌──── │ $ opam update │ $ opam switch create 4.10.0+nnpcheck │ $ eval $(opam env) └──── Once the variant is installed, you can install your favorite libraries using `opam' and run your program to get a report of naked pointers. Let us look at an example. We know that `frama-c' has naked pointers. ┌──── │ $ opam install frama-c │ $ frama-c │ Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144) └──── The checker prints warnings to standard error with the address that contains the naked pointer, the naked pointer and the reason why the warning was raised. [OCaml PR#9534] https://github.com/ocaml/ocaml/pull/9534 Finding the sources ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌ While the warnings are useful for indentifying that the program has naked pointer, it does not help with finding the source of the naked pointer in code. For this, we recommend the use of [`rr']. `rr' is record and replay framework that wraps around the familiar `gdb' interface. We can debug the error above as follows: ┌──── │ $ rr frama-c │ rr: Saving execution to trace directory ~/home/kc/.local/share/rr/frama-c-5'. │ Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e2754d8 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ Out-of-heap pointer at 0x55fc1e275600 of value 0x55fc1e3a0cc0 has non-black head (tag=144) │ $ rr replay │ (rr) watch *(value*)0x55fc1e2754d8 │ Hardware watchpoint 1: *(value*)0x55fc1e2754d8 │ (rr) c │ Continuing. │ │ Hardware watchpoint 1: *(value*)0x55fc1e2754d8 │ │ Old value = 1 │ New value = 94541327240384 │ 0x000055fc1dab48f8 in camlUnmarshal__entry () at src/libraries/datatype/unmarshal.ml:72 │ 72 src/libraries/datatype/unmarshal.ml: No such file or directory. └──── This corresponds to the naked pointer at [https://github.com/Frama-C/Frama-C-snapshot/blob/master/src/libraries/datatype/unmarshal.ml#L72]. [`rr'] https://github.com/mozilla/rr Fixing naked pointers ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌ The recommended way of fixing naked pointers is to [wrap them in an OCaml object with `Custom_tag' or `Abstract_tag' (as appropriate)]. [wrap them in an OCaml object with `Custom_tag' or `Abstract_tag' (as appropriate)] https://caml.inria.fr/pub/docs/manual-ocaml/intfc.html#ss:c-outside-head Limitations ╌╌╌╌╌╌╌╌╌╌╌ The dynamic analysis only work on AMD64 backend with GCC and Clang. It has been known to work on Linux and MacOS. `rr' currently requires an Intel CPU with Nehalem (2010) or later microarchitecture. Credits ╌╌╌╌╌╌╌ The analysis was originally proposed by Mark Shinwell (@mshinwell). KC Sivaramakrishnan added ───────────────────────── As @gasche had mentioned earlier, the no-naked-pointers mode was already there in OCaml and it is known to work on all the platforms that OCaml was supported. Hence, it was a reasonable path to pursue for Multicore. The concurrent minor collector in Multicore OCaml uses the virtual address space trick, but only for the minor heap area. It needs contiguous 4GB reserved for 128 domains, each with max 16MB minor heap arena. This can be modified at compiler configure time. For comparison the minor heap is 2MB by default in OCaml and so 16MB should be quite enough. We hadn't considered this trick for the major heap in Multicore. However, given our experimental evaluation (see [paper]), we have chosen not to pursue concurrent minor collector for the initial version of multicore support to be upstreamed. The alternative stop-the-world parallel minor collector scales better and does not break the C FFI. The parallel minor collector does not need the virtual address space trick. Given that the space for the entire heap should be reserved, how would it work on 32-bit architectures, and does it have an impact on system tooling. Looking forward to reading @gadmm's RFC. [paper] https://arxiv.org/abs/2004.11663 Stephen Kell asked and Anil Madhavapeddy replied ──────────────────────────────────────────────── As someone with an interest in cross-language interop, I like naked pointers. So I’m interested in design choices that might make the “our heap or not?” check fast, and in reasons why impls might not go for them. Feel free to answer “read the paper”, but I was thinking you could do something like reserve a big contiguous chunk of VAS for OCaml heaps’ use, and then the test could be a simple shift and compare. Would that be viable? Please do note that this isn't a performance improvement for OCaml – this very much a correctness fix. The failure case is as follows: • a naked pointer is created using `malloc' on the C heap and held in the OCaml heap • the external region is `free''d, but the naked pointer is still held in some OCaml heap. • the GC ~malloc~s to expand, and that recently freed C memory becomes part of the OCaml heap • the GC then follows the naked pointer by treating it as an OCaml value, since the page table indicates that it is within the OCaml heap. However, the memory the naked pointer is aimed at is not necessarily a valid OCaml value as it was formerly a C pointer. • memory corruption ensues The only way to really avoid this is by only holding naked references to static or global C values, which is a pretty minority usecase. As @lpw25 notes, you can hold them safely by wrapping them in custom blocks, which is entirely safe as it gives the GC a reliable way to determining what's going on. As for the question about a contiguous VA, this should work fine on 64-bit, where you have the luxury of such use of the address space. I built a version of this a decade ago for OCaml/Xen in early Mirage, which you can find evaluated in the [HotCloud 2010 paper] (Figure 4). It's pretty straightforward, but the problems come from balancing external memory pressure (from C allocations) with the OCaml allocation. This can be adjusted with an obvious use of `sbrk' or `realloc' to grow or shrink the contiguous memory, while being careful to keep other memory allocations away from the OCaml area. The current strategy will need to be maintained for 32-bit architectures however, which are very much supported (e.g. armv7). For those, there is very little wiggle room to hold a contiguous VA and so the current multicore approach lets us preserve a unified memory representation. One observation I had when I read @stephenrkell's excellent essay is how strange our current memory allocation mechanisms are in operating systems. We have conflated cooperative scheduling across components with enforcing protection from mutually untrusted control flow in the same language. For example, we have the system C malloc competing with the OCaml GC which competes with the kernel memory allocator. I've been sketching out a possible solution in multicore OCaml towards this: • We move away from `Bigarray' to a specialised `Extvalue' that handles external pages in a separate region of memory. Bigarray currently offers too much functionality (subarrays and proxies) which slows it down due to dropping into the C FFI. • The `Extvalue' is backed by a bundled slab allocator that works in a contiguous region of memory, disjoint from the OCaml heap. • The compiler provides primitives for very fast translation of values in and out of the `Extvalue' (as it does currently for `Bigarray'). • C libraries linked in with OCaml also use this memory allocator for their own mallocs. This will require some trickery (static compilation or LD_PRELOAD initially), but it means that all the allocations associated with a particular "task" (from OCaml to C or Rust code) can be batched together. • This approach lets us improve multicore memory locality greatly, as every modern machine has significant NUMA effects (see this [FOSDEM 2013 talk]), and cooperatively allocate memory. It also leaves open the possibility of separate isolation mechanisms (such as ARM memory domains or Intel MPK) _across_ tasks in a large heap. Please note that the above is still only at the experimental stage as I'm still evaluating it, but it does have the advantage of degrading gracefully if the system malloc has to be used (e.g. if OCaml is embedded as a library, noone expects 10GBs gigabit levels of network performance). From an ecosystem perspective, I don't think anyone really wants to maintain the current hybrid world of a multitude of `Bigarray'-based overlays, such as cstruct or bigstring. [HotCloud 2010 paper] http://mort.io/publications/pdf/hotcloud10-lamp.pdf [FOSDEM 2013 talk] https://www.youtube.com/watch?v=Ss4pUbq09Lw ANN: Releases of ringo ══════════════════════ Archive: [https://discuss.ocaml.org/t/ann-releases-of-ringo/5605/2] Raphaël Proust announced ──────────────────────── Version 0.4 of `ringo' is now available in `opam'. This version includes bug-fixes, minor (sometimes breaking) interface and semantics improvements, and, most importantly, a `ringo-lwt' package. `ringo-lwt' provides wrapper for using caches in an Lwt-heavy application. Specifically, it provides a functor that transform a Ringo cache into a Ringo-lwt cache featuring: • `val find_or_replace : 'a t -> key -> (key -> 'a Lwt.t) -> 'a Lwt.t' which helps avoid race conditions, • _automatic cleanup_ by which promises that are rejected are removed from the table automatically. Additional functors for option (with automatic cleanup of `None') and result (with automatic cleanup of `Error') are also provided. Solidity parser in OCaml with Menhir ════════════════════════════════════ Archive: [https://sympa.inria.fr/sympa/arc/caml-list/2020-05/msg00026.html] David Declerck announced ──────────────────────── We just released a parser & printer for the Solidity language: [https://medium.com/dune-network/a-solidity-parser-in-ocaml-with-menhir-e1064f94e76b] Solidity is one of the most used languages for smart contracts, and popularized by the Ethereum blockchain. This work is a step towards native support of Solidity in the Dune Network blockchain, and developed in a partnership between Origin-Labs and OCamlPro. The library is released under LGPLv3 with Static Linking exception. Browsing source with merlin (and tuareg) the right way? ═══════════════════════════════════════════════════════ Archive: [https://discuss.ocaml.org/t/browsing-source-with-merlin-tuareg-the-right-way/5776/2] Luc_ML asked ──────────── Browsing the source of your own program and libraries and of other people's libraries is a key for being able to smoothly program and also to attract more people to OCaml. Am I the only one that find this it so archaic programming in OCaml with Emacs/Tuareg? (compared to other mainstream PLs IDE and (integrated) tooling). Can you share your (Emacs) OCaml IDE setup or give some advice? This should also be of interest for new comers to OCaml that may find IDE support neither easy nor fantastic. Anton Kochkov ───────────── You can check out: • [Visual Studio Code] + [OCaml platform plugin] - see [https://discuss.ocaml.org/t/ann-vscode-platform-plugin-0-5-0/5752] • Emacs + [lsp-mode] + [ocaml-lsp] - for now it's experimental though. [https://aws1.discourse-cdn.com/standard11/uploads/ocaml/original/2X/3/3dc0a6e4735273399a0c61a25806a6e8ac327ab6.png] [Visual Studio Code] https://code.visualstudio.com/ [OCaml platform plugin] https://marketplace.visualstudio.com/items?itemName=ocamllabs.ocaml-platform [lsp-mode] https://github.com/emacs-lsp/lsp-mode [ocaml-lsp] https://github.com/ocaml/ocaml-lsp New release of Cucumber ML 1.0.3 ════════════════════════════════ Archive: [https://discuss.ocaml.org/t/ann-new-release-of-cucumber-ml-1-0-3/5813/1] Christopher Yocum announced ─────────────────────────── I am pleased to announce the release of [Cucumber ML] 1.0.3. Cucumber ML is a library that brings [Behavior Driven Development] to OCaml via [Cucumber]. Essentially, Cucumber is a way to communicate using plain language between software development teams and non-developer stakeholders that can be turned into code to be executed. This release updates the underlying dependency on the gherkin language parser, [gherkin-c] up-to-date with the latest version of that library (7.0.4). This will deal with those pesky compile errors. Just a note here, that you will need to installed the gherkin parser as a shared object (aka a shared library) on your system for Cucumber ML to link against. [Cucumber ML] https://github.com/cucumber/cucumber.ml [Behavior Driven Development] https://en.wikipedia.org/wiki/Behavior-driven_development [Cucumber] https://docs.cucumber.io/ [gherkin-c] https://github.com/cucumber/gherkin-c Roadmap ╌╌╌╌╌╌╌ There are a bunch of things that I could be doing and here are a couple that I will be thinking about in the near future: • Releasing the library via OPAM for ease of install • A more flexible Reporting structure that user can extend via functors with some sensible defaults to choose from New OCaml books? ════════════════ Archive: [https://discuss.ocaml.org/t/new-ocaml-books/5789/5] Continuing this thread, Daniil Baturin announced ──────────────────────────────────────────────── I'm working on a free culture book. The preview is at [https://ocaml-book.baturin.org] and the source is at [https://github.com/dmbaturin/ocaml-book] It's under CC-BY-SA so it belongs to the community—it can be a living document that people can keep up to date even if original authors abandon it. It's also supposed to be a collaborative project, but almost no one is collaborating so far. ;) Integer division behaviour ══════════════════════════ Archive: [https://discuss.ocaml.org/t/integer-division-behaviour/5815/1] Daniil Baturin asked ──────────────────── Number theoretically correct integer division is supposed to work so that `(N / K) + (N mod K) = N'. I was very surprised to see that it's not how `(/) : int → int' works! ┌──── │ # 3 mod 2 ;; │ - : int = 1 └──── Now, two questions. What is the justification for this behaviour? And does anything provid real integer division? Gaëtan Gilbert corrected ──────────────────────── Surely you mean `((N / K) * K) + (N mod K) = N'? Aaron L. Zeng replied ───────────────────── I assume you meant to include an example with negative numbers? ┌──── │ # 3 mod 2;; │ - : int = 1 │ # (-3) mod 2;; │ - : int = -1 │ # 3 mod (-2);; │ - : int = 1 │ # (-3) mod (-2);; │ - : int = -1 └──── The `mod' operator always has the same sign as the numerator. I think this is for historical reasons, although I don't know whether to point the finger at C, or x86, or something even earlier. If you use Base, the `%' operator gives you the Euclidean modulo operator that I think you're looking for. Its result always has the same sign as the denominator. This operator is basically equivalent to: ┌──── │ let (%) x y = │ let z = x mod y in │ if z < 0 then z + y else z └──── threepwood also replied ─────────────────────── This is a "feature" in most programming languages and I think actually corresponds to the standard way division is implemented in the CPU itself (so it has little to do with OCaml). How this was allowed to become the standard I do not know. One thing is that I get the impression that people who are not familiar with number theory find the following result extremely counter-intuitive : `(-3) / 2 = -2' I believe it is because they think of integer division as an approximation of real division, rather than as being its own special thing, and from this perspective it makes no sense that making a number negative should change the result. They expect the identities that hold of real division (like `(a*b) / c = a * (b/c)') to also hold for integer division. (I say "they" not to belittle the perspective, I totally see where they are coming from.) But then if you have `(-3) / 2 = -1', you need to have `(-3) mod 2 = -1' to preserve the relation between `/' and `mod' that you mention (so you'll note that the relation does hold in this system). I tend to think that the behaviour where `mod' never returns anything negative, in addition to being what a mathematician would expect, is strictly more useful (for what I believe to be the typical use case of modulo over negative numbers in programming, which is indexing into a circular buffer). And I also think that you almost never divide negative numbers, so the useful behaviour for `mod' should have taken priority when deciding how all this works, and whether `/' is intuitive or not does not matter much in practice. But I have no idea who took that decision and whether such issues were even considered. Daniel Bünzli then said ─────────────────────── [This paper] which discusses various definitions of `div' and `mod' in programming languages may be of interest. [This paper] https://dl.acm.org/doi/pdf/10.1145/128861.128862 threepwood replied ────────────────── Thanks for this! So the one found in OCaml and most languages is T-division (for truncating) and the one I called more useful is E-division (for Euclidean) which the paper argues for. It says that T-division is found in Ada, that Lisp has two modulo operators, one that does T-division and one that does F-division (halfway between T and E, and works for the circular buffer case), and that Algol and Pascal break the relation between div and mod by doing T-division for div and E-division for mod (if I got it right). Interesting stuff. New release of tablecloth ═════════════════════════ Archive: [https://discuss.ocaml.org/t/ann-new-release-of-tablecloth/5818/1] Paul Biggar announced ───────────────────── I’ve just released a new version of [tablecloth] - an easy-to-use, comprehensive standard library that has the same API on all OCaml/ReasonML/Bucklescript platforms. 0.0.7 is a pretty decent release, including many new functions in the List, Array, Int, Float, Option, and Result modules, as well as the addition of a new Fun module and support for the latest version of bs-platform for the bucklescript version of tablecloth. See [the tablecloth github repo] for installation instructions, or read the full [changelog], or the [original announcement] of tablecloth for motivation. In addition, Dean Merchant and I have agreed to merge his [Standard] library into tablecloth. Dean has done a significant amount of the work in tablecloth since the original release, and we plan to release a version 0.0.8 after merging the two code bases together. Dean is now a maintainer of tablecloth. [tablecloth] https://github.com/darklang/tablecloth [the tablecloth github repo] https://github.com/darklang/tablecloth [changelog] https://github.com/darklang/tablecloth/blob/master/Changelog.md [original announcement] https://medium.com/darklang/tablecloth-a-new-standard-library-for-ocaml-reasonml-d29a73a557b1 [Standard] https://github.com/Dean177/reason-standard Language abstractions and scheduling techniques for efficient execution of parallel algorithms on multicore hardware ════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ Archive: [https://discuss.ocaml.org/t/language-abstractions-and-scheduling-techniques-for-efficient-execution-of-parallel-algorithms-on-multicore-hardware/5822/1] Arthur Charguéraud announced ──────────────────────────── The Multicore OCaml team has made significant progress in the recent years. There now seems to be interest in working on the high-level parallelism constructs. Such constructs are also tightly connected to the problem of controlling the granularity of parallel tasks. I've been working on parallel constructs and granularity control from 2011 to 2019, together with Umut Acar and Mike Rainey. We published a number of papers, each of them coming with theoretical bounds, an implementation, and evaluation on state-of-the-art benchmark of parallel algorithms. While we mainly focused on C++ code, I speculate that nearly all of our ideas could be easily applied to Multicore OCaml. Porting these ideas would deliver what seems to be currently missing in Multicore OCaml for efficiently implementing a large class of parallel algorithms. Gabriel Scherer and François Pottier recently suggested to me that it appears timely to share these results with the OCaml community. I'll thus try to give an easily-accessible, OCaml-oriented introduction to the results that we have produced. Note, however, that most of the ideas presented would apply essentially to another other programming language that aims to support nested parallelism. I plan to cover the semantics of high-level parallelism constructs, to describe and argue for work-stealing scheduling, to present a number of tricks that are critical for efficiency, and to advertise for our modular, provably-efficient approach to granularity control. I'll post these parts one after the other, as I write them. • [Part 1 in PDF] • [Other formats] Other parts will be published in the coming weeks or months. [Part 1 in PDF] http://www.chargueraud.org/research/2020/multicore/src/part1.pdf [Other formats] http://www.chargueraud.org/research/2020/multicore/index.php Release 1.2 of HoCL ═══════════════════ Archive: [https://discuss.ocaml.org/t/ann-release-1-2-of-hocl/5837/1] jserot announced ──────────────── This is to announce release 1.2 of [HoCL], a functional language for describing dataflow process networks. *HoCL* • can describe *hierarchical* and/or *parameterized* graphs • support two styles of description : *structural* and *functional* • use *polymorphic type inference* to check graphs • supports the notion of *higher order wiring functions* for describing and encapsulating *graph patterns* • supports several dataflow semantics (SDF, PSDF, ..) by means of annotations. *HoCL* is entirely written in *OCaml*. Documentation (including a [tutorial], the underlying formal [semantics] and a general introduction on the [principles] of functional graph description) can be found [here]. [HoCL] https://github.com/jserot/hocl [tutorial] https://github.com/jserot/hocl/tree/master/doc/tutorial.pdf [semantics] https://github.com/jserot/hocl/tree/master/doc/semantics.pdf [principles] https://github.com/jserot/hocl/tree/master/doc/fgd.pdf [here] https://github.com/jserot/hocl/tree/master/doc Other OCaml News ════════════════ From the ocamlcore planet blog ────────────────────────────── Here are links from many OCaml blogs aggregated at [OCaml Planet]. • [Every proof assistant: Beluga] • [TLS 1.3 support for MirageOS] • [A Solidity parser in OCaml with Menhir] [OCaml Planet] http://ocaml.org/community/planet/ [Every proof assistant: Beluga] http://math.andrej.com/2020/05/25/mechanizing-meta-theory-in-beluga/ [TLS 1.3 support for MirageOS] https://mirage.io/blog/tls-1-3-mirageos [A Solidity parser in OCaml with Menhir] http://www.ocamlpro.com/2020/05/19/ocaml-solidity-parser-with-menhir/ Old CWN ═══════ If you happen to miss a CWN, you can [send me a message] and I'll mail it to you, or go take a look at [the archive] or the [RSS feed of the archives]. If you also wish to receive it every week by mail, you may subscribe [online]. [Alan Schmitt] [send me a message] mailto:alan.schm...@polytechnique.org [the archive] http://alan.petitepomme.net/cwn/ [RSS feed of the archives] http://alan.petitepomme.net/cwn/cwn.rss [online] http://lists.idyll.org/listinfo/caml-news-weekly/ [Alan Schmitt] http://alan.petitepomme.net/
_______________________________________________ caml-news-weekly mailing list caml-news-weekly@lists.idyll.org http://lists.idyll.org/listinfo/caml-news-weekly