And I have some uneducated guesses too, in https://mail.haskell.org/pipermail/haskell-cafe/2020-July/132559.html
> And per my understanding, GHC's GC doesn't seek free segments within a
> heap; it instead copies all live objects to a new heap and then swaps the
> new heap in as the live one. So I assume memory (address space)
> fragmentation doesn't trouble a GHC process as much as it does other
> runtimes.
>
> I suspect the difficulty resides in the detection of circular/cyclic
> structures among the live data within the old heap, especially cycles
> formed through arbitrarily many levels of pointer indirection. If the GC
> has to perform some dict lookup to decide whether an object has already
> been copied to the new heap, that's O(n*log(n)) overall for a tree-based
> dict, where n is the number of live objects in the heap.
>
> To efficiently copy circular structures, one optimization I can imagine is
> to give every heap object a `new ptr` field. Then, while copying some
> object that points to a second object, the second object's `new ptr` can
> be read out: if it is not nil, the corresponding pointer field on the copy
> in the new heap is simply assigned that value and we are done; otherwise
> the second object is copied to the new heap first and its `new ptr` in the
> old heap is set to point at that copy. But I don't know the details of
> GHC's GC, so I can't judge whether this technique is even feasible there.
> And even the new nonmoving GC may have a similar difficulty in escaping a
> cycle while following pointers.
>
> FYI
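For what it's worth, the `new ptr` idea in the guess above is essentially
the classic forwarding pointer of copying collectors; as far as I know,
GHC's evacuating GC does exactly this, overwriting a copied closure's
header with a forwarding pointer, so no dictionary lookup is needed and
cycles terminate naturally. A toy sketch of the technique, with heap
objects modelled as mutable records (all names here are illustrative, not
GHC internals):

    {-# LANGUAGE LambdaCase #-}
    import Data.IORef

    -- A heap object in a toy mutable graph: a payload, outgoing
    -- pointers, and a forwarding slot used only during copying.
    data Node = Node
      { payload :: Int
      , edges   :: IORef [Node]
      , forward :: IORef (Maybe Node)  -- Nothing = not yet evacuated
      }

    newNode :: Int -> IO Node
    newNode x = Node x <$> newIORef [] <*> newIORef Nothing

    -- Copy a possibly-cyclic graph with O(1) extra work per object:
    -- the forwarding slot on the *old* object remembers its copy, so a
    -- cycle is cut the second time we reach an object, no dict needed.
    evacuate :: Node -> IO Node
    evacuate old = readIORef (forward old) >>= \case
      Just new -> pure new  -- already copied: follow the forwarding ptr
      Nothing  -> do
        new <- newNode (payload old)
        writeIORef (forward old) (Just new)  -- install BEFORE edges
        readIORef (edges old) >>= mapM evacuate >>= writeIORef (edges new)
        pure new

    main :: IO ()
    main = do
      a <- newNode 1
      b <- newNode 2
      writeIORef (edges a) [b]
      writeIORef (edges b) [a]  -- a cycle: a -> b -> a
      a' <- evacuate a
      [b']  <- readIORef (edges a')
      [a''] <- readIORef (edges b')
      print (payload a', payload b', payload a'')  -- (1,2,1): cycle kept

A real collector replaces the unbounded recursion with a Cheney-style scan
of to-space, but the cycle-termination argument is identical: the
forwarding slot is installed before any outgoing edges are followed.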
> On 2020-07-30, at 21:53, YueCompl via ghc-devs <ghc-devs@haskell.org> wrote:
>
> Dear GHC Devs,
>
> I realize I should seek your help regarding my current situation.
>
> TL;DR the most serious suffering is: after the heap is loaded with many
> TVars forming cyclic structures, GC comes to dominate my CPU utilization
> while little business progress is made.
>
> Nonmoving GC (`-xn`) with 8.10.1 helps, but its ceiling is not high enough
> for my case: noticeable performance degradation starts at about ~350MB
> RSS, and the process becomes unusable as RSS approaches 1GB, whereas I
> expect a GHC-compiled process to serve 20~30GB of in-memory data
> practically well.
>
> https://mail.haskell.org/pipermail/haskell-cafe/2020-July/132556.html
> contains the most interesting parts of the conversation.
>
> I found
> https://tech.channable.com/posts/2020-04-07-lessons-in-managing-haskell-memory.html
> highly relevant: they managed to keep 100GB per instance. But switching to
> immutable data structures residing within compact regions is not
> immediately feasible for my case, as the schema would have to be
> redesigned, though I have my colleagues evaluating that option already.
>
> Best regards,
> Compl
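To make the compact-regions option mentioned above concrete, here is a
minimal sketch, assuming GHC >= 8.2 with the ghc-compact package; `Row` is
a hypothetical stand-in for a redesigned, TVar-free schema, not the actual
data model under discussion:

    import GHC.Compact (compact, getCompact)
    import qualified Data.Map.Strict as M

    -- Hypothetical immutable record; compact regions reject mutable
    -- objects (TVars, MVars, IORefs) and functions, which is why the
    -- schema must be redesigned before this option becomes available.
    data Row = Row { rowId :: !Int, rowVal :: !Double }

    main :: IO ()
    main = do
      let table = M.fromList [ (i, Row i (fromIntegral i * 1.5))
                             | i <- [1 .. 100000 :: Int] ]
      -- Copy the fully evaluated structure into a compact region; the
      -- GC thereafter treats the region as one object and never traces
      -- its internal pointers again.
      region <- compact table
      let t = getCompact region
      print (M.size t, rowVal <$> M.lookup 42 t)

Reads out of the region are ordinary pure reads; the price is that any
update means building a new structure, which is exactly the schema
redesign trade-off described in the message above.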
>> Begin forwarded message:
>>
>> From: YueCompl via Haskell-Cafe <haskell-c...@haskell.org>
>> Subject: [Haskell-cafe] For STM to practically serve large in-mem
>> datasets with cyclic structures WAS: STM friendly TreeMap (or similar
>> with range scan api)? WAS: Best ways to achieve throughput, for large
>> M:N ratio of STM threads, with hot TVar updates?
>> Date: 2020-07-30 at 21:28:31 GMT+8
>> To: Haskell Cafe <haskell-c...@haskell.org>
>> Reply-To: YueCompl <compl....@icloud.com>
>>
>> For the record: the overhead of STM over IO-based concurrency control
>> (or other means where transactions must be composed manually) is a price
>> I'm willing to pay in my use case, as it is not machine-performance
>> critical to distribute input data and parameters to a cluster of worker
>> nodes and to collect their results into permanent storage or a data
>> pipeline. But professionally crafting well-synced, race-free scheduling
>> code is barely affordable for my org: the shapes of our datasets, the
>> relationships between them, and the algorithms processing them all
>> change at a fast pace. We have difficulty, or lack the willingness, to
>> hire staff specifically to keep each new data pipeline race-free;
>> race-free it must be, but preferably at the cost of machine-hours rather
>> than human head count.
>>
>> Meanwhile, easily composable STM code, wrapped in scriptable procedures,
>> will let our analysts author the scheduling scripts without too many
>> concerns, so our programmers can focus on the performance-critical parts
>> of the data processing code, such as optimizing tight loops.
>>
>> Outside of tight loops, we find it acceptable for an STM solution to be
>> up to 2~3 orders of magnitude slower than its best rivals, as long as it
>> scales. For a (maybe cheating) example: if fully optimized code returns
>> a result 10ms after an analyst clicks a button, we don't mind an
>> unoptimized STM script needing 10 seconds, so long as the result is
>> correct.
>>
>> As a philosophical aside: I've heard that AT&T had UNIX designed
>> specifically for their control plane, while the data plane ran separate
>> software (on separate hardware, obviously). Modern systems have powerful
>> CPUs tempting us to squeeze ever more performance out of them, and SIMD
>> instructions make the temptation even stronger. I think we'd better
>> resist it when programming anything that belongs to the control plane
>> per se, and indulge it only when programming the data plane. And
>> data-plane programs appear to be shifting to GPUs nowadays, which feels
>> right.
>>
>> Regards,
>> Compl
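To illustrate the composability that makes this trade-off attractive: a
minimal sketch of two independently written STM actions composed into one
atomic, race-free transaction (a toy account-transfer example; all names
are illustrative):

    import Control.Concurrent.STM

    -- Two small STM actions, written independently of each other.
    withdraw :: TVar Int -> Int -> STM ()
    withdraw acct n = do
      bal <- readTVar acct
      check (bal >= n)  -- blocks (retries) until funds suffice
      writeTVar acct (bal - n)

    deposit :: TVar Int -> Int -> STM ()
    deposit acct n = modifyTVar' acct (+ n)

    -- Composing them yields one atomic transaction: no other thread can
    -- ever observe the withdrawal without the deposit, and no manual
    -- locking is written anywhere.
    transfer :: TVar Int -> TVar Int -> Int -> STM ()
    transfer from to n = withdraw from n >> deposit to n

    main :: IO ()
    main = do
      a <- newTVarIO 100
      b <- newTVarIO 0
      atomically (transfer a b 40)
      (,) <$> readTVarIO a <*> readTVarIO b >>= print  -- (60,40)

This is the property that lets analysts script schedules safely while only
the tight loops need expert attention.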
>>> On 2020-07-30, at 20:10, YueCompl via Haskell-Cafe
>>> <haskell-c...@haskell.org> wrote:
>>>
>>> Hi Peter,
>>>
>>> Great to hear from you!
>>>
>>> For the record, tskiplist (together with stm-containers) did improve my
>>> situation a great deal with respect to scalability under
>>> concurrency/parallelism!
>>>
>>> I'm still far from the stage of squeezing out the last drops of
>>> performance; currently I'm just making sure performance concerns stay
>>> reasonable while I PoC the correctness and ergonomics of my HPC
>>> architecture (an in-memory graph + out-of-core (mmap) array DBMS
>>> powering a computation cluster, with shared storage). Now that
>>> parallelism appears acceptable, I seem to suffer a serious GC issue
>>> when scaling up the process's working memory size. I suspect it's
>>> caused by the greater number of TVars and/or the aggressively cyclic
>>> structures among them in my case, and so far I cannot find a way to
>>> overcome it.
>>>
>>> Thanks for your detailed information!
>>>
>>> Best regards,
>>> Compl
>>>
>>>> On 2020-07-30, at 19:19, Peter Robinson <p...@lowerbound.io> wrote:
>>>>
>>>> Hi Compl,
>>>>
>>>>> + This package provides a proof-of-concept implementation of a skip
>>>>> + list in STM
>>>>>
>>>>> This has to mean something but I can't figure it out yet.
>>>>>
>>>>> Dear Peter Robinson, I hope you can see this message and get into
>>>>> the loop of the discussion.
>>>>
>>>> The reason for adding this sentence was that tskiplist hasn't been
>>>> optimized for production use. Later on, I wrote an implementation of a
>>>> concurrent skip list with atomic operations that performs
>>>> significantly better, but its operations work in the IO monad.
>>>>
>>>> I'm surprised to hear that you're getting poor performance even when
>>>> using the stm-containers package, which I believe was meant to be used
>>>> in production. A while ago, I ran some benchmarks comparing concurrent
>>>> dictionary data structures (such as stm-containers) under various
>>>> workloads. While STMContainers.Map wasn't as fast as the
>>>> concurrent-hashtable package, the results indicate that its
>>>> performance doesn't degrade too much under larger workloads.
>>>>
>>>> You can find the benchmark results here (10^6 randomly generated
>>>> insertion/deletion/lookup requests distributed among 32 threads):
>>>> https://lowerbound.io/blog/bench2-32.html
>>>> And some explanations of the benchmarks are here:
>>>> https://lowerbound.io/blog/2019-10-24_concurrent_hash_table_performance.html
>>>>
>>>> One issue that I came across when implementing the tskiplist package
>>>> was this: if a thread wants to insert some item into the skip list, it
>>>> needs to search for the entry point by performing readTVar operations
>>>> starting at the list head. So, on average, a thread will read O(log n)
>>>> TVars (assuming a skip list of n items) and, if any of these O(log n)
>>>> TVars is modified by a simultaneously running thread, the STM runtime
>>>> will observe a (false) conflict and rerun the transaction. It's not
>>>> clear to me how to resolve this issue without access to something like
>>>> unreadTVar (see [1]).
>>>>
>>>> Best,
>>>> Peter
>>>>
>>>> [1] UnreadTVar: Extending Haskell Software Transactional Memory for
>>>> Performance (2007) by Nehir Sonmez, Cristian Perfumo, Srdjan Stipic,
>>>> Adrian Cristal, Osman S. Unsal, Mateo Valero.
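To illustrate the false-conflict issue Peter describes: a sorted linked
list of TVars stands in here for one search path of a skip list (a
deliberately simplified toy, not the tskiplist implementation):

    import Control.Concurrent.STM

    -- A sorted singly linked list living in TVars.
    data List = Nil | Cons Int (TVar List)

    -- Every link on the search path is read with readTVar and therefore
    -- lands in the transaction's read set: if ANY of those TVars is
    -- written by a concurrent thread before this transaction commits,
    -- the STM runtime reruns the whole transaction, even when the write
    -- could not have changed the answer.
    member :: Int -> TVar List -> STM Bool
    member x = go
      where
        go var = readTVar var >>= \node -> case node of
          Nil -> pure False
          Cons y next
            | y == x    -> pure True
            | y <  x    -> go next
            | otherwise -> pure False  -- passed the sorted position

    main :: IO ()
    main = do
      end <- newTVarIO Nil
      n2  <- newTVarIO (Cons 7 end)
      hd  <- newTVarIO (Cons 3 n2)
      atomically (member 7 hd) >>= print  -- True

With n items the path is O(log n) TVars in a skip list (O(n) in this toy
list), so hot concurrent writers make spurious reruns increasingly likely;
unreadTVar [1] was proposed precisely to let a traversal drop
already-passed links from its read set.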
_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs