Recent improvements to @clean allow Leo to update outlines containing thousands of @clean nodes. For the first time, it is feasible to use Leo to work on huge repos such as Rust's compiler <https://github.com/rust-lang/rust/tree/master/compiler>.
Alas, Leo's performance degrades substantially when using huge outlines. Python's GC (Garbage Collector) probably gets overly stressed by all the temporary data Leo generates.

This Engineering Notebook post explores a possible solution. As always, please feel free to ignore it. However, this ENB presents an exciting new direction for Leo.

*@leo nodes would create a hierarchy of Leo outlines*

The idea is to let *@leo nodes* in a *top-level outline* coordinate operations in *linked sub-outlines*.

For example, *rust_compiler.leo* (in the rust/compiler directory) would have the following @leo nodes:

@leo rustc/rustc.leo
@leo rustc_abi/rustc_abi.leo
@leo rustc_arena/rustc_arena.leo

And dozens of others. So the top-level outline will be tiny and the *sub-outlines* will be much smaller.

As discussed below, the performance might not improve enough. But let's discuss some exciting ideas first.

*Cross-file searches and (maybe??) cross-file clones*

Straightforward extensions to Leo's file commands will allow Leonistas to search all subsidiary outlines from the top-level outline! Cross-tab (or inter-process) communication will transfer results from the sub-outlines to the top-level outline. All details are unclear for now.

The details of cross-file cff commands are more complex. Initially, the sub-outlines could communicate the *cross-file unls* back to the top-level outline. The cff becomes a set of unls. Recall that *Leo already supports cross-file unls.*

Later, we might consider true cross-file clones. Changing such a clone in the top-level outline would change the corresponding clone in the sub-outline. And vice versa! But this is not the time to consider how to do this magic. For now, the conclusion is that cross-file clones *might* make sense, contrary to my decades-old opinion!

*Helping the GC?*

Now let's turn our attention back to performance issues. First, let's suppose Leo handles @leo nodes by loading sub-outlines in separate tabs. Does this help the GC?
The answer is "yes and no" :-) Yes, each tab contains less data, so operations on the tree and body *might* become more efficient. But no, the GC has the same amount (and a bit more) to handle. My *guess* is that putting smaller outlines into separate tabs will have a small (negligible?) effect on performance.

*The first prototype*

Happily, it will be easy to prototype this initial idea. I'll write a script that:

- Creates @leo nodes for all sub-directories of the rust/compiler directory.
- Creates the corresponding .leo file in each subdirectory.
- Loads (details unclear) each created (subsidiary) .leo file with the desired @clean nodes.
- Creates a list (suitable for the command-line) of files to be loaded.

So a command line like:

leo rust_compiler.leo <list of sub-outlines>

will load all the desired outlines, placing each sub-outline in its own tab. It will then be easy to see how much this scheme improves Leo's performance.

*Separate processes instead of separate tabs*

Separate tabs might not help enough. In that case, the @leo nodes could load sub-outlines in separate *processes* instead of separate tabs. This approach will almost surely solve the performance problems. Operating systems are very, very good at running separate processes! Each process will run a separate copy of Python with its own GC.

The same general ideas still apply, but now the top-level outline and all the sub-outlines must communicate via Leo's servers. There will probably be one server per process. Leo's server architecture will almost surely need to be extended. Surely such a scheme is feasible, but I have no intuition about the details.

Happily, *we can ignore inter-process complications for now.* I'll do all my initial experiments using sub-outlines in separate Leo tabs. It should be straightforward to extend Leo's find command using inter-*tab* communication. Who knows, maybe cross-file clones *do* make sense!
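
The bookkeeping half of the first prototype (naming the sub-outlines and building the command line) might look roughly like the sketch below. It is only an illustration under stated assumptions: it uses nothing but Python's standard library, it does not call Leo's API, and it neither writes .leo files nor populates them with @clean nodes, since those details are still unclear. The function names (`sub_outline_paths`, `leo_headlines`, `command_line`) are hypothetical.

```python
from pathlib import Path


def sub_outline_paths(compiler_dir: Path) -> list[str]:
    """Return '<sub>/<sub>.leo' for each immediate sub-directory, sorted by name.

    Assumption: each sub-outline is named after its directory, as in the
    rustc/rustc.leo example above.
    """
    subdirs = sorted(p for p in compiler_dir.iterdir() if p.is_dir())
    return [f"{p.name}/{p.name}.leo" for p in subdirs]


def leo_headlines(paths: list[str]) -> list[str]:
    """Return the @leo headlines the top-level outline would contain."""
    return [f"@leo {path}" for path in paths]


def command_line(top_level: str, paths: list[str]) -> list[str]:
    """Return the argument list: leo <top-level outline> <sub-outlines>."""
    return ["leo", top_level, *paths]
```

Run from the rust/compiler directory, `command_line("rust_compiler.leo", sub_outline_paths(Path(".")))` would produce the argument list for the `leo rust_compiler.leo <list of sub-outlines>` invocation described above. Creating the @leo nodes and the sub-outline files themselves would still have to be done with Leo's own scripting facilities.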

*Summary*

*@leo nodes* will create a hierarchical relationship between a (single) *top-level outline* and several *sub-outlines*. For now, we can assume that @leo nodes appear only in the top-level outline. We'll reexamine this question later.

Extensions to Leo's file commands (including the clone-find commands) will allow sub-outlines to send results back to the top-level outline. Initially, results will be unls.

Eventually, these unls might morph into cross-file clones. Changing a cross-file clone in the top-level outline would change the corresponding clone in the sub-outline and *vice versa.*

Communication between tabs is straightforward, but putting sub-outlines in separate Leo tabs is unlikely to improve Leo's performance enough.

Ultimately, Leo could run each sub-outline in a separate process. This scheme would require substantial updates to Leo's server.

For now, I'll extend Leo's find commands using inter-*tab* communication. Maybe cross-file clones *do* make sense!

I welcome all your comments, questions, and suggestions. I am excited by this project, and I hope you are too.

Edward
