Five days ago, on April 11, I started work on *leoTokens.rs*, a prototype 
transliteration of *leoTokens.py*, Leo's token-based beautifier. This work 
was my first significant Rust project.


This Engineering notebook post discusses my experiences. I'll also discuss 
an idea for improving leoTokens.py.


*RustPython-Parser*


The prototype uses lexer.rs 
<https://github.com/RustPython/Parser/blob/main/parser/src/lexer.rs>, a 
*Python* tokenizer written in Rust. This file is part of the 
RustPython-Parser <https://github.com/RustPython/Parser> project. There are 
a few problems with lexer.rs, but they did not interfere with the prototype.


*Performance*


Last night, this prototype reached a milestone by realistically modeling 
the expected performance:


 file name: c:/Repos/leo-editor/leo/core/leoTokens.py
      read: 1.08ms
  tokenize: 12.23ms
    tokens: 10159


Leo's beautifier takes roughly 100ms to do the same, so the Rust prototype 
is about 8x faster (100ms / 12.23ms is roughly 8.2). A production version 
might only be 5x faster.
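
A minimal sketch of how timings like these could be collected with the standard library's Instant. The helper names (timed, tokenize, speedup) are hypothetical, and the tokenizer is a whitespace-splitting stand-in, not lexer.rs:

```rust
use std::time::{Duration, Instant};

// Run a closure once and return its result with the elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}

// Stand-in tokenizer (an assumption): it only splits on whitespace.
fn tokenize(contents: &str) -> Vec<String> {
    contents.split_whitespace().map(|s| s.to_string()).collect()
}

// Ratio of the baseline time to the new time.
fn speedup(baseline_ms: f64, new_ms: f64) -> f64 {
    baseline_ms / new_ms
}

fn main() {
    // Synthetic input so the sketch does not depend on a file on disk.
    let contents = "def f(a, b):\n    return a + b\n".repeat(1000);
    let (tokens, elapsed) = timed(|| tokenize(&contents));
    println!("  tokenize: {:.2}ms", elapsed.as_secs_f64() * 1000.0);
    println!("    tokens: {}", tokens.len());
    // The figures from this post: 100ms for Leo's beautifier, 12.23ms here.
    println!("   speedup: {:.1}x", speedup(100.0, 12.23));
}
```

A single run like this is noisy. Averaging several runs would give steadier numbers.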


*Learning Rust*


The good news and bad news about Rust are the same: Rust is a very picky 
language :-) Rust programs must specify much more than Python requires. 
Otoh, the Rust compiler usually offers superb hints for correcting errors.


I enjoyed being a newbie Rustacean. There were so many newbie-level puzzles 
to solve. Otoh, I nearly became crazed by the effort!


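Last night, I *finally* realized that *aList.clone()* keeps the 
borrow-checker happy when a loop over a list calls a method on self 
(presumably one that needs mutable access). The loop iterates over the 
copy, so no borrow of self.input_list is alive inside the loop body. The 
cost is one copy of the list. For example:


 // Iterate over a clone so the loop holds no borrow of self.
 for input_token in &self.input_list.clone() {
   self.make_output_token(input_token);
 }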


There were many other Ahas, but this one was a milestone.
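
Here is a self-contained sketch of the pattern. It assumes make_output_token takes &mut self. The struct layouts and the beautify_with_* method names are hypothetical; only input_list and make_output_token come from the snippet above. The second method shows one alternative, std::mem::take, which avoids the copy:

```rust
// Hypothetical stand-ins for the prototype's types.
#[derive(Clone, Debug)]
struct InputToken {
    kind: String,
    value: String,
}

struct Beautifier {
    input_list: Vec<InputToken>,
    output_list: Vec<String>,
}

impl Beautifier {
    // Needs `&mut self` because it appends to `output_list`.
    fn make_output_token(&mut self, token: &InputToken) {
        self.output_list.push(format!("{}:{}", token.kind, token.value));
    }

    // The clone is a temporary owned by the loop, so `self` is free to be
    // borrowed mutably inside the loop body.
    fn beautify_with_clone(&mut self) {
        for input_token in &self.input_list.clone() {
            self.make_output_token(input_token);
        }
    }

    // Alternative without copying: move the list out of `self`,
    // iterate over it, then put it back.
    fn beautify_with_take(&mut self) {
        let input_list = std::mem::take(&mut self.input_list);
        for input_token in &input_list {
            self.make_output_token(input_token);
        }
        self.input_list = input_list;
    }
}

fn main() {
    let mut b = Beautifier {
        input_list: vec![
            InputToken { kind: "name".to_string(), value: "x".to_string() },
            InputToken { kind: "op".to_string(), value: "=".to_string() },
        ],
        output_list: Vec::new(),
    };
    b.beautify_with_clone();
    b.beautify_with_take();
    println!("{:?}", b.output_list);
}
```

Iterating directly over &self.input_list while calling a &mut self method is what the borrow checker rejects. Both variants avoid holding that borrow across the call.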


*Improving Leo's beautifier*


Yesterday's work created a list of *output tokens* from the corresponding 
list of *input tokens*. Leo's beautifier does the same. But Aha! A simpler 
architecture *might* work:


- Don't generate whitespace input tokens!

  Skipping whitespace would simplify the token-based parse.

- *Lazily* generate the whitespace between tokens.

  The output list could be a list of simple strings.
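
A rough sketch of what this might look like, under heavy assumptions: the token kinds, the needs_space rule, and the beautify function are all hypothetical, and a real beautifier would need many more kinds and rules. The input list contains no whitespace tokens. The space between two tokens is decided only when the second token is emitted, and the output is a plain list of strings:

```rust
// Hypothetical token kinds; note there is no whitespace kind.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Name,
    Op,
    OpenParen,
    CloseParen,
    Comma,
}

#[derive(Clone, Debug)]
struct Token {
    kind: Kind,
    value: String,
}

fn tok(kind: Kind, value: &str) -> Token {
    Token { kind, value: value.to_string() }
}

// Decide, from the two adjacent tokens alone, whether a space goes between them.
fn needs_space(prev: &Token, next: &Token) -> bool {
    match (prev.kind, next.kind) {
        (Kind::OpenParen, _) => false,          // no space after "("
        (_, Kind::CloseParen) => false,         // no space before ")"
        (_, Kind::Comma) => false,              // no space before ","
        (Kind::Name, Kind::OpenParen) => false, // call: f(
        _ => true,
    }
}

// Build the output as simple strings, generating whitespace lazily.
fn beautify(tokens: &[Token]) -> Vec<String> {
    let mut output: Vec<String> = Vec::new();
    for (i, token) in tokens.iter().enumerate() {
        if i > 0 && needs_space(&tokens[i - 1], token) {
            output.push(" ".to_string());
        }
        output.push(token.value.clone());
    }
    output
}

fn main() {
    let tokens = vec![
        tok(Kind::Name, "x"),
        tok(Kind::Op, "="),
        tok(Kind::Name, "f"),
        tok(Kind::OpenParen, "("),
        tok(Kind::Name, "a"),
        tok(Kind::Comma, ","),
        tok(Kind::Name, "b"),
        tok(Kind::CloseParen, ")"),
    ];
    println!("{}", beautify(&tokens).concat());
}
```

This sketch ignores newlines, indentation, comments, and strings, which are the hard parts of a real beautifier.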


*Summary*


Learning Rust has been an all-consuming experience.


The prototype is now in the "devel" branch. Look for the node "prototype: 
leoTokens.rs" in LeoPyRef.leo. It's in the attic.


The Rust code is surprisingly simple. It needs neither lifetime annotations 
nor generic types. 


The final Rust beautifier might be 5x to 8x faster than Leo's beautifier. 
I'm not sure the speedup is worth the maintenance burden on Leo's (future) 
devs. In any case, the work has been worthwhile.


The perspective gained suggests a significant improvement to Leo's Python 
beautifier. I'll be exploring that possibility next before continuing work 
on leoTokens.py. Stay tuned.


Edward
