indygreg added a comment.

  First of all - wow! Thanks for writing all this code. There's definitely a 
lot to work with. And work with it we will!
  
  This is definitely too big to review as one commit. If you could do *any* 
work to split it up, it would be greatly appreciated. I'd focus on the pure 
Rust pieces first. Everything needed to open revlogs would be great!
  
  You may find our custom Phabricator extensions (linked from 
https://www.mercurial-scm.org/wiki/Phabricator) useful for submitting series of 
commits to Phabricator.
  
  Regarding the performance, that's pretty good! The dirstate code is some of 
the most optimized code in Mercurial. There are some gnarly Python C hacks to 
make it fast. Some of those tricks involve using special system calls to walk 
directories to minimize the number of system calls. I'm not sure if the crate 
you imported has those optimizations. (I wouldn't be surprised either way.) I 
wouldn't worry too much about performance at this juncture. But I suspect we 
could make the Rust code another 50% faster with some tweaking. It would also 
be interesting to test on a larger repo, say 
https://hg.mozilla.org/mozilla-unified. Also, I believe there are hooks in the 
dirstate code to use Watchman (fsmonitor). Those hooks are critical in order to 
achieve peak performance on large repositories.
  
  Since you seem to be proficient at writing lots of Rust code, if you are 
looking for another project, may I suggest porting `chg` to Rust? That code is 
in `contrib/chg`. That might be the easiest component to actually ship in Rust 
since it is a standalone binary that doesn't link against Python. But we 
shouldn't get ahead of ourselves :)
  
  Anyway, it is late for me and I need to detach from my computer. I'm sure 
others will have things to say as well...

INLINE COMMENTS

> build.rs:1
> +// build.rs -- Configure build environment for `hgcli` Rust package.
> +//

I see this file was copied. There's nothing wrong with that. But does this mean 
we will need a custom build.rs for each Rust package doing Python? If that's 
the case, then I would prefer to isolate all our rust-cpython code to a single 
package, if possible. I'm guessing that could be challenging due to crossing 
create boundaries. I'm sure there are placed where we don't want to expose 
symbols outside the crate.

I'm curious how others feel about this.

> main.rs:233-261
> +    let matches = clap::App::new("hg rust oxidation")
> +        .arg(
> +            clap::Arg::with_name("repository")
> +                .short("c")
> +                .long("repository")
> +                .value_name("dash_r"),
> +        )

This is definitely nifty and an impressive achievement \o/

The `r-` commands for testing pure Rust code paths are an interesting idea!

I think I'm OK with including support for this in `hgcli`. But I think the code 
should live in a separate file so it doesn't pollute `main()`. And it should be 
behind a Cargo feature flag so we maintain compatibility with `hg` as much as 
possible by default.

Also, Mercurial's command line parser is extremely wonky and has some 
questionable behavior. If the intent is to make `rhg` compatible with `hg`, we 
would need to preserve this horrible behavior. We'll likely have to write a 
custom argument parser because of how quirky Mercurial's argument parser is :(

> config.rs:95
> +        } else {
> +            RevlogFormat::V0
> +        }

I would not worry about supporting v0 or v2 at this time. v0 is only important 
for backwards compatibility with ancient repos. And v2 never got off the ground.

> revlog_v1.rs:279
> +        let mut fhandle = BufReader::new(match self.dflag {
> +            Inline => fs::File::open(self.index_file.as_path()).unwrap(),
> +            Separated(ref dfile) => fs::File::open(dfile).unwrap(),

IIRC, core Mercurial keeps an open file handle on revlogs and ensures we don't 
run out of file handles by not keeping too many revlogs open at the same time. 
For scanning operations, not having to open and close the file handles all the 
time will make a difference for performance. Also, core Mercurial loads the 
entirety of the `.i` file into memory. That's a scaling problem for large 
revlogs. But it does make performance of index lookups really fast.

> revlog_v1.rs:290-293
> +        while let Some(ref chld_r) = it.next() {
> +            if let Some(bin) = self.get_content(&mut fhandle, chld_r) {
> +                bins.push(bin);
> +            } else {

A thread pool to help with zlib decompression should go a long way here.

Probably too early to think about this, but we'll likely eventually want a 
global thread pool for doing I/O and CPU expensive tasks, such as reading 
chunks from a revlog and decompressing them.

FWIW, we're going to radically alter the storage format in order to better 
support shallow clones. But that work hasn't started yet. I still think there 
is a benefit to implementing the revlog code in Rust though.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2057

To: Ivzhh, #hg-reviewers
Cc: krbullock, indygreg, durin42, kevincox, mercurial-devel
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Reply via email to