More on Rust language

bearophile Thu, 03 Nov 2011 20:16:15 -0700

Through Reddit I've found two introductions to Rust, the systems language being 
developed by Mozilla. This is one of them:


http://marijnhaverbeke.nl/rust_tutorial/

This is an alpha-state tutorial, so some parts are unfinished and some parts 
will probably change, in the language too.

Unfortunately this first tutorial doesn't discuss typestates and syntax macros 
(yet), two of the most significant features of Rust. The second tutorial 
discusses typestates a bit too.

Currently the Rust compiler is written in Rust and it's based on the LLVM 
back-end. This allows it to eat its own dog food (there are few descriptions of 
typestate usage in the compiler itself) and the backend is efficient enough. 
Compared to DMD the Rust compiler is in an earlier stage of development: it 
works and it's able to compile itself, but I think it's not usable yet for 
practical purposes.

On the GitHub page the Rust project has 547 "Watch" and 52 "Fork", while DMD 
has 159 and 49 of them, even though Rust is a much younger compiler than 
D/DMD. So it seems enough people are interested in Rust.

Most of the text below is quotations from the tutorials.

---------------------------

http://marijnhaverbeke.nl/rust_tutorial/control.html

Pattern matching

Rust's alt construct is a generalized, cleaned-up version of C's switch 
construct. You provide it with a value and a number of arms, each labelled with 
a pattern, and it will execute the arm that matches the value.

alt my_number {
  0       { std::io::println("zero"); }
  1 | 2   { std::io::println("one or two"); }
  3 to 10 { std::io::println("three to ten"); }
  _       { std::io::println("something else"); }
}

There is no 'falling through' between arms, as in C: only one arm is executed, 
and it doesn't have to explicitly break out of the construct when it is 
finished.

The part to the left of each arm is called the pattern. Literals are valid 
patterns, and will match only their own value. The pipe operator (|) can be 
used to assign multiple patterns to a single arm. Ranges of numeric literal 
patterns can be expressed with to. The underscore (_) is a wildcard pattern 
that matches everything.

If the arm with the wildcard pattern was left off in the above example, running 
it on a number greater than ten (or negative) would cause a run-time failure. 
When no arm matches, alt constructs do not silently fall through; they blow up 
instead.
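
For comparison, here is a minimal sketch of the same construct in present-day 
Rust syntax, which differs from the 2011 tutorial: alt is spelled match, arms 
use =>, and an inclusive range is written ..= instead of to. The function name 
describe is made up for this example. One behavioural difference worth noting: 
current compilers reject a non-exhaustive match at compile time instead of 
failing at run time, so the wildcard arm cannot simply be left off.

```rust
// Hypothetical helper: classifies a number the way the alt example does.
fn describe(my_number: i32) -> &'static str {
    match my_number {
        0 => "zero",
        1 | 2 => "one or two",
        3..=10 => "three to ten",
        // Without this arm the compiler reports a non-exhaustive match.
        _ => "something else",
    }
}

fn main() {
    for n in [0, 2, 7, -5] {
        println!("{} -> {}", n, describe(n));
    }
}
```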

A powerful application of pattern matching is destructuring, where you use the 
matching to get at the contents of data types. Remember that (float, float) is 
a tuple of two floats:

fn angle(vec: (float, float)) -> float {
    alt vec {
      (0f, y) when y < 0f { 1.5 * std::math::pi }
      (0f, y) { 0.5 * std::math::pi }
      (x, y) { std::math::atan(y / x) }
    }
}

A variable name in a pattern matches everything, and binds that name to the 
value of the matched thing inside of the arm block. Thus, (0f, y) matches any 
tuple whose first element is zero, and binds y to the second element. (x, y) 
matches any tuple, and binds both elements to a variable.

Any alt arm can have a guard clause (written when EXPR), which is an expression 
of type bool that determines, after the pattern is found to match, whether the 
arm is taken or not. The variables bound by the pattern are available in this 
guard expression.
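
A rough present-day rendering of the angle function, as a sketch only: guards 
are written with if instead of when, float is f64, and pi and atan live in 
different places in the standard library. I have moved the comparisons with 
zero into the guards instead of using float literals as patterns; that is my 
choice for this sketch, not something taken from the tutorial.

```rust
use std::f64::consts::PI;

// Modern-syntax sketch of the tutorial's angle function.
// The zero checks are done in guards rather than in literal patterns.
fn angle(vec: (f64, f64)) -> f64 {
    match vec {
        // Pattern binds x and y; the guard then decides whether the arm is taken.
        (x, y) if x == 0.0 && y < 0.0 => 1.5 * PI,
        (x, _) if x == 0.0 => 0.5 * PI,
        // Matches any remaining tuple and binds both elements.
        (x, y) => (y / x).atan(),
    }
}

fn main() {
    println!("{}", angle((0.0, -1.0)));
    println!("{}", angle((0.0, 1.0)));
    println!("{}", angle((1.0, 1.0)));
}
```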


Record patterns

Records can be destructured in alt patterns. The basic syntax is {fieldname: 
pattern, ...}, but the pattern for a field can be omitted as a shorthand for 
simply binding the variable with the same name as the field.

alt mypoint {
    {x: 0f, y: y_name} { /* Provide sub-patterns for fields */ }
    {x, y}             { /* Simply bind the fields */ }
}

The field names of a record do not have to appear in a pattern in the same 
order they appear in the type. When you are not interested in all the fields of 
a record, a record pattern may end with , _ (as in {field1, _}) to indicate 
that you're ignoring all other fields.
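
In present-day Rust the closest thing to these records is a named struct, and 
the "ignore the rest" marker is .. instead of , _. A small sketch, where the 
type Point3 and the function classify are invented for the example (integer 
fields are used only to keep the literal pattern simple):

```rust
// Hypothetical record type, standing in for the tutorial's point record.
struct Point3 {
    x: i32,
    y: i32,
    z: i32,
}

fn classify(p: &Point3) -> String {
    match p {
        // Sub-patterns for some fields; `..` ignores the remaining ones.
        Point3 { x: 0, y: y_name, .. } => format!("on the x = 0 plane, y = {}", y_name),
        // Shorthand binding; field order need not follow the declaration.
        Point3 { z, x, .. } => format!("x = {}, z = {}", x, z),
    }
}

fn main() {
    println!("{}", classify(&Point3 { x: 0, y: 5, z: 1 }));
    println!("{}", classify(&Point3 { x: 2, y: 5, z: 9 }));
}
```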


Tags

Tags [FIXME terminology] are datatypes that have several different 
representations. For example, the type shown earlier:

tag shape {
    circle(point, float);
    rectangle(point, point);
}

A value of this type is either a circle, in which case it contains a point 
record and a float, or a rectangle, in which case it contains two point 
records. The run-time representation of such a value includes an identifier of 
the actual form that it holds, much like the 'tagged union' pattern in C, but 
with better ergonomics.


Tag patterns

For tag types with multiple variants, destructuring is the only way to get at 
their contents. All variant constructors can be used as patterns, as in this 
definition of area:

fn area(sh: shape) -> float {
    alt sh {
        circle(_, size) { std::math::pi * size * size }
        rectangle({x, y}, {x: x2, y: y2}) { (x2 - x) * (y2 - y) }
    }
}
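
What the tutorial calls a tag is spelled enum in present-day Rust. The sketch 
below restates shape and area in that syntax; the Point struct is my assumption 
about what the tutorial's point record looks like, since its definition is not 
quoted here.

```rust
use std::f64::consts::PI;

// Assumed layout of the tutorial's point record.
struct Point {
    x: f64,
    y: f64,
}

// A tag with two variants, in modern enum syntax.
enum Shape {
    Circle(Point, f64),
    Rectangle(Point, Point),
}

fn area(sh: &Shape) -> f64 {
    match sh {
        // Variant constructors double as patterns; `_` skips the centre point.
        Shape::Circle(_, size) => PI * size * size,
        // Nested record patterns pull the coordinates out of both corners.
        Shape::Rectangle(Point { x, y }, Point { x: x2, y: y2 }) => (x2 - x) * (y2 - y),
    }
}

fn main() {
    let c = Shape::Circle(Point { x: 0.0, y: 0.0 }, 1.0);
    let r = Shape::Rectangle(Point { x: 0.0, y: 0.0 }, Point { x: 2.0, y: 3.0 });
    println!("{} {}", area(&c), area(&r));
}
```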

------------------------------

// The type of this vector will be inferred based on its use.
let x = [];

// Explicitly say this is a vector of integers.
let y: [int] = [];

---------------------------

Tuples

Tuples in Rust behave exactly like records, except that their fields do not 
have names (and can thus not be accessed with dot notation). Tuples can have 
any arity except for 0 or 1 (though you may see nil, (), as the empty tuple if 
you like).

let mytup: (int, int, float) = (10, 20, 30.0);
alt mytup {
  (a, b, c) { log a + b + (c as int); }
}

---------------------------

Pointers

Rust supports several types of pointers. The simplest is the unsafe pointer, 
written *TYPE, which is a completely unchecked pointer type only used in unsafe 
code (and thus, in typical Rust code, very rarely). The safe pointer types are 
@TYPE for shared, reference-counted boxes, and ~TYPE, for uniquely-owned 
pointers.

All pointer types can be dereferenced with the * unary operator.
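
The sigils described here do not exist in present-day Rust. As a loose analogy 
only, ~TYPE corresponds roughly to Box<T>, @TYPE to the reference-counted 
Rc<T>, and *TYPE to the raw pointers *const T and *mut T. The sketch below 
dereferences one of each with the * operator; the function name pointer_sum is 
made up.

```rust
use std::rc::Rc;

// Returns the sum of three pointed-to values and the owner count of the Rc.
fn pointer_sum() -> (i32, usize) {
    // Uniquely owned heap value, roughly what ~TYPE described.
    let unique: Box<i32> = Box::new(10);

    // Shared, reference-counted box, roughly what @TYPE described.
    let shared: Rc<i32> = Rc::new(20);
    let alias = Rc::clone(&shared); // second owner of the same box

    // Unchecked raw pointer; dereferencing it needs an unsafe block.
    let value = 30;
    let raw: *const i32 = &value;

    let sum = *unique + *shared + unsafe { *raw };
    (sum, Rc::strong_count(&alias))
}

fn main() {
    let (sum, owners) = pointer_sum();
    println!("sum = {}, owners of shared box = {}", sum, owners);
}
```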

---------------------------

When inserting an implicit copy for something big, the compiler will warn, so 
that you know that the code is not as efficient as it looks.

---------------------------

Argument passing styles

...

Another style is by-move, which will cause the argument to become 
de-initialized on the caller side, and give ownership of it to the called 
function. This is written -.

Finally, the default passing styles (by-value for non-structural types, 
by-reference for structural ones) are written + for by-value and && for 
by(-immutable)-reference. It is sometimes necessary to override the defaults. 
We'll talk more about this when discussing generics.
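
The passing-style sigils (-, +, &&) are specific to the 2011 language. In 
present-day Rust the nearest equivalents are, as far as this sketch goes, 
passing a value by type T (which moves it for non-Copy types) and passing an 
immutable reference &T. The function names below are invented.

```rust
// Takes ownership: the argument is moved in, like the by-move style.
fn consume(v: Vec<i32>) -> usize {
    v.len()
}

// Borrows immutably: the caller keeps ownership.
fn total(v: &[i32]) -> i32 {
    v.iter().sum()
}

fn main() {
    let numbers = vec![1, 2, 3];
    let t = total(&numbers); // numbers is still usable afterwards
    let n = consume(numbers); // numbers is de-initialized on the caller side
    // println!("{:?}", numbers); // would not compile: use after move
    println!("total = {}, length = {}", t, n);
}
```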

==============================================

The second introduction I have found:
https://github.com/graydon/rust/wiki/

---------------------------

https://github.com/graydon/rust/wiki/Unit-testing

Rust has built in support for simple unit testing. Functions can be marked as 
unit tests using the 'test' attribute.

#[test]
fn return_none_if_empty() {
   ... test code ...
}

A test function's signature must have no arguments and no return value. To run 
the tests in a crate, it must be compiled with the '--test' flag: rustc 
myprogram.rs --test -o myprogram-tests. Running the resulting executable will 
run all the tests in the crate. A test is considered successful if its function 
returns; if the task running the test fails, through a call to fail, a failed 
check or assert, or some other means, then the test fails.

When compiling a crate with the '--test' flag '--cfg test' is also implied, so 
that tests can be conditionally compiled.

#[cfg(test)]
mod tests {
  #[test]
  fn return_none_if_empty() {
    ... test code ...
  }
}

Note that attaching the 'test' attribute to a function does not imply the 
'cfg(test)' attribute. Test items must still be explicitly marked for 
conditional compilation (though this could change in the future).

Tests that should not be run can be annotated with the 'ignore' attribute. The 
existence of these tests will be noted in the test runner output, but the test 
will not be run.
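
The same attributes still exist in present-day Rust, so the pieces described 
above can be put together in one small, compilable sketch. The function under 
test, first_or_none, is hypothetical and only there to give the tests something 
to check.

```rust
// Hypothetical function under test.
fn first_or_none(items: &[i32]) -> Option<i32> {
    items.first().copied()
}

// Compiled only when the crate is built with --test.
#[cfg(test)]
mod tests {
    use super::first_or_none;

    #[test]
    fn return_none_if_empty() {
        assert_eq!(first_or_none(&[]), None);
    }

    #[test]
    #[ignore] // reported by the runner, but not run by default
    fn returns_first_element() {
        assert_eq!(first_or_none(&[7, 8]), Some(7));
    }
}

fn main() {
    println!("{:?}", first_or_none(&[7, 8]));
}
```

Built with rustc --test as described above, this produces a test runner 
instead of the normal program.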

A test runner built with the '--test' flag supports a limited set of arguments 
to control which tests are run: the first free argument passed to a test runner 
specifies a filter used to narrow down the set of tests being run; the 
'--ignored' flag tells the test runner to run only tests with the 'ignore' 
attribute.


Parallelism

By default, tests are run in parallel, which can make interpreting failure 
output difficult. In these cases you can set the RUST_THREADS environment 
variable to 1 to make the tests run sequentially.

Examples
Typical test run

> mytests

running 30 tests
running driver::tests::mytest1 ... ok
running driver::tests::mytest2 ... ignored
... snip ...
running driver::tests::mytest30 ... ok

result: ok. 28 passed; 0 failed; 2 ignored

Test run with failures

> mytests

running 30 tests
running driver::tests::mytest1 ... ok
running driver::tests::mytest2 ... ignored
... snip ...
running driver::tests::mytest30 ... FAILED

result: FAILED. 27 passed; 1 failed; 2 ignored

Running ignored tests

> mytests --ignored

running 2 tests
running driver::tests::mytest2 ... failed
running driver::tests::mytest10 ... ok

result: FAILED. 1 passed; 1 failed; 0 ignored

Running a subset of tests

> mytests mytest1

running 11 tests
running driver::tests::mytest1 ... ok
running driver::tests::mytest10 ... ignored
... snip ...
running driver::tests::mytest19 ... ok

result: ok. 11 passed; 0 failed; 1 ignored

---------------------------

https://github.com/graydon/rust/wiki/Error-reporting

Incorrect use of numeric literals.

auto i = 0u;
i += 3; // suggest "3u"

Use of for where for each was meant.

for (v in foo.iter()) // suggest "for each"


This is something I'd like in D too:
http://d.puremagic.com/issues/show_bug.cgi?id=6638

---------------------------

https://github.com/graydon/rust/wiki/Attribute-notes

Crate Linkage Attributes

A crate's version is determined by the link attribute, which is a list meta 
item containing metadata about the crate. This metadata can, in turn, be used 
in providing partial matching parameters to syntax extension loading and crate 
importing directives, denoted by the syntax and use keywords respectively.

All meta items within a link attribute contribute to the versioning of a crate, 
and two meta items, name and vers, have special meaning and must be present in 
all crates compiled as shared libraries.

An example of a typical crate link attribute:

#[link(name = "std",
       vers = "0.1",
       uuid = "122bed0b-c19b-4b82-b0b7-7ae8aead7297",
       url = "http://rust-lang.org/src/std")];

==============================================

Regarding different kinds of pointers in D, I have recently found this:
http://herbsutter.com/2011/10/25/garbage-collection-synopsis-and-c/

From what I understand from this comment by Herb Sutter, I was right when, 
about three years ago, I asked for a second pointer type in D:

>Mark-compact (aka moving) collectors, where live objects are moved together to 
>make allocated memory more compact. Note that doing this involves updating 
>pointer values on the fly. This category includes semispace collectors as 
>well as the more efficient modern ones like the .NET CLR's that don't use up 
>half your memory or address space. C++ cannot support this without at least a 
>new pointer type, because C/C++ pointer values are required to be stable (not 
>change their values), so that you can cast them to an int and back, or write 
>them to a file and back; this is why we created the ^ pointer type for C++/CLI 
>which can safely point into #3-style compacting GC heaps. See section 3.3 of 
>my paper (http://www.gotw.ca/publications/C++CLIRationale.pdf ) A Design 
>Rationale for C++/CLI for more rationale about ^ and gcnew.<

Tell me if I am still wrong. How do you implement a moving GC in D if D has raw 
pointers? D semantics don't allow the GC to automatically modify those 
pointers when it moves the data.
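
To make the question concrete, here is a toy sketch (in present-day Rust, not 
D) of why a moving collector needs a pointer type it is allowed to update. 
Everything in it is invented for the illustration: ToyHeap, Handle and 
raw_address are hypothetical names, and no real GC is claimed to work this way. 
A Handle goes through a table that compaction can fix up, while a saved "raw 
address" (a plain slot index here) silently goes stale when the data moves.

```rust
// Hypothetical opaque reference that the toy collector is allowed to update.
#[derive(Clone, Copy)]
struct Handle(usize);

// Toy heap: `slots` is the storage, `handles` maps each handle to its slot.
struct ToyHeap {
    slots: Vec<Option<i32>>,
    handles: Vec<Option<usize>>,
}

impl ToyHeap {
    fn new() -> Self {
        ToyHeap { slots: Vec::new(), handles: Vec::new() }
    }

    fn alloc(&mut self, value: i32) -> Handle {
        self.slots.push(Some(value));
        self.handles.push(Some(self.slots.len() - 1));
        Handle(self.handles.len() - 1)
    }

    fn free(&mut self, h: Handle) {
        if let Some(slot) = self.handles[h.0].take() {
            self.slots[slot] = None;
        }
    }

    // Stand-in for casting a pointer to an integer: exposes the current slot.
    fn raw_address(&self, h: Handle) -> Option<usize> {
        self.handles[h.0]
    }

    fn get(&self, h: Handle) -> Option<i32> {
        self.handles[h.0].and_then(|slot| self.slots[slot])
    }

    // Slide live objects together, then update every handle to the new slot.
    fn compact(&mut self) {
        let mut new_index: Vec<Option<usize>> = vec![None; self.slots.len()];
        let mut new_slots = Vec::new();
        for (old, slot) in self.slots.iter().enumerate() {
            if slot.is_some() {
                new_index[old] = Some(new_slots.len());
                new_slots.push(*slot);
            }
        }
        for h in self.handles.iter_mut() {
            if let Some(old) = *h {
                *h = new_index[old];
            }
        }
        self.slots = new_slots;
    }
}

fn main() {
    let mut heap = ToyHeap::new();
    let _a = heap.alloc(10);
    let b = heap.alloc(20);
    let c = heap.alloc(30);
    let saved = heap.raw_address(c); // "address" taken before the move
    heap.free(b);
    heap.compact();
    println!("via handle: {:?}", heap.get(c));
    println!("saved: {:?}, current: {:?}", saved, heap.raw_address(c));
}
```

The handle still reaches the value after compaction, but the saved address no 
longer matches the object's location, and the collector had no way to find and 
patch that copy.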

--------------------------

As you see, this post of mine discusses neither typestates nor syntax macros. 
I have not found enough info about them in the Rust docs.

Even if Rust does not become widespread, it will add typestates to the 
cauldron of features known to future language designers (and maybe future 
programmers too), or it will show why typestates are not a good idea. In all 
three cases Rust will be useful.


Some comments regarding D:
- I'd like the better error messages I have discussed in bug 6638.
- Tuple de-structuring syntax would be good to have in D too. There is a patch 
for this. If the ideas in the patch are not developed enough, then I suggest 
presenting the design problems so they can be discussed and solved.
- I'd like a somewhat more flexible switch in D, discussion: 
http://d.puremagic.com/issues/show_bug.cgi?id=596
  This is just an additive change, I think it causes no breaking changes.
- Tag patterns used inside the switch-like "alt": syntax-wise this looks less 
easy to implement in D.
- I think unit testing in D needs more improvements. Rust is in a less 
developed state compared to D, yet its unit testing features already seem 
better designed. I think this is not complex stuff to design and implement.

Bye,
bearophile
