> How would you create a `JS<T>` on the stack?
I'll provide some context. I'm designing the HTML parser API to be compatible
with different DOM representations. This is desirable for a few reasons:
- I want the library to be useful outside of Servo. It will have the
  ability to output a simple static parse tree, for users who don't have
  their own DOM type.
- Decoupling the parser from the details of Servo's DOM will make both
  systems much easier to modify.
- For off-thread parsing, we want to build a sequence of tree operations,
  but for on-thread parsing we should manipulate the DOM directly.
My API mock-up at the moment looks like
    pub trait TreeSink<Handle> {
        fn create_element(&mut self, name: Atom) -> Handle;
        fn detach_from_parent(&mut self, child: Handle);
        fn append_element(&mut self, parent: Handle, child: Handle);
        // ...
    }
(In the real API there will be more parameters, e.g. a namespace and attributes
for create_element.)
The library client will provide an implementation of TreeSink with an
appropriate Handle type, and the parser will call these methods to manipulate
the DOM during parsing.
A Handle represents a reference to a mutable node in the DOM. Handles are
required to be Clone, because the parser will hold internal references to
nodes, e.g. in the stack of open elements.
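To illustrate why Clone is needed, here is a minimal sketch in present-day Rust syntax (not the syntax used elsewhere in this message). TreeBuilder, LogSink, start_tag, end_tag and stack_demo are hypothetical names made up for the example, and Atom is replaced by a plain String. The parser keeps one handle on its stack of open elements and passes clones to the sink:

```rust
// Stand-in for the interned-string Atom type (assumption: a plain String).
type Atom = String;

pub trait TreeSink<Handle> {
    fn create_element(&mut self, name: Atom) -> Handle;
    fn detach_from_parent(&mut self, child: Handle);
    fn append_element(&mut self, parent: Handle, child: Handle);
}

/// Hypothetical parser-side state: a sink plus the stack of open elements.
pub struct TreeBuilder<Handle, Sink> {
    sink: Sink,
    open_elems: Vec<Handle>,
}

impl<Handle: Clone, Sink: TreeSink<Handle>> TreeBuilder<Handle, Sink> {
    pub fn new(sink: Sink) -> Self {
        TreeBuilder { sink, open_elems: Vec::new() }
    }

    /// Creates an element, appends it to the innermost open element (if any),
    /// and pushes it on the stack. The stack keeps its own copy of each
    /// handle, so the handles given to the sink must be clones.
    pub fn start_tag(&mut self, name: Atom) {
        let elem = self.sink.create_element(name);
        if let Some(parent) = self.open_elems.last() {
            self.sink.append_element(parent.clone(), elem.clone());
        }
        self.open_elems.push(elem);
    }

    /// Pops the innermost open element.
    pub fn end_tag(&mut self) {
        self.open_elems.pop();
    }

    pub fn into_sink(self) -> Sink {
        self.sink
    }
}

/// Toy sink for the demo: handles are integers, and appends are only logged.
#[derive(Default)]
pub struct LogSink {
    next: usize,
    pub appends: Vec<(usize, usize)>,
}

impl TreeSink<usize> for LogSink {
    fn create_element(&mut self, _name: Atom) -> usize {
        let id = self.next;
        self.next += 1;
        id
    }
    fn detach_from_parent(&mut self, _child: usize) {}
    fn append_element(&mut self, parent: usize, child: usize) {
        self.appends.push((parent, child));
    }
}

/// Simulates the tag sequence <html><body></body><p> and returns the
/// logged (parent, child) pairs.
pub fn stack_demo() -> Vec<(usize, usize)> {
    let mut tb: TreeBuilder<usize, LogSink> = TreeBuilder::new(LogSink::default());
    tb.start_tag("html".to_string());
    tb.start_tag("body".to_string());
    tb.end_tag();
    tb.start_tag("p".to_string());
    tb.into_sink().appends
}
```

With integer handles a clone is trivial; with a refcounted handle each clone would bump a reference count instead.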
Implementing this trait for a refcounted DOM is straightforward:
    struct Node {
        pub name: Atom,
        pub parent: Option<WeakHandle>,
        pub children: Vec<Handle>,
    }

    #[deriving(Clone)]
    struct Handle {
        ptr: Rc<RefCell<Node>>,
    }

    struct WeakHandle {
        ptr: Weak<RefCell<Node>>,
    }

    struct Sink {
        root: Option<Handle>,
    }

    impl TreeSink<Handle> for Sink { ... }
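For concreteness, one way the elided impl could be filled in, written in present-day Rust syntax rather than the syntax above. This is a sketch under assumptions: Atom is replaced by a plain String, the weak parent link is stored directly as a Weak instead of a WeakHandle wrapper, the first element created is treated as the root, append_element detaches the child from any previous parent first, and rc_dom_demo is a hypothetical helper for trying it out.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Stand-in for the interned-string Atom type (assumption: a plain String).
type Atom = String;

pub trait TreeSink<Handle> {
    fn create_element(&mut self, name: Atom) -> Handle;
    fn detach_from_parent(&mut self, child: Handle);
    fn append_element(&mut self, parent: Handle, child: Handle);
}

pub struct Node {
    pub name: Atom,
    // Weak link upward, so parent and child do not keep each other alive.
    pub parent: Option<Weak<RefCell<Node>>>,
    pub children: Vec<Handle>,
}

#[derive(Clone)]
pub struct Handle {
    ptr: Rc<RefCell<Node>>,
}

pub struct Sink {
    pub root: Option<Handle>,
}

impl TreeSink<Handle> for Sink {
    fn create_element(&mut self, name: Atom) -> Handle {
        let node = Node { name, parent: None, children: Vec::new() };
        let handle = Handle { ptr: Rc::new(RefCell::new(node)) };
        // Simplification: the first element created becomes the root.
        if self.root.is_none() {
            self.root = Some(handle.clone());
        }
        handle
    }

    fn detach_from_parent(&mut self, child: Handle) {
        // Clear the child's parent link, then upgrade it to a strong pointer.
        // A missing or already-freed parent means there is nothing to do.
        let parent = child.ptr.borrow_mut().parent.take().and_then(|w| w.upgrade());
        if let Some(parent) = parent {
            parent
                .borrow_mut()
                .children
                .retain(|c| !Rc::ptr_eq(&c.ptr, &child.ptr));
        }
    }

    fn append_element(&mut self, parent: Handle, child: Handle) {
        // Design choice for this sketch: a node has at most one parent.
        self.detach_from_parent(child.clone());
        child.ptr.borrow_mut().parent = Some(Rc::downgrade(&parent.ptr));
        parent.ptr.borrow_mut().children.push(child);
    }
}

/// Returns the number of children of <html> after appending <body>,
/// and again after detaching it.
pub fn rc_dom_demo() -> (usize, usize) {
    let mut sink = Sink { root: None };
    let html = sink.create_element("html".to_string());
    let body = sink.create_element("body".to_string());
    sink.append_element(html.clone(), body.clone());
    let attached = html.ptr.borrow().children.len();
    sink.detach_from_parent(body);
    let detached = html.ptr.borrow().children.len();
    (attached, detached)
}
```

Each RefCell borrow here ends with its statement, so no two borrows of the same node overlap at runtime.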
In Servo we have a JS-managed DOM, and on-main-thread parsing should manipulate
it directly. I have something like
    #[deriving(Encodable)]
    struct Node {
        pub name: StrBuf,
        pub parent: Option<Handle>,
        pub children: Vec<Handle>,
    }

    type Handle = JS<Node>;

    impl TreeSink<Handle> for Sink {
        fn create_element(&mut self, name: Atom) -> Handle {
            let owned = ~Node {
                name: name,
                children: vec!(),
                parent: None,
            };
            // Not shown: also build a JS wrapper object.
            unsafe {
                JS::from_raw(cast::transmute::<~Node, *mut Node>(owned))
            }
        }

        fn append_element(&mut self, parent_hdl: Handle, child_hdl: Handle) {
            let mut parent = parent_hdl.root();
            let mut child = child_hdl.root();
            (*child).parent = Some(parent_hdl.clone());
            parent.children.push(child_hdl.clone());
        }
    }
This (approximately) compiles, but I think it's not memory-safe, because we
pass and return un-rooted JS<T> values, which a GC during parsing could
invalidate. To fix this we need two handle types:
    pub trait TreeSink<InHandle, OutHandle> {
        fn create_element(&mut self, name: Atom) -> OutHandle;
        fn detach_from_parent(&mut self, child: InHandle);
        fn append_element(&mut self, parent: InHandle, child: InHandle);
        // ...
    }
which will be instantiated as &JSRef<Node> and Temporary<Node> respectively.
(And the lifetime of the JSRef will be inferred as the lifetime of each call,
as in DOM bindings, but I'm not sure how to express this within the trait
impl.) We'll need another trait to let the generic parser code convert between
these types.
There is also the question of what handles to store within the parser.
- We could root every node as it's created, and unroot when the parser
  is destroyed. We'd store JSRef<Node>, transmuting away the lifetimes.
- We could root the parser itself, make it traceable, and store JS<Node>.
  This seems safer, but would complicate the generic interface further.
This is all a bit moot if a parser never lives across a JS operation that could
GC. But I wouldn't bet on that always being the case. The current Hubbub
bindings basically make this assumption, though; see
http://irclog.gr/#show/irc.mozilla.org/servo/103713
I also thought about something like
    pub trait TreeSink<InHandle, OutHandle> {
        fn create_element<'t>(&'t mut self, name: Atom) -> OutHandle<'t>;
        fn detach_from_parent<'t>(&'t mut self, child: InHandle<'t>);
        // ...
    }
but this would require higher-kinded polymorphism. It will never really be
possible for safe code to use a handle which stores an &'t mut Node, because
that would completely break Rust's mutable-pointer aliasing rules.
For off-thread parsing, the TreeSink methods just record tree operations to be
executed later. In that situation, I think handles should be sequential
integer IDs. The tree op executor will use them as indexes into a vector of
the nodes that it has created. This is similar to Gecko's approach, where a
handle is an nsIContent**, i.e. a pointer to a slot where a node pointer will
eventually be stored.
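A minimal sketch of that scheme, in present-day Rust syntax. TreeOp, OpSink, execute and op_demo are hypothetical names for this example, Atom is replaced by a plain String, and the executor's toy Node stores its links as IDs. The sink only records operations; a separate executor replays them, using each ID as an index into the vector of nodes it has created:

```rust
// Stand-in for the interned-string Atom type (assumption: a plain String).
type Atom = String;

pub trait TreeSink<Handle> {
    fn create_element(&mut self, name: Atom) -> Handle;
    fn detach_from_parent(&mut self, child: Handle);
    fn append_element(&mut self, parent: Handle, child: Handle);
}

/// A handle is the sequence number of the node's creation.
pub type Id = usize;

#[derive(Debug, Clone, PartialEq)]
pub enum TreeOp {
    Create { id: Id, name: Atom },
    Detach { child: Id },
    Append { parent: Id, child: Id },
}

/// Off-thread sink: records tree operations instead of performing them.
#[derive(Default)]
pub struct OpSink {
    pub ops: Vec<TreeOp>,
    next_id: Id,
}

impl TreeSink<Id> for OpSink {
    fn create_element(&mut self, name: Atom) -> Id {
        let id = self.next_id;
        self.next_id += 1;
        self.ops.push(TreeOp::Create { id, name });
        id
    }
    fn detach_from_parent(&mut self, child: Id) {
        self.ops.push(TreeOp::Detach { child });
    }
    fn append_element(&mut self, parent: Id, child: Id) {
        self.ops.push(TreeOp::Append { parent, child });
    }
}

/// Toy node built by the executor; links are stored as IDs.
#[derive(Debug, PartialEq)]
pub struct Node {
    pub name: Atom,
    pub parent: Option<Id>,
    pub children: Vec<Id>,
}

/// Replays recorded operations. Because IDs are handed out sequentially,
/// a node's ID is also its index in the returned vector.
pub fn execute(ops: &[TreeOp]) -> Vec<Node> {
    let mut nodes: Vec<Node> = Vec::new();
    for op in ops {
        match op {
            TreeOp::Create { id, name } => {
                assert_eq!(*id, nodes.len(), "IDs must be sequential");
                nodes.push(Node { name: name.clone(), parent: None, children: Vec::new() });
            }
            TreeOp::Detach { child } => detach(&mut nodes, *child),
            TreeOp::Append { parent, child } => {
                detach(&mut nodes, *child);
                nodes[*child].parent = Some(*parent);
                nodes[*parent].children.push(*child);
            }
        }
    }
    nodes
}

/// Removes `child` from its current parent's child list, if it has a parent.
fn detach(nodes: &mut [Node], child: Id) {
    if let Some(parent) = nodes[child].parent.take() {
        nodes[parent].children.retain(|&c| c != child);
    }
}

/// Records <html><body> with the sink, then replays the operations.
pub fn op_demo() -> Vec<Node> {
    let mut sink = OpSink::default();
    let html = sink.create_element("html".to_string());
    let body = sink.create_element("body".to_string());
    sink.append_element(html, body);
    execute(&sink.ops)
}
```

The recorded Vec<TreeOp> contains only plain data, so it could be sent to another thread before being executed.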
By the way, it's not relevant to Servo, but I think we can parse into an owning
tree without refcounting or copying. During tree building we'll have
    struct BuildNode {
        pub name: Atom,
        pub parent: Option<*mut BuildNode>,
        pub children: Vec<*mut BuildNode>,
    }
and every node will be owned by the TreeSink itself. When parsing is finished
we transmute the root to
    struct Node {
        pub name: Atom,
        /*priv*/ unused_parent: Option<uint>,
        pub children: Vec<~Node>,
    }
transferring ownership of each node to its parent. Then we free any nodes that
didn't make it into the final tree.
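The same ownership hand-off can be sketched in safe present-day Rust. Note that this swaps in a different technique from the one described above: instead of raw pointers and a transmute, the builder owns all nodes in a vector of slots, nodes refer to each other by index, and finishing moves each node out of its slot into its parent. Nodes are moved (a shallow move of each struct) rather than reinterpreted in place. Builder, take_subtree and owning_demo are hypothetical names, and Atom is replaced by a plain String.

```rust
// Stand-in for the interned-string Atom type (assumption: a plain String).
type Atom = String;

/// Node representation during tree building: links are indices into the
/// builder's vector, so no refcounting is needed.
struct BuildNode {
    name: Atom,
    parent: Option<usize>,
    children: Vec<usize>,
}

/// Final representation: each node owns its children.
#[derive(Debug, PartialEq)]
pub struct Node {
    pub name: Atom,
    pub children: Vec<Box<Node>>,
}

#[derive(Default)]
pub struct Builder {
    // Every node is owned by the builder while the tree is being built.
    nodes: Vec<Option<BuildNode>>,
}

impl Builder {
    pub fn create_element(&mut self, name: Atom) -> usize {
        self.nodes.push(Some(BuildNode { name, parent: None, children: Vec::new() }));
        self.nodes.len() - 1
    }

    pub fn append_element(&mut self, parent: usize, child: usize) {
        // Re-parenting: remove the child from its old parent's list first.
        let old = self.node_mut(child).parent.replace(parent);
        if let Some(old) = old {
            self.node_mut(old).children.retain(|&c| c != child);
        }
        self.node_mut(parent).children.push(child);
    }

    fn node_mut(&mut self, idx: usize) -> &mut BuildNode {
        self.nodes[idx].as_mut().expect("node slot is empty")
    }

    /// Consumes the builder, transferring ownership of each node reachable
    /// from `root` to its parent.
    pub fn finish(mut self, root: usize) -> Box<Node> {
        let tree = take_subtree(&mut self.nodes, root);
        // Nodes that didn't make it into the final tree are still in
        // `self.nodes` and are freed when `self` is dropped here.
        tree
    }
}

/// Moves the node at `idx` out of its slot and recursively does the same
/// for its children.
fn take_subtree(nodes: &mut [Option<BuildNode>], idx: usize) -> Box<Node> {
    let BuildNode { name, children, .. } =
        nodes[idx].take().expect("node reachable twice from the root");
    let children = children.into_iter().map(|c| take_subtree(nodes, c)).collect();
    Box::new(Node { name, children })
}

/// Builds <html><body> plus one orphan node, then converts to an owning tree.
pub fn owning_demo() -> Box<Node> {
    let mut b = Builder::default();
    let html = b.create_element("html".to_string());
    let body = b.create_element("body".to_string());
    let _orphan = b.create_element("orphan".to_string());
    b.append_element(html, body);
    b.finish(html)
}
```

The recursion depth in take_subtree equals the tree depth, so a real implementation handling deeply nested documents might prefer an explicit work stack.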
Designing a library to be generic over its client's memory management approach
in a statically safe way seems to be quite the challenge. I'd be very happy to
hear thoughts regarding any of the above.
keegan
_______________________________________________
dev-servo mailing list
https://lists.mozilla.org/listinfo/dev-servo