Re: Walking trees without recursive iterators

sschwarzer Fri, 29 May 2020 04:15:49 -0700

@cblake Thanks for your encouragement, it _did_ motivate me. :-) Unfortunately, 
I only found the time this week.

So finally, here's a new take on the recursive tree iterator:
[https://hg.sr.ht/~sschwarzer/lazytree](https://hg.sr.ht/~sschwarzer/lazytree)
(clone with

hg clone https://hg.sr.ht/~sschwarzer/lazytree

Run

if you want to play with the code.)

The iterator code is in
[https://hg.sr.ht/~sschwarzer/lazytree/browse/src/lazytree.nim](https://hg.sr.ht/~sschwarzer/lazytree/browse/src/lazytree.nim)
and tests in
[https://hg.sr.ht/~sschwarzer/lazytree/browse/tests/test_lazytree.nim](https://hg.sr.ht/~sschwarzer/lazytree/browse/tests/test_lazytree.nim)
.

Using the iterator:

rootNode = parseXML("<root>...</root>")

for nodeInfo in lazyTree(rootNode):
# Deleting modifies the tree under `rootNode and also has the effect
# that the deleted node's child nodes won't be yielded. Of course
# deleting nodes is optional. In most cases you want to keep the node.
if nodeInfo.node.tag == "toBeDeleted":
nodeInfo.parentNode.delete(nodeInfo.childIndex)
# Do something else with `nodeInfo.node`, which is the actual node.
# (`parentNode` and `childIndex` are mostly needed to be able to delete
# a node from the tree.)
...

Run

Some remarks/thoughts:

* Contrary to my old implementation, this iterator goes forward. I had only
used the odd backward iteration because it presumably made the implementation
easier.
* The node type is generic. For now, it must satisfy the API of `XMLNode`
though.
* I had planned to go with the "typical" iterator design with `hasNextNode`
and `nextNode` (see
[https://en.wikipedia.org/wiki/Iterator_pattern#Structure](https://en.wikipedia.org/wiki/Iterator_pattern#Structure)
), but I had trouble "coordinating" these similar procs "in my head." I found
it far more natural to have a proc that returns an `option.some` for a new item
and `option.none` when the iterator is exhausted.
* The `while` loop in `func nextNodeInfoOption` used to be a nested `if`, as
can be seen at
[https://hg.sr.ht/~sschwarzer/lazytree/browse/src/lazytree.nim?rev=78731f21f65a4f2d889886ff06ce8cbbfb660aa0#L64](https://hg.sr.ht/~sschwarzer/lazytree/browse/src/lazytree.nim?rev=78731f21f65a4f2d889886ff06ce8cbbfb660aa0#L64)
(here without the node deletion feature). I switched to the `while` loop to
reduce the redundancy of the outer `elif` and the inner `if`. However, the
`while` loop doesn't fit the thought process behind the algorithm as nicely
because it obscures that actually at most two iterations should be required.
* The iterator detects the deletion of the node by comparing the number of
children of the parent of the deleted node to the previous call of
`nextNodeInfoOption`. This check is rather "weak" because it won't detect - and
handle - a lot of other changes that client code may do. So doing any other
tree modification beside deleting the currently yielded node may have undefined
results.
* I'm not sure whether the helper type `NodeInfo` should be modeled with a
variant type (as now) or with `Option` s for `parentNode` and `childIndex`
(which don't apply for the root node of a tree).
* One could provide a `proc deleteNode(nodeInfo: NodeInfo)` to make the
deletion easier. (Probably I'll make the change.) I think it's still useful to
expose at least `parentNode`, so you can delete a node based on the `tag` value
of the parent, for example.
* I've been thinking about a variant of the current implementation where I'd
create a `seq` of iterators for each child node when I start iterating over a
parent node. This approach should make it easier to control the iteration:
normally the code would iterate over the child iterators in the `seq`. Deletion
of a node by client code would make the iterator modify the `seq` and continue
processing it as before. A downside I see with this approach is that it would
create `seq` s of iterators for all currently processed nodes in advance. Then
again, this should only matter if nodes have big numbers of immediate child
nodes.
* Working on the code with this new approach was _much_ easier than the
previous implementation with the stack (or even several). This time, I never
even thought of using a debugger. :-) Only in one instance I added a
`debugEcho`, but the presumed bug in the iterator turned out to be in the test.
:-D

Do you see any edge cases where the algorithm would fail? If you think that a
code comment doesn't fit the code, this could be an indication. ;-)

Other feedback is welcome, too!

Re: Walking trees without recursive iterators

Reply via email to