Re: [ADVANCED-DOTNET] Opinion on a tree implementation

Sébastien Lorion Mon, 30 Jul 2007 14:57:20 -0700

On 7/30/07, Barry Kelly <[EMAIL PROTECTED]> wrote:
> Sébastien Lorion <[EMAIL PROTECTED]> wrote:
>
> > I will give a longer reply later, but two quick questions:
> >
> > 1) How do you make your tree read-only ? Ex.: A method builds a tree and
> > returns it as read-only. There must be no way other than using reflection to
> > circumvent that.
>
> I'm not sure that's very important. I make a distinction between two
> kinds of collection classes: those designed to be exposed to the
> "outside" world and thus need to be robust in terms of maintaining
> semantic integrity, and those that are designed to be efficient and have
> lots of algorithms etc. directly available on them, and are used in the
> implementation of other classes.
>
> For example, the .NET BCL has both Collection<T> and List<T>, and on
> their face they seem quite similar. Why have both? Well, Collection<T>
> has a smaller public API, and has virtual methods that help maintain
> semantic integrity. On the other hand, List<T> has methods for sorting,
> no new virtual methods, and no help for maintaining semantic integrity.
> If you want to return a read-only list when you've started with a
> List<T>, you might consider creating a ReadOnlyCollection<T> and passing
> the List<T> to the constructor, thereby creating a read-only view by
> proxy.


I personally do not like this design at all and it has annoyed me on
many occasions, both on work and personal projects. I much prefer the
object model of PowerCollections : the collection classes are simple
and there is one or more utility classes to do operations (the
Algorithms class). These operations work on the *interfaces* and so
are not bound to one particular class : no need to make a choice. I
don't think this kind of API is more difficult to learn/discover than
the one proposed by .NET, but I may be wrong and not open minded
enough to put myself in the shoes of other programmers.

>
> So, if a read-only view is badly needed, then a read-only proxy would be
> one way to do it. It might be better to separate the payload of the tree
> explicitly from the (logical) tree node in that case, though, otherwise
> accessing the members of the logical tree node would require extra work
> and probably wouldn't be statically type safe. Since the jobs I put my
> trees to are very tree-oriented, I don't usually go down this route, and
> live with mutable classes.

Yes, it seems much more simple to centralize operations in the tree
class to get a read-only tree. As a bonus, you can also pass around a
TreeNode (ie a sub tree) which will always be read-only. You can see
this is exactly what I have done in my model. I prefer that to the
IsReadOnly property found on the IList interface thought there are
valid reasons behind that choice.

> BTW, because C# doesn't support tail recursion, the ideal technique I'd
> use - immutable classes - sometimes isn't possible. An alternative
> technique would be some kind of TreeBuilder<T> that uses a combination
> of delaying construction and private knowledge of the link fields to
> avoid excessive recursion. That would need some kind of loose binding
> for the constructors though, in order to keep the number of possible
> tree node types unbounded.
>

I am still exploring the immutable tree idea. The BigList collection
in PowerCollections can share state, so I am looking how it works and
see if I can do the same. Also looking the way of functional libraries
found in Haskell, OCaml/F# and the like.

> > 2) How do you handle enumerator versionning ?
>
> I don't sweat it. Enumerator versioning is (ISTM) mostly a way of
> guarding against concurrency problems and should only be visible in
> buggy programs, while I'm more or less philosophically opposed to
> sharing mutable data structures across threads outside of a handful of
> well-understood idioms, such as queues, dictionaries, etc. I personally
> wouldn't sacrifice much performance to maintain a versioning semantic.

I agree with you and I did not want to spend too much time on that
feature. Versionning is supported at the tree level and that's it. The
cost of that is minimal (increment an integer on each operation and
compare versions on MoveNext()). Supporting it at the node level is
out of the question at this moment and probably will never happen.

> I believe that when you think about it, trees are a lot less suited to
> general collection idioms than are array-based lists, because clients of
> tree structures typically choose a tree for its intrinsic properties and
> possibly while maintaining a bunch of other invariants, rather than
> simply as a bucket of values. So, one sees things like SortedList<,>
> which uses a tree internally, but doesn't expose it to the outside
> world, since not every person who needs a tree needs a sorted tree, much
> less one as specific as the one SortedList<,> uses.
>

I want to use the tree class as a base for what you describe, but I
would still expose it publicly to allow more advanced users to develop
their own data structures if the need arises. The NCollection project
aims to provide base building blocks as well as readily useful
collections. I started with the tree because I think this is where a
design will pass or fail. A list based structure is easier to design
and you can get away with some design errors that become apparent only
with a recursive structure.

> FWIW, I think you'd be better off developing a tree / graph library as
> you're finding uses for it, then re-adapting it for open source
> purposes, rather than doing it in the other way around. Once upon a time
> I fell into the trap of doing a library that way, and ended up
> over-engineering it, and regretted keeping it smaller.
>

I agree with you and it is what I am doing. I really try to keep the
design to a bare minimum as you can see when looking at the
interfaces. Now I am dealing with implementation decisions that have a
great impact on how the tree will behave in terms of performance and
how easy it will be to support more features in the future
(notifications, change tracking, state sharing, etc).

If I only wanted a quick solution, I would go with a design

Node<T>
{
  T Value

  Node<T> Parent
  List<Node<T>> Children
}

and I would be done, but honestly, I need more than that.

I was asking on this list if some people had prior experience and/or
gut feeling about the 2 options I proposed (where to store
relationships). I am a bit surprised by the lack of interest (maybe
this is not the best place to ask around) and the reactions saying
this already exists when in fact, it does not.

Sébastien

> -- Barry
>
> --
> http://barrkel.blogspot.com/
>
> ===================================
> This list is hosted by DevelopMentor(r)  http://www.develop.com
>
> View archives and manage your subscription(s) at http://discuss.develop.com
>


-- 
Sébastien
www.sebastienlorion.com

Re: [ADVANCED-DOTNET] Opinion on a tree implementation

Reply via email to