Hi all,

I propose changing Tree in the server to no longer extend HashMap as it could 
have functions that don't align with Tree behaviour.

Problem
=======

Tree<T> extends HashMap<T, Tree<T>>, and as a result several Tree operations 
behave like Map operations rather than tree operations users expect:

  * tree.size() returns top-level entries, not total nodes
  * tree.containsKey(x) only checks immediate children
  * tree.isLeaf() throws NoSuchElementException on an empty tree
  * getObjectsAtDepth(0) returns an empty list rather than the roots


Proposed change
===============

Replace Tree<T> extends HashMap<T, Tree<T>> with a final class that contains a 
HashMap and exposes only tree-shaped methods:

    public final class Tree<T> implements Serializable {
        private final Map<T, Tree<T>> children = new HashMap<>();

        public Set<T>             rootNodes();
        public Tree<T>            childAt(T key);
        public boolean            isLeaf();
        public int                nodeCount();
        public Optional<Tree<T>>  findSubtree(T key);
        public String             prettyPrint();
        @Override public boolean  equals(Object other);
    }

Tree no longer is-a Map. It has-a Map. All public methods are tree-shaped, with 
names that mean what users expect:

  * nodeCount() for total count
  * hasChild / contains for top-level vs recursive search
  * childAt / findSubtree for direct vs recursive lookup
  * getObjectsAtDepth(d) is 0-based
  * equals/hashCode are structural recursive


Breaking changes
===============

Compile-time break for Java callers that treat Tree as a Map (assignment to 
Map<?,?>, instanceof Map/HashMap checks, calls to Map-only methods).

A few Gremlin pipeline patterns that work today as side effects of Tree being a 
Map like select(keys), count(local), unfold() no longer compose. Tree() becomes 
a terminal-style step, like subgraph() where users process the result 
client-side after .next(), or reshape the upstream traversal.

Performance
===========
Basically, unchanged. Memory per node (~48 B), child-lookup cost (O(1)), merge 
cost (O(distinct keys)), iteration order, andwire-format encoding are all the 
same as today. The change is purely in the public API surface.


Future extensions
=================

Additive, non-breaking follow-ups, landing in a separate PR as demand justifies:

  * findAllSubtrees(T), subtreeAtPath(List<T>), streamLevelOrder()
  * map, filter, prune
  * intersect / difference / merge
  * implements Iterable<T> (for-each over node values)

Thanks,
Guian

Reply via email to