Re: Internal structure (was Re: File format debate

Asger Alstrup Nielsen Sat, 24 Oct 1998 06:52:00 -0400
> Moving the discussion from the file format (aaarrgh!) to the internal
> structure I would say that, if I understand Asger idea about the tree
> (if not please explain), a sectioning object is a node of the tree, that
> has as children the sectioning paragraph (chapter or section name)  and 
> the contents. The contents is not another paragraph but another tree
> (branch) of paragraphs. 
> 
> Is this what you have in mind?

Yes, almost exactly this.

It's similar to DocBook:

<chapter>
        <title>This is the chapter title</title>
        <para>Here's a paragraph in the chapter</para>
        <sect1>
                <title>A section in this chapter</title>
                <para>This is a paragraph in the section</para>
        </sect1>
</chapter>

So, the chapter node has a list of child nodes.  In this particular
example, it has three children:  a title inset, a paragraph inset, and a
section inset.

In turn, the section inset has two children of its own:  a title inset
and a paragraph inset.

So, the tree looks like this:

        chapter
        |    
        Title--Paragraph--Section
                                |
                                Title--Paragraph

where we link the siblings.

In practice, the general inset node could have a structure like this:

class Inset {
        // [methods that are general and common for all]

        /// Pointer to the parent
        Inset * parent;
        /// Pointer to left sibling, i.e. "brother"
        Inset * leftsibling;
        /// Pointer to right sibling
        Inset * rightsibling;
        /// Pointer to first child
        Inset * firstchild;
};
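
As a rough sketch of how these links could be kept consistent when a node
is added: the appendChild() helper and the "kind" label below are
hypothetical names made up for illustration, not part of the proposal
itself, and the walk to the last sibling is just the simplest choice.

```cpp
#include <string>
#include <utility>

// Hypothetical minimal node. The "kind" field and appendChild() are
// illustrative assumptions; only the four pointers come from the class above.
struct Inset {
    std::string kind;
    Inset * parent = nullptr;
    Inset * leftsibling = nullptr;
    Inset * rightsibling = nullptr;
    Inset * firstchild = nullptr;

    explicit Inset(std::string k) : kind(std::move(k)) {}

    // Link c in as the last child of this node, updating all four links.
    void appendChild(Inset * c) {
        c->parent = this;
        c->leftsibling = nullptr;
        c->rightsibling = nullptr;
        if (firstchild == nullptr) {
            firstchild = c;
            return;
        }
        // Walk to the current last child and hook c in after it.
        Inset * last = firstchild;
        while (last->rightsibling != nullptr)
            last = last->rightsibling;
        last->rightsibling = c;
        c->leftsibling = last;
    }
};
```

Appending a title, a paragraph and a section to a chapter node in that
order gives the sibling chain drawn in the tree above.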

With this structure, we can easily implement a method to simulate an
"ordinary" tree data structure with only the parent-child relationship
defined, and we can do this in time linear in the number of children:

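        // Needs #include <vector>
        /// Which children does this node have?
        std::vector<Inset *> Inset::children() const {
                std::vector<Inset *> result;
                // Walk the sibling chain, starting at the first child
                for (Inset * t = firstchild; t != nullptr; t = t->rightsibling)
                        result.push_back(t);
                return result;
        }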

Similar methods can be implemented to expose the insets in any other way
we want, and in most cases, we'll get linear time access.
And if this is not fast enough, we can cache these things to get amortized
constant time.  No problem.
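
One possible way to do the caching, as a sketch only: keep the child list
in the node and rebuild it lazily after a change.  The cache members and
the appendChild() helper are assumptions made up for this example.

```cpp
#include <vector>

// Hypothetical node with a lazily rebuilt child cache. The cache fields
// and appendChild() are illustrative assumptions, not settled design.
struct Inset {
    Inset * parent = nullptr;
    Inset * leftsibling = nullptr;
    Inset * rightsibling = nullptr;
    Inset * firstchild = nullptr;

    mutable std::vector<Inset *> cache;  // last computed child list
    mutable bool cacheValid = false;     // false after any structural change

    // Append c as the last child and invalidate the cached list.
    void appendChild(Inset * c) {
        c->parent = this;
        c->leftsibling = nullptr;
        c->rightsibling = nullptr;
        if (firstchild == nullptr) {
            firstchild = c;
        } else {
            Inset * last = firstchild;
            while (last->rightsibling != nullptr)
                last = last->rightsibling;
            last->rightsibling = c;
            c->leftsibling = last;
        }
        cacheValid = false;
    }

    // Linear walk on the first call after a change, constant time afterwards.
    const std::vector<Inset *> & children() const {
        if (!cacheValid) {
            cache.clear();
            for (Inset * t = firstchild; t != nullptr; t = t->rightsibling)
                cache.push_back(t);
            cacheValid = true;
        }
        return cache;
    }
};
```

Any code that edits the links directly would also have to reset cacheValid,
so in a real class the pointers would probably be private.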

The main point is this:  We don't need to discuss the internal structure
in much detail.  Questions like "Should the sections be nested in chapters
or not?" are not important:  With the proper methods, we can simulate the
data structure to fit the situation, and thus expose one interface where
the sections are nested in chapters, and another interface where sections
and chapters are siblings.
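
To make the "two interfaces over one structure" idea concrete, here is a
minimal sketch.  The names nestedView(), flatView(), appendChild() and the
"kind" label are hypothetical, and the flat view is just one reading of
"siblings": every node in document order, as one sequence.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal node; "kind" is an illustrative label only.
struct Inset {
    std::string kind;
    Inset * parent = nullptr;
    Inset * leftsibling = nullptr;
    Inset * rightsibling = nullptr;
    Inset * firstchild = nullptr;
    explicit Inset(std::string k) : kind(std::move(k)) {}
};

// Hypothetical helper: link c in as the last child of p.
void appendChild(Inset & p, Inset & c) {
    c.parent = &p;
    c.leftsibling = nullptr;
    c.rightsibling = nullptr;
    if (p.firstchild == nullptr) { p.firstchild = &c; return; }
    Inset * last = p.firstchild;
    while (last->rightsibling != nullptr) last = last->rightsibling;
    last->rightsibling = &c;
    c.leftsibling = last;
}

// Nested view: only the direct children of n.
std::vector<const Inset *> nestedView(const Inset & n) {
    std::vector<const Inset *> out;
    for (const Inset * t = n.firstchild; t != nullptr; t = t->rightsibling)
        out.push_back(t);
    return out;
}

// Flat view: every node below (and including) root, in document order.
// Uses only the links; assumes the parent pointers are consistent.
std::vector<const Inset *> flatView(const Inset & root) {
    std::vector<const Inset *> out;
    const Inset * t = &root;
    while (true) {
        out.push_back(t);
        if (t->firstchild != nullptr) { t = t->firstchild; continue; }
        // No children: climb until some node has a right sibling.
        while (t != &root && t->rightsibling == nullptr) t = t->parent;
        if (t == &root) break;
        t = t->rightsibling;
    }
    return out;
}
```

For the chapter example, nestedView() on the chapter yields three nodes,
while flatView() yields all six, with the section's title and paragraph
following the section itself.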

This applies no matter which detailed representation we choose:  We will
always be able to simulate the other representations (as long as the data
structure is general enough, and any n-ary tree, such as the
representation above, is).

Therefore the discussion reduces to a discussion of which representation
is the most efficient.  And such a discussion is not interesting on this
list, because it can only be settled by implementing this stuff:  We
don't know in advance what the bottleneck is, and we don't know how much
we can gain by tuning any given data structure.

--

The link to the file format discussion is similar:  No matter which tree
representation we choose, we can expose it in a way that fits the file
format requirements.
So the question "Should the file format correspond to the internal data
structure or not?" doesn't make much sense beyond the answer: "Yes, as far
as it is a reflection of a tree structure".  Discussing it further than
this will not bring important insights, IMO.

Greets,

Asger