Some thoughts on changing the underlying field code

Keith Stribley Sat, 12 Feb 2000 13:34:31 -0600 (CST)


I've been thinking about doing something on fields for a while, so here are my ideas. 

It seems to have been agreed that a field container is going to be used, which means 
changes to the importer, since it currently assumes that Objects don't have any content. 
What should fields be represented as at the PieceTable level? Should we just add 
extra code to an Object, would a new type of Strux be more appropriate, or even a 
completely new PieceTable type? I'm struggling to get a grasp of all the issues 
involved, so I would welcome others looking at this too.

Also, as I see it, a single field may well need to consist of several runs (or have I 
misunderstood the code?). Take, for example, a field with several words of text, such as 
the author of a document: we want to allow a line to wrap in the middle of such a 
field. Clearly, the contents of a field need to be represented by runs which are 
distinguishable from editable text. As I see it, at least two possibilities exist:

1. Keep specific fp_FieldRuns, but modify them so that any one field is represented 
by a linked list of such runs holding its content. The line-breaking code would need 
to be implemented specifically for these FieldRuns.

2. Add code to allow any of the existing runs to be a non-editable field run. All the 
functionality of the current run types, including line breaking etc., could then 
be used by a field if desired.
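To make option 2 a little more concrete, here is a minimal C++ sketch, assuming each run simply carries a pointer to the field that owns it. TextRun, Field and makeFieldRuns are hypothetical names, not existing AbiWord classes, and splitting at spaces only stands in for whatever the real line-breaking code does.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: the calculated value of one field, e.g. the document author.
struct Field {
    std::string value;
};

// Hypothetical run type for option 2: any run may belong to a field.
struct TextRun {
    std::string text;
    const Field* owner;  // nullptr means ordinary, editable text

    TextRun(std::string t, const Field* o) : text(std::move(t)), owner(o) {}
    bool isEditable() const { return owner == nullptr; }
};

// Split a field's value into one run per word (the trailing space stays with
// its word) so that ordinary line breaking could wrap between the runs.
// Every run points back to the same field and is therefore read-only.
std::vector<TextRun> makeFieldRuns(const Field& f) {
    std::vector<TextRun> runs;
    std::string word;
    for (char c : f.value) {
        word += c;
        if (c == ' ') {
            runs.emplace_back(word, &f);
            word.clear();
        }
    }
    if (!word.empty()) runs.emplace_back(word, &f);
    return runs;
}
```

Ordinary text would be built with a null owner, so a single run class covers both cases and isEditable() is the only test the editing code needs.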

As for where to implement the field calculation code, I was thinking of a completely 
new class structure which would be linked in at the document and piecetable fragment 
level. This is outlined in more detail below, but to start with I have tried to split 
fields into categories from a functionality standpoint.

A. Fields independent of position within document 
   - although their value will probably change as the document is edited, the value 
displayed is independent of where the field is inserted, but it needs to be updated 
every time that value changes.

B. Reference fields which are linked to a bookmark of some kind
   - These are also independent of their own position in the document, but potentially 
need to be updated every time their bookmark changes/moves.

C. Fields relating to position in document
   - This information is already available in the current field implementation, so 
there is probably no need for any helper classes. These fields need to be updated 
every time they move in the document.

D. Sequence fields, eg for Figure numbering
   - These depend on their position relative to other sequence fields with the 
same sequence name. They will change every time a field of the same sequence above 
them in the document is added or removed.

E. Tables of contents etc
   - These change continuously as the document is edited and may need to be updated 
manually to avoid performance penalties.

F. Other types 
   - logical, database fields etc - I'm assuming these won't be implemented for a long 
time, so I haven't thought much about them.
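As a rough illustration of how the categories could drive updating, here is a hypothetical C++ sketch mapping some of the type names from the draft file format at the end of this message onto categories A to F. FieldCategory and categorize are illustrative names only, and just a subset of the types is listed.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical: the update categories A-F described above.
enum class FieldCategory {
    DocumentWide,     // A: independent of position, update when value changes
    Reference,        // B: update when the bookmark changes or moves
    Positional,       // C: update when the field itself moves
    Sequence,         // D: update when an earlier field of the sequence changes
    TableOfContents,  // E: possibly updated manually
    Other             // F: logical, database fields etc.
};

// Map a field's type attribute to its category (subset of the draft types).
FieldCategory categorize(const std::string& type) {
    static const std::map<std::string, FieldCategory> table = {
        {"author", FieldCategory::DocumentWide},
        {"filename", FieldCategory::DocumentWide},
        {"numpages", FieldCategory::DocumentWide},
        {"title", FieldCategory::DocumentWide},
        {"pageref", FieldCategory::Reference},
        {"ref", FieldCategory::Reference},
        {"pagenumber", FieldCategory::Positional},
        {"section", FieldCategory::Positional},
        {"sectionpages", FieldCategory::Positional},
        {"seq", FieldCategory::Sequence},
        {"toc", FieldCategory::TableOfContents},
    };
    auto it = table.find(type);
    return it == table.end() ? FieldCategory::Other : it->second;
}
```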

It is clear that each of these types has different requirements for updating. I'm sure 
there are many possible implementations, but here is what I have come up with, which I 
hope might at least serve as a basis for discussion. The attached PDF file contains a UML 
class diagram (done in Dia) which hopefully clarifies the discussion. (The attributes 
and operations in the classes are not meant to be exhaustive, just to give an idea 
of the functionality. I also suspect some of their names need changing.)

A. Document wide information/attributes could be contained in a list at a document 
level. Any field which used an attribute would link to that specific attribute. 
Whenever that attribute was changed it would call update on all the fields linked to 
it.
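Idea A is essentially the observer pattern. A minimal C++ sketch, assuming hypothetical DocumentAttribute, FieldBase and AttributeField classes (none of these are existing AbiWord code); a real version would also need to unregister a field when it is deleted.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical classes, not existing AbiWord code.
class FieldBase {
public:
    virtual ~FieldBase() = default;
    virtual void update(const std::string& newValue) = 0;
};

// A document-level attribute that knows which fields display it.
class DocumentAttribute {
public:
    explicit DocumentAttribute(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }
    const std::string& value() const { return value_; }

    void registerField(FieldBase* f) { fields_.push_back(f); }

    // Store the new value and tell every linked field to update.
    void setValue(const std::string& v) {
        if (v == value_) return;  // unchanged, so no updates are needed
        value_ = v;
        for (FieldBase* f : fields_) f->update(value_);
    }

private:
    std::string name_;
    std::string value_;
    std::vector<FieldBase*> fields_;  // not owned
};

// A field that simply displays the attribute's current value.
class AttributeField : public FieldBase {
public:
    void update(const std::string& newValue) override {
        text = newValue;
        ++updates;
    }
    std::string text;
    int updates = 0;
};
```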

B. Tags at block level or lower (eg. <p> <c> <field> - this looks consistent with 
StarOffice, I'm not sure about MS Word) would be bookmarkable but would not be 
namespaced. When a bookmark was created, it would first check with the document-level 
BookmarkList that its name was unique. The forceNewBookmark operation would generate a 
new unique name when it was not possible to prompt the user, eg when pasting from 
another document. As with DocumentAttributes, fields would be able to register 
themselves with a bookmark and would be told to update whenever the bookmark changed.
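A minimal sketch of the document-level BookmarkList, assuming bookmark names are plain strings and that forceNewBookmark appends _1, _2, ... until the name is unique. The suffix scheme and the other method names are hypothetical. Registration of fields with a bookmark is left out, since it would work as described for DocumentAttributes.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical document-level list of bookmark names.
class BookmarkList {
public:
    // Returns false if the name is already taken, so the UI can ask the
    // user for a different one.
    bool addBookmark(const std::string& name) {
        return names_.insert(name).second;
    }

    // For when the user cannot be prompted, eg pasting from another
    // document. Assumed scheme: try wanted, wanted_1, wanted_2, ...
    std::string forceNewBookmark(const std::string& wanted) {
        std::string candidate = wanted;
        for (int n = 1; !names_.insert(candidate).second; ++n)
            candidate = wanted + "_" + std::to_string(n);
        return candidate;
    }

    bool contains(const std::string& name) const {
        return names_.count(name) != 0;
    }

private:
    std::set<std::string> names_;
};
```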

C. All fields would have access to their page number etc. in the document through 
suitable pointers, so no additional classes would be needed for positional fields.

D. The name of a sequence (eg. Figure, Equation, Table) would need to be held at 
document level, but need only contain a pointer to the first such field. For updating, 
a linked list should be sufficient. When inserting, it would be necessary to iterate 
through the sequence to locate the position, but this approach seems to be used in 
other places in the code, so it shouldn't be too much of a problem. 
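A minimal C++ sketch of such a sequence list, assuming each field's position can be compared as a plain integer (a real version would have to ask the piece table, since positions shift as text is edited). Sequence and SeqField are hypothetical names. Inserting walks the list to find the position, and only the fields from the change onwards are renumbered.

```cpp
#include <cassert>
#include <string>

// Hypothetical sequence field: docPos is an assumed comparable position.
struct SeqField {
    int docPos;
    int number = 0;            // displayed value, eg the 3 in "Figure 3"
    SeqField* next = nullptr;  // next field of the same sequence
    explicit SeqField(int pos) : docPos(pos) {}
};

// Hypothetical document-level sequence: only the head pointer is stored.
struct Sequence {
    std::string name;          // eg "Figure"
    SeqField* first = nullptr;

    // Walk to the insertion point, link the field in, then renumber it and
    // every field after it.
    void insert(SeqField* f) {
        SeqField** link = &first;
        int n = 1;
        while (*link && (*link)->docPos < f->docPos) {
            link = &(*link)->next;
            ++n;
        }
        f->next = *link;
        *link = f;
        for (SeqField* p = f; p; p = p->next) p->number = n++;
    }

    // Unlink a field and renumber the fields that followed it.
    void remove(SeqField* f) {
        SeqField** link = &first;
        int n = 1;
        while (*link && *link != f) {
            link = &(*link)->next;
            ++n;
        }
        if (!*link) return;  // not part of this sequence
        *link = f->next;
        f->next = nullptr;
        for (SeqField* p = *link; p; p = p->next) p->number = n++;
    }
};
```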

E. Possibly TOCs etc. should be a new type of tag, similar to a section, as they will 
probably contain blocks within them. This could be implemented by keeping blocks of a 
given style (and lower levels) in a linked list, though it will rather depend on how 
heading numbering is implemented. Some of the tables will be based on sequence fields, 
and I have already described how those could be linked together.
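A hypothetical C++ sketch of collecting TOC entries by heading style, assuming the styles="style1,style2..." attribute from the draft format has already been split into a list and that the position in that list gives the entry's level. Rather than maintaining a linked list per style as suggested, it simply rescans the blocks, which would suit the manual-update approach mentioned under E. Block, TocEntry and buildToc are illustrative names; sequence-based tables are not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: a paragraph block as seen by the TOC builder.
struct Block {
    std::string style;  // eg "Heading 1"
    std::string text;
};

struct TocEntry {
    int level;          // 1 for the first listed style, 2 for the second, ...
    std::string text;
};

// Collect, in document order, every block whose style appears in the TOC's
// style list. This rescans all blocks on each update.
std::vector<TocEntry> buildToc(const std::vector<Block>& blocks,
                               const std::vector<std::string>& styles) {
    std::vector<TocEntry> toc;
    for (const Block& b : blocks) {
        for (std::size_t i = 0; i < styles.size(); ++i) {
            if (b.style == styles[i]) {
                toc.push_back(TocEntry{static_cast<int>(i) + 1, b.text});
                break;
            }
        }
    }
    return toc;
}
```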


I'm sure there are some other fields which may not fit into this scheme, but possibly 
they are better left to a future scripting capability.

Finally, some initial thoughts on the new file format for some basic fields are also 
attached. They are loosely based on Justin Bradford's original posting. 
I haven't included any "id" type attributes, which would be invisible to the user, as 
some have suggested. This is because of the impact it would have if a document's ids 
got corrupted. If ids are stored in the file format, then once corrupted they can only 
be corrected by manually editing the file. However, if they are generated in code when 
the file is opened, there is a good chance they will be sorted out the next time the 
file is opened. I suspect this is the type of problem which causes Word to count Figures 
and Headings randomly whenever a document gets big and close to its deadline!
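The regenerate-on-open idea can be sketched as follows, assuming the loader hands over the fields in document order. LoadedField and rebuildOnLoad are hypothetical names; the point is that both the internal id and any sequence number are derived afresh on every load, so nothing stale can persist in the file.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory field as produced by the loader.
struct LoadedField {
    std::string type;  // eg "seq"
    std::string name;  // sequence name for seq fields, eg "Figure"
    int id;            // internal id, never written to the file
    int seqNumber;     // derived value for seq fields, 0 otherwise
};

// Walk the fields in document order, handing out ids and numbering each
// sequence independently.
void rebuildOnLoad(std::vector<LoadedField>& fields) {
    int nextId = 1;
    std::map<std::string, int> counters;  // one counter per sequence name
    for (LoadedField& f : fields) {
        f.id = nextId++;
        f.seqNumber = (f.type == "seq") ? ++counters[f.name] : 0;
    }
}
```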


I hope the length of this isn't too much for my first message to the list!

I obviously don't have a full understanding of the document data and layout mechanism 
so I can well believe that there are issues I have completely failed to consider in 
this. It is clearly important to achieve a consensus on these design issues since I am 
suggesting quite major additions to the class structure of AbiWord.

cheers,

Keith

I'm assuming a general format of:
<field type="" name="" format="" options="" > </field>

Commented-out fields would be implemented much later - though I doubt the rest of them 
will be implemented very quickly either ;-)

Fields independent of position within document

<field type="author" options="full|initials"> </field>
<field type="comments"> </field>
<!-- <field type="docproperty"> </field> -->
<field type="filename" options="relative|full|noext"> </field> 
<field type="filesize"> </field>
<field type="filetime" options=""> </field>
<!-- <field type="template"> </field> -->
<field type="lastsavedby"> </field>
<field type="numchars" [format="ff"]> </field>
<field type="numwords" [format="ff"]> </field>
<field type="numpages" [format="ff"]> </field>
<field type="numparagraphs" [format="ff"]> </field>
<!-- <field type="revnum" [format="ff"]> </field> -->
<field type="subject"> </field>
<field type="title"> </field>
<field type="keywords"> </field>


User info - constant for the whole instance of AbiWord

<field type="useraddress"> </field>
<field type="userinitials"> </field>
<field type="username"> </field>

Fields relating to position in document

<field type="pagenumber" format="ff"> </field>
<field type="section" format="ff"> </field>
<field type="sectionpages" format="ff"> </field>

Fields dependent on bookmarks

<field type="pageref" refname="bookmark" format="ff"> </field>
<field type="ref" refname="bookmark"> </field>
<!-- <field type="chapterref" refname=""> </field> -->

Sequence fields

<field type="seq" name="" [bookmark=] format="ff">

List/Heading/In paragraph numbering?

<field type="autonumber" format="">
<field type="listnumber" format="">

Misc
<field type="time" format=""> to replace <field type="time"/> 
I'm sceptical about this: wouldn't the file time or creation time be more useful? This 
effectively asks for a clock embedded within your document!


I'm not at all sure about the next one
<field type="toc" styles="style1,style2..." sequences="figures;tables">

The time format could consist of the separators [ .,:/-], copied as-is, plus the following codes:

Sample time: 13:40:05, Wednesday 9 February 2000
day of month
D    = 9
DD   = 09
DDD  = 9th
day of week
W    = Wed        } StarOffice uses N!
WW   = Wednesday  }
month
M    = 2
MM   = 02
MMM  = Feb
MMMM = February
year
YY   = 00
YYYY = 2000
time
mm    = 40
ss    = 05
hh    = 13
hh ap = 1 pm      } StarOffice uses AM/PM
hh AP = 1 PM      }

eg <field type="filetime" options="hh:mm ap WW MMM DDD, YYYY">
1:40 pm Wednesday Feb 9th, 2000 
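A minimal C++ sketch of a formatter for these codes, mainly to check that the proposal is unambiguous. DateTime, formatTime and dayOfWeek are hypothetical names. Two assumptions go beyond the table: hh gives a two-digit 24-hour value unless the format contains ap or AP anywhere, in which case it gives an unpadded 12-hour value (matching the example above), and any character that is not part of a code is copied as-is. The weekday is computed with Sakamoto's day-of-week method.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical broken-down time; month is 1-12.
struct DateTime {
    int year, month, day, hour, minute, second;
};

// Day of week, 0 = Sunday (Sakamoto's method, Gregorian calendar).
int dayOfWeek(int y, int m, int d) {
    static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (m < 3) y -= 1;
    return (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;
}

static std::string two(int v) { return (v < 10 ? "0" : "") + std::to_string(v); }

static std::string ordinal(int d) {
    const char* suffix = "th";
    if (d % 100 < 11 || d % 100 > 13) {
        switch (d % 10) {
            case 1: suffix = "st"; break;
            case 2: suffix = "nd"; break;
            case 3: suffix = "rd"; break;
            default: break;
        }
    }
    return std::to_string(d) + suffix;
}

std::string formatTime(const std::string& fmt, const DateTime& t) {
    static const char* const months[] = {"January", "February", "March", "April",
        "May", "June", "July", "August", "September", "October", "November", "December"};
    static const char* const days[] = {"Sunday", "Monday", "Tuesday", "Wednesday",
                                       "Thursday", "Friday", "Saturday"};
    // Assumption: ap/AP anywhere in the format switches hh to 12-hour.
    const bool twelveHour = fmt.find("ap") != std::string::npos ||
                            fmt.find("AP") != std::string::npos;
    const std::string month = months[t.month - 1];
    const std::string day = days[dayOfWeek(t.year, t.month, t.day)];
    std::string out;
    std::size_t i = 0;
    while (i < fmt.size()) {
        if (fmt.compare(i, 2, "ap") == 0 || fmt.compare(i, 2, "AP") == 0) {
            const bool pm = t.hour >= 12;
            if (fmt[i] == 'a') out += pm ? "pm" : "am";
            else               out += pm ? "PM" : "AM";
            i += 2;
            continue;
        }
        // A code is a run of one repeated character, eg "MMM".
        std::size_t j = i;
        while (j < fmt.size() && fmt[j] == fmt[i]) ++j;
        const std::string tok = fmt.substr(i, j - i);
        if      (tok == "D")    out += std::to_string(t.day);
        else if (tok == "DD")   out += two(t.day);
        else if (tok == "DDD")  out += ordinal(t.day);
        else if (tok == "W")    out += day.substr(0, 3);
        else if (tok == "WW")   out += day;
        else if (tok == "M")    out += std::to_string(t.month);
        else if (tok == "MM")   out += two(t.month);
        else if (tok == "MMM")  out += month.substr(0, 3);
        else if (tok == "MMMM") out += month;
        else if (tok == "YY")   out += two(t.year % 100);
        else if (tok == "YYYY") out += std::to_string(t.year);
        else if (tok == "mm")   out += two(t.minute);
        else if (tok == "ss")   out += two(t.second);
        else if (tok == "hh")   out += twelveHour ? std::to_string((t.hour + 11) % 12 + 1)
                                                  : two(t.hour);
        else                    out += tok;  // separators and anything unrecognised
        i = j;
    }
    return out;
}
```

With the sample time, formatTime("hh:mm ap WW MMM DDD, YYYY", t) reproduces the example output shown above.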

I'm assuming ff would be similar to the format used for lists,
ie (using 11 as an example value):

%i xi
%I XI
%a k
%A K
%h 0xb ie hex 0-9a-f, a gimmick for us programmers?
%H 0xB ie hex 0-9A-F?
%* whatever the format of the previous field was - probably only applicable to 
sequences.
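A possible C++ sketch of the ff codes, assuming values of at least 1 and a format string in which each %x code is replaced in place (eg "(%a)"). formatNumber and its helpers are hypothetical names. Two choices are assumptions not stated above: alphabetic numbering continues past z as aa, ab, ... (spreadsheet style), and %* is omitted because it needs the previous field's format. Roman numerals are only meaningful up to 3999.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Roman numerals for 1..3999.
std::string toRoman(int n, bool upper) {
    static const int values[] = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    static const char* const symbols[] = {"M", "CM", "D", "CD", "C", "XC", "L",
                                          "XL", "X", "IX", "V", "IV", "I"};
    std::string out;
    for (std::size_t k = 0; k < sizeof values / sizeof values[0]; ++k) {
        while (n >= values[k]) {
            out += symbols[k];
            n -= values[k];
        }
    }
    if (!upper)
        for (char& c : out) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return out;
}

// Assumption: 1 -> a, 26 -> z, 27 -> aa (bijective base 26).
std::string toAlpha(int n, bool upper) {
    const char base = upper ? 'A' : 'a';
    std::string out;
    while (n > 0) {
        --n;
        out.insert(out.begin(), static_cast<char>(base + n % 26));
        n /= 26;
    }
    return out;
}

// "0x" prefix followed by lower- or upper-case hex digits.
std::string toHex(int n, bool upper) {
    const char* digits = upper ? "0123456789ABCDEF" : "0123456789abcdef";
    unsigned v = static_cast<unsigned>(n);
    std::string out;
    do {
        out.insert(out.begin(), digits[v % 16]);
        v /= 16;
    } while (v > 0);
    return "0x" + out;
}

// Replace each %i, %I, %a, %A, %h or %H in ff with n in that style;
// everything else, including unknown codes, is copied unchanged.
std::string formatNumber(const std::string& ff, int n) {
    assert(n >= 1);
    std::string out;
    for (std::size_t i = 0; i < ff.size(); ++i) {
        if (ff[i] != '%' || i + 1 == ff.size()) {
            out += ff[i];
            continue;
        }
        const char code = ff[++i];
        switch (code) {
            case 'i': out += toRoman(n, false); break;
            case 'I': out += toRoman(n, true); break;
            case 'a': out += toAlpha(n, false); break;
            case 'A': out += toAlpha(n, true); break;
            case 'h': out += toHex(n, false); break;
            case 'H': out += toHex(n, true); break;
            default:  out += '%'; out += code; break;
        }
    }
    return out;
}
```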

For reference the current fields seem to be:
<field type="list-label"/> 
<field type="time"/> 
<field type="page_number"/>
<field type="page_count"/>

[Attachment: UML class diagram (PDF)]


-- 
Keith Stribley          http://www.stribley.dabsol.co.uk/
