I've been thinking about doing something on fields for a while, so here are my ideas.
It seems to have been agreed that a field container is going to be used, which means
changes to the importer, which currently assumes that Objects don't have any content.
What should fields be represented as at the PieceTable level? Should we just add
additional code to an object, is a new type of Strux more appropriate, or even a
completely new piecetable type? I'm struggling to get a grasp of all the issues
involved, so would welcome some others looking at this.
Also, as I see it a single field may well need to consist of several runs (or have I
misunderstood the code?). Take for example a field with several words of text, such as
the author of a document: we want to allow a line to wrap in the middle of such a
field. Clearly, the contents of a field need to be represented by runs which are
distinguishable from editable text. As I see it at least 2 possibilities exist:
1. Continue having specific fp_FieldRuns but modify them such that any one field is
represented by a linked list of such runs to represent its content. The line breaking
code would need to be implemented specifically for these FieldRuns.
2. Add code to allow any of the existing runs to be a non-editable field run. All the
functionality for the current types of run, including line breaking etc., could then
be used by a field if desired.
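To show roughly what I mean by option 2, here is a small sketch in Python (purely for brevity, this is not AbiWord code). The `Run` class, its `field` member and `split` are names I have made up for illustration. The only point is that when a run is split for a line break, both halves stay tied to the same field and stay non-editable.

```python
class Run:
    """Hypothetical text run; field is None for ordinary editable text."""

    def __init__(self, text, field=None):
        self.text = text
        self.field = field

    @property
    def editable(self):
        # A run that belongs to a field must not be edited directly.
        return self.field is None

    def split(self, offset):
        """Split this run at offset; the tail keeps the same field."""
        tail = Run(self.text[offset:], self.field)
        self.text = self.text[:offset]
        return tail


# An author field whose text is wrapped after the first word.
run = Run("Keith Stribley", field="author")
tail = run.split(6)
```

After the split, `run` holds "Keith " and `tail` holds "Stribley", and both still report that they are not editable.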
As far as where to implement the field calculation code, I was thinking of a completely
new class structure which would be linked in at the document and piecetable fragment
level. This is outlined in more detail below, but to start with I tried to split
fields into categories from a functionality standpoint.
A. Fields independent of position within document
- although their value would probably be changing as the document was edited, the
value displayed would be independent of where the field was inserted, but needs to be
updated every time it changes.
B. Reference fields which are linked to a bookmark of some kind
- These are also independent of their own position in the document, but potentially
need to be updated every time their bookmark changes/moves.
C. Fields relating to position in document
- This information is already available in the current field implementation, so there
is probably no need for any helper classes. These fields need to be updated every time
they move in the document.
D. Sequence fields eg for Figure numbering
- These are dependent on their position relative to other sequence fields with the
same sequence name. These will change every time a field of the same type above them
in the document is added or removed.
E. Tables of contents etc
- These change continuously with updating of the document and may need manual
updating to avoid performance penalties.
F. Other types
- logical, database fields etc - I'm assuming these won't be implemented for a long
time so haven't thought much about them.
It is clear that each of these types has different requirements for updating. I'm sure
there are many possible implementations, but here is what I have come up with, which I
hope might at least be a basis of discussion. The attached pdf file contains a UML
Class diagram (done in Dia) which hopefully clarifies the discussion. (The attributes
and operations in the classes are not supposed to be exhaustive, just to give an idea
of functionality. I also suspect their names need changing a bit).
A. Document wide information/attributes could be contained in a list at a document
level. Any field which used an attribute would link to that specific attribute.
Whenever that attribute was changed it would call update on all the fields linked to
it.
B. Tags at block level or lower (eg. <p> <c> <field> - this looks consistent with Star
Office, I'm not sure about MSWord) would be bookmarkable but would not be namespaced.
When a bookmark was created it would first check with the document level BookmarkList
that it was unique. The forceNewBookmark would generate a new unique name when it was
not possible to prompt the user eg pasting in from another document. As for
DocumentAttributes, Fields would be able to register themselves with the bookmark and
would be told to update whenever the bookmark changed.
C. All fields would have access to their page number etc in the document from suitable
pointers so no additional classes would be needed for positional fields.
D. The name of a sequence (eg. Figure, Equation, Table) would need to be held at a
document level, but need only contain a pointer to the first such field. When updating,
a linked list should be sufficient. When inserting it would be necessary to iterate
through the sequence to locate the position, but this method seems to be used in other
places in the code and so shouldn't be too much of a problem.
E. Possibly TOCs etc should be a new type of tag similar to a section as they will
probably contain blocks within them. This could be implemented by keeping blocks of a
given style (and lower levels) in a linked list; it will rather depend on how heading
numbering is implemented. Some of the tables will be based on sequence fields, and I
have already described how they could be linked together.
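To make A, B and D a little more concrete, here is a rough sketch in Python (for brevity only, this is not AbiWord code and not a proposal for the actual class names). Everything here is hypothetical: `Observable` stands in for both a document attribute and a bookmark, the `_2` style suffix used by `force_new_bookmark` is just one possible naming scheme, and `position` is assumed to be a simple comparable document offset.

```python
class Field:
    """Hypothetical field; update() stands in for recalculating it."""
    def __init__(self):
        self.display = ""

    def update(self, value):
        self.display = str(value)


class Observable:
    """A document attribute (A) or bookmark (B) that fields link to."""
    def __init__(self, value):
        self.value = value
        self.fields = []

    def link(self, field):
        self.fields.append(field)
        field.update(self.value)

    def set(self, value):
        # Tell every linked field to update when the value changes.
        self.value = value
        for field in self.fields:
            field.update(value)


class BookmarkList:
    """Document-level list that keeps bookmark names unique (B)."""
    def __init__(self):
        self.bookmarks = {}

    def create(self, name, value):
        if name in self.bookmarks:
            return None  # the caller would prompt the user for a new name
        self.bookmarks[name] = Observable(value)
        return self.bookmarks[name]

    def force_new_bookmark(self, name, value):
        # Used when prompting is impossible, eg pasting from another document.
        candidate, suffix = name, 1
        while candidate in self.bookmarks:
            suffix += 1
            candidate = f"{name}_{suffix}"
        self.bookmarks[candidate] = Observable(value)
        return candidate


class SeqField:
    def __init__(self, position):
        self.position = position
        self.number = 0
        self.next = None


class Sequence:
    """Holds only a pointer to the first field of a named sequence (D)."""
    def __init__(self):
        self.first = None

    def insert(self, field):
        # Iterate through the sequence to locate the insert position.
        prev, cur = None, self.first
        while cur is not None and cur.position <= field.position:
            prev, cur = cur, cur.next
        field.next = cur
        if prev is None:
            self.first = field
        else:
            prev.next = field
        self.renumber()

    def renumber(self):
        n, cur = 0, self.first
        while cur is not None:
            n += 1
            cur.number = n
            cur = cur.next


author = Observable("Keith")
header = Field()
author.link(header)
author.set("Keith Stribley")

bookmarks = BookmarkList()
bookmarks.create("intro", 3)
pasted_name = bookmarks.force_new_bookmark("intro", 7)

figures = Sequence()
a, b, c = SeqField(100), SeqField(200), SeqField(300)
for f in (a, c, b):
    figures.insert(f)
```

Renumbering the whole list on every insert is the simplest thing that works; only the fields after the insert point actually change, so a real implementation could start from there.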
I'm sure there are some other fields which may not fit into this scheme, but possibly
they are better left to a future scripting capability.
Finally, some initial thoughts on the new file format for some basic fields are also
attached. They are loosely based on Justin Bradford's original posting.
I haven't included any "id" type attributes which would be invisible to the user, as
some have suggested. This is because of the impact it would have if a document got its
ids corrupted. If ids are in the file format they can only be corrected by manual
editing of the file once corrupted. However, if they are generated in code on opening
the file then there is a good chance they will be sorted out the next time the file is
opened. I suspect this is the type of problem which causes Word to count Figures and
Headings randomly whenever a document gets big and close to its deadline!
I hope the length of this isn't too much for my first message to the list!
I obviously don't have a full understanding of the document data and layout mechanism
so I can well believe that there are issues I have completely failed to consider in
this. It is clearly important to achieve a consensus on these design issues since I am
suggesting quite major additions to the class structure of AbiWord.
cheers,
Keith
I'm assuming a general format of:
<field type="" name="" format="" options="" > </field>
Commented fields would be implemented much later - though I doubt the rest of them
will be implemented very quickly either ;-)
Fields independent of position within document
<field type="author" options="full|initials"> </field>
<field type="comments"> </field>
<!-- <field type="docproperty"> </field> -->
<field type="filename" options="relative|full|noext"> </field>
<field type="filesize"> </field>
<field type="filetime" options=""> </field>
<!-- <field type="template"> </field> -->
<field type="lastsavedby"> </field>
<field type="numchars" [format="ff"]> </field>
<field type="numwords" [format="ff"]> </field>
<field type="numpages" [format="ff"]> </field>
<field type="numparagraphs" [format="ff"]> </field>
<!-- <field type="revnum" [format="ff"]> </field> -->
<field type="subject"> </field>
<field type="title"> </field>
<field type="keywords"> </field>
User Info - constant for whole instance of AbiWord
<field type="useraddress"> </field>
<field type="userinitials"> </field>
<field type="username"> </field>
Fields relating to position in document
<field type="pagenumber" format="ff"> </field>
<field type="section" format="ff"> </field>
<field type="sectionpages" format="ff"> </field>
Fields dependent on bookmarks
<field type="pageref" refname="bookmark" format="ff"> </field>
<field type="ref" refname="bookmark"> </field>
<!-- <field type="chapterref" refname=""> </field> -->
Sequence fields
<field type="seq" name="" [bookmark=] format="ff">
List/Heading/In paragraph numbering?
<field type="autonumber" format="">
<field type="listnumber" format="">
Misc
<field type="time" format=""> to replace <field type="time"/>
I'm sceptical about this, wouldn't file time or created time be more useful? This is
effectively asking to be a clock embedded within your document!
I'm not at all sure about the next one
<field type="toc" styles="style1,style2..." sequences="figures;tables">
The time format could consist of [ .,:/-] as is plus the following:
Sample time 13:40:05 9 February 2000
day of month
D = 9
DD = 09
DDD = 9th
W = Wed } Star office uses N!
WW = Wednesday }
M = 2
MM = 02
MMM = Feb
MMMM = February
YY = 00
YYYY = 2000
mm = 40
ss = 05
hh = 13
hh ap = 1 pm } Star office uses AM/PM
hh AP = 1 PM
eg <field type="filetime" options="hh:mm ap WW MMM DDD, YYYY">
1:40 pm Wednesday Feb 9th, 2000
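As a check that the codes above are unambiguous, here is a rough Python sketch of a formatter for them (not AbiWord code; `format_time`, `ordinal` and the tables are names I have made up). It assumes that hh switches to an unpadded 12 hour value whenever ap or AP appears in the format, which is my reading of the table above, and that anything which is not a code is copied through as is.

```python
import re
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Longer codes come first so that MMMM is not read as M four times.
TOKEN = re.compile(r"YYYY|YY|MMMM|MMM|MM|M|DDD|DD|D|WW|W|hh|mm|ss|ap|AP")


def ordinal(n):
    """Return n with an English ordinal suffix, eg 9 becomes 9th."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def format_time(fmt, t):
    """Expand the format codes in fmt using the datetime t."""
    twelve_hour = re.search(r"ap|AP", fmt) is not None
    hour12 = t.hour % 12 or 12
    values = {
        "D": str(t.day),
        "DD": f"{t.day:02d}",
        "DDD": ordinal(t.day),
        "W": DAYS[t.weekday()][:3],
        "WW": DAYS[t.weekday()],
        "M": str(t.month),
        "MM": f"{t.month:02d}",
        "MMM": MONTHS[t.month - 1][:3],
        "MMMM": MONTHS[t.month - 1],
        "YY": f"{t.year % 100:02d}",
        "YYYY": str(t.year),
        "hh": str(hour12) if twelve_hour else f"{t.hour:02d}",
        "mm": f"{t.minute:02d}",
        "ss": f"{t.second:02d}",
        "ap": "am" if t.hour < 12 else "pm",
        "AP": "AM" if t.hour < 12 else "PM",
    }
    return TOKEN.sub(lambda m: values[m.group(0)], fmt)


sample = datetime(2000, 2, 9, 13, 40, 5)
text = format_time("hh:mm ap WW MMM DDD, YYYY", sample)
```

With the sample time, `text` comes out as "1:40 pm Wednesday Feb 9th, 2000". The day and month names are hard coded in English here; a real implementation would need to take them from the locale.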
I'm assuming ff would be similar to that used for lists
ie (using 11 as an example value)
%i xi
%I XI
%a k
%A K
%h 0xb ie hex 0-9a-f a gimmick for us programmers?
%H 0xB ie hex 0-9A-F ?
%* whatever the format of the previous field was - probably only applicable for
sequences.
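A rough Python sketch of how these number formats might be expanded (not AbiWord code; `format_number`, `to_roman` and `to_letters` are hypothetical names). Two things here are my own assumptions: letters carry on as aa, ab after z, and %* is left out because it needs to know about the previous field.

```python
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
         (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
         (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n):
    """Upper case Roman numeral for a positive integer."""
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_letters(n):
    """1 gives a, 26 gives z, 27 gives aa (assumed wrap-around rule)."""
    out = ""
    while n > 0:
        n, r = divmod(n - 1, 26)
        out = chr(ord("a") + r) + out
    return out


def format_number(code, n):
    """Expand one of the format codes for the value n."""
    if code == "%i":
        return to_roman(n).lower()
    if code == "%I":
        return to_roman(n)
    if code == "%a":
        return to_letters(n)
    if code == "%A":
        return to_letters(n).upper()
    if code == "%h":
        return f"0x{n:x}"
    if code == "%H":
        return f"0x{n:X}"
    raise ValueError(f"unknown format code: {code}")


label = format_number("%i", 11)
```

Here `label` is "xi", matching the first line of the table above.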
For reference the current fields seem to be:
<field type="list-label"/>
<field type="time"/>
<field type="page_number"/>
<field type="page_count"/>
UML class diagram
--
Keith Stribley http://www.stribley.dabsol.co.uk/