Re: ODF branch: The confused edition.

Peter Kelly Sat, 23 May 2015 11:01:27 -0700

> On 23 May 2015, at 6:36 am, Gabriela Gibson <gabriela.gib...@gmail.com> wrote:
> 
> Hi,
> 
> Well, I managed to get (rudimentary) headers, tables, lists working, but
> the bold, italic and underlined nodes have me confused, mostly because
> nothing appears in the order I expect it to.  I used the
> file sample/documents/odf/bold-italic-underlined.odt for that part.


I haven’t yet gone and run the code, but looking at the approaches used in 
traverseContent() where it calls find_HTML() to determine the corresponding 
HTML tag for a given ODF node, I don’t think this is going to work for a lot of 
the constructs in ODF. The reason is that it’s not always a simple mapping - 
for some basic constructs like paragraphs and (some) tables it will work, but 
there will be other cases where more complex processing is needed. So I think, 
at least for the time being (the very interesting DSL ideas we’ve been 
discussing notwithstanding), a first cut that has one big switch statement for 
all the supported node types is more likely to be successful. This way, you can 
do any arbitrary processing you need for a given node type, and are not 
restricted to simply mapping it to a particular HTML element.
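
As a rough sketch of what I mean by the switch-based approach: the tag enum 
and node struct below are made up for illustration and are not the actual 
DocFormats types. The point is only that a case such as headings can look at 
attributes before deciding what to emit.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical node tags; a real filter would use the tag constants
   already defined for the ODF namespaces. */
typedef enum {
    ODF_TEXT_P,
    ODF_TEXT_H,
    ODF_TEXT_SPAN,
    ODF_TABLE_TABLE,
    ODF_UNKNOWN
} OdfTag;

/* Hypothetical, heavily simplified node. */
typedef struct {
    OdfTag tag;
    int outlineLevel;   /* only meaningful for ODF_TEXT_H */
} OdfNode;

/* Writes the HTML element name for `node` into `out`.
   Returns 1 if the node type is supported, 0 otherwise. */
int htmlNameForNode(const OdfNode *node, char *out, size_t outLen)
{
    assert(node != NULL && out != NULL && outLen > 0);
    switch (node->tag) {
        case ODF_TEXT_P:
            snprintf(out, outLen, "p");
            return 1;
        case ODF_TEXT_H: {
            /* Not a fixed mapping: the element depends on an attribute,
               so clamp the outline level into the h1..h6 range. */
            int level = node->outlineLevel;
            if (level < 1) level = 1;
            if (level > 6) level = 6;
            snprintf(out, outLen, "h%d", level);
            return 1;
        }
        case ODF_TEXT_SPAN:
            snprintf(out, outLen, "span");
            return 1;
        case ODF_TABLE_TABLE:
            snprintf(out, outLen, "table");
            return 1;
        default:
            return 0;   /* unsupported: caller decides what to do */
    }
}
```

Each case is free to grow into whatever processing that node type needs, 
which a plain lookup table can't do.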

In terms of the formatting nodes like those for italic and bold, I would 
suggest instead building up a set of CSS properties rather than creating HTML 
tags for <b>, <i>, and <u>. The reason for this is that there are only a few 
such tags in HTML, but there are many other formatting properties that can’t be 
expressed in this manner and instead use CSS. A <span> element with a 
style=“font-weight: bold” attribute is equivalent to <b>, and there’s some code 
somewhere in the html directory which from memory I think converts between the 
two. So creating a CSSProperties object and setting the relevant name/value 
pairs in that will enable you to serialise the result and place that in a span 
tag.
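
To illustrate the idea of collecting name/value pairs and serialising them, 
here is a minimal stand-alone sketch. StyleProps and its two functions are 
hypothetical and are not the real CSSProperties API in the repository; they 
only show the shape of the approach.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PROPS 8

/* Hypothetical stand-in for a CSS property set. */
typedef struct {
    const char *names[MAX_PROPS];
    const char *values[MAX_PROPS];
    int count;
} StyleProps;

/* Adds a name/value pair, or overwrites the value if the name exists.
   The strings are borrowed, not copied. */
void stylePropsPut(StyleProps *props, const char *name, const char *value)
{
    for (int i = 0; i < props->count; i++) {
        if (strcmp(props->names[i], name) == 0) {
            props->values[i] = value;
            return;
        }
    }
    assert(props->count < MAX_PROPS);
    props->names[props->count] = name;
    props->values[props->count] = value;
    props->count++;
}

/* Serialises the set as "name: value; name: value" into `out`.
   Output is truncated if `out` is too small. */
void stylePropsSerialise(const StyleProps *props, char *out, size_t outLen)
{
    size_t used = 0;
    assert(out != NULL && outLen > 0);
    out[0] = '\0';
    for (int i = 0; i < props->count && used < outLen; i++) {
        int n = snprintf(out + used, outLen - used, "%s%s: %s",
                         (i > 0) ? "; " : "",
                         props->names[i], props->values[i]);
        if (n < 0)
            break;
        used += (size_t)n;
    }
}
```

For a run that is bold, italic and underlined, you would put font-weight, 
font-style and text-decoration into the set and use the serialised string as 
the value of the span's style attribute.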

The other reason the CSS approach is more appropriate is that it can also be 
used for stylesheets. For automatic styles in ODF, we want to translate those 
to style=“…” attributes in HTML (that is, direct formatting, which is 
essentially what automatic styles are). However for normal styles, we want an 
entry in the CSS stylesheet, and then reference that from the HTML element via 
the class=“…” attribute.
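
The distinction between the two kinds of style could be sketched like this. 
ResolvedStyle and formatStyleAttribute are invented names for illustration 
only, and the sketch assumes the CSS text has already been serialised and 
needs no escaping.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical description of a style after lookup. */
typedef struct {
    const char *name;      /* style name as used in the document */
    const char *cssText;   /* already-serialised CSS declarations */
    int isAutomatic;       /* 1 for an automatic style, 0 for a normal one */
} ResolvedStyle;

/* Writes the HTML attribute that should carry this style into `out`.
   Automatic styles become direct formatting; normal styles are
   referenced by class and defined once in the stylesheet. */
void formatStyleAttribute(const ResolvedStyle *style, char *out, size_t outLen)
{
    assert(style != NULL && out != NULL && outLen > 0);
    if (style->isAutomatic)
        snprintf(out, outLen, "style=\"%s\"", style->cssText);
    else
        snprintf(out, outLen, "class=\"%s\"", style->name);
}
```

In the normal-style case the same CSS text would additionally be written as 
a rule in the stylesheet, keyed by the class name.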

Have a look in the ooxml/src/word/formatting directory for how this is handled 
in the Word filter. This takes an XML node from the Word document as input, and 
populates a CSSProperties object with the appropriate values. There are also 
functions to go the other way, when performing an update. I would recommend an 
approach similar to this.

Coming back to HTML_B and friends: I just had a look at HTMLNormalization.c and 
it looks like it only does this in the inverse situation to what I described 
above. That is, when reading a HTML file and preparing it for conversion into a 
Word document, it converts <b>, <i>, <u> etc into <span> tags with the 
appropriate CSS properties set in the style attribute. It doesn’t go the other 
way, though that could potentially be done. Both approaches are essentially 
identical anyway in terms of how they will render in a browser and be treated 
by the editor.
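
The equivalence itself is just a small table. The function below is a 
hypothetical illustration of that mapping and is not taken from 
HTMLNormalization.c:

```c
#include <stddef.h>
#include <string.h>

/* Returns the CSS declaration equivalent to a formatting element name,
   or NULL if the element is not one of the simple formatting tags. */
const char *cssForFormattingTag(const char *tagName)
{
    static const struct { const char *tag; const char *css; } table[] = {
        { "b", "font-weight: bold" },
        { "i", "font-style: italic" },
        { "u", "text-decoration: underline" },
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].tag, tagName) == 0)
            return table[i].css;
    }
    return NULL;
}
```

Going the other way would mean checking whether a span's style consists of 
exactly one of these declarations.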

> 
> I also had to do some surgery on DocFormats/core/src/xml/DFNameMap.* so I
> could access DFNameMap.

It isn’t actually necessary to put this stuff in the header - it’s best to keep 
the struct definition in the C file and only ever access it through the 
functions exposed in the header. If you’re not accessing any of the fields of 
DFNameMap (which you’re not, at least in the code currently in the repository), 
then the compiler simply needs to know that there exists a struct type called 
DFNameMap, without knowing what its fields actually are. The following line in 
DFNameMap.h declares the typedef:

typedef struct DFNameMap DFNameMap;

Everything you need to do with the name map can be achieved with the public 
functions - and in the event you find something that can’t be done, it’s better 
to add a new function than to expose the struct. This shouldn’t be necessary, 
though; if you find such situations let me know and I’ll explain how to do it 
with the existing functions :)
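
For reference, the general pattern looks like the following. NameTable and 
its functions are a made-up example, not the DFNameMap API; the header and C 
file parts are shown together here only so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- what would go in the header (hypothetical NameTable.h) ---- */

/* Incomplete type: callers can hold a pointer but cannot see fields. */
typedef struct NameTable NameTable;

NameTable *NameTableNew(void);
void NameTableFree(NameTable *table);
int NameTableCount(const NameTable *table);
int NameTableAdd(NameTable *table);

/* ---- what would stay in the C file (hypothetical NameTable.c) ---- */

struct NameTable {
    int count;
};

/* Returns a new, empty table, or NULL if allocation fails. */
NameTable *NameTableNew(void)
{
    return calloc(1, sizeof(NameTable));
}

void NameTableFree(NameTable *table)
{
    free(table);
}

int NameTableCount(const NameTable *table)
{
    assert(table != NULL);
    return table->count;
}

/* Reserves the next id and returns it. */
int NameTableAdd(NameTable *table)
{
    assert(table != NULL);
    return table->count++;
}
```

Code that includes only the header part can call all four functions, but 
table->count would be a compile error there, which is the whole point.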

—
Dr Peter M. Kelly
pmke...@apache.org

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)
