> On 23 May 2015, at 6:36 am, Gabriela Gibson <gabriela.gib...@gmail.com> wrote:
>
> Hi,
>
> Well, I managed to get (rudimentary) headers, tables, lists working, but
> the bold, italic and underlined nodes have me confused, mostly because
> nothing appears in the order I expect it to. I used the
> file sample/documents/odf/bold-italic-underlined.odt for that part.
I haven’t yet gone and run the code, but looking at the approach used in traverseContent(), where it calls find_HTML() to determine the corresponding HTML tag for a given ODF node, I don’t think this is going to work for a lot of the constructs in ODF. The reason is that it’s not always a simple mapping: for some basic constructs like paragraphs and (some) tables it will work, but there will be other cases where more complex processing is needed.

So I think, at least for the time being (the very interesting DSL ideas we’ve been discussing notwithstanding), a first cut that has one big switch statement for all the supported node types is more likely to be successful. This way, you can do any arbitrary processing you need for a given node type, and are not restricted to simply mapping it to a particular HTML element.

As for formatting nodes like those for italic and bold, I would suggest building up a set of CSS properties rather than creating HTML tags for <b>, <i>, and <u>. The reason is that there are only a few such tags in HTML, but there are many other formatting properties that can’t be expressed this way and instead require CSS. A <span> element with a style=“font-weight: bold” attribute is equivalent to <b>, and there’s some code somewhere in the html directory which, from memory, I think converts between the two. So creating a CSSProperties object and setting the relevant name/value pairs in it will let you serialise the result and place it in the style attribute of a span tag.

The other reason the CSS approach is more appropriate is that it can also be used for stylesheets. For automatic styles in ODF, we want to translate those to style=“…” attributes in HTML (that is, direct formatting, which is essentially what automatic styles are). For normal styles, however, we want an entry in the CSS stylesheet, referenced from the HTML element via the class=“…” attribute.
Have a look in the ooxml/src/word/formatting directory for how this is handled in the Word filter. The code there takes an XML node from the Word document as input and populates a CSSProperties object with the appropriate values. There are also functions to go the other way, when performing an update. I would recommend an approach similar to this.

Coming back to HTML_B and friends: I just had a look at HTMLNormalization.c and it looks like it only does this in the inverse situation to what I described above. That is, when reading an HTML file and preparing it for conversion into a Word document, it converts <b>, <i>, <u> etc. into <span> tags with the appropriate CSS properties set in the style attribute. It doesn’t go the other way, though that could potentially be done. Both forms are essentially identical anyway in terms of how they render in a browser and how the editor treats them.

> I also had to do some surgery on DocFormats/core/src/xml/DFNameMap.* so I
> could access DFNameMap.

It isn’t actually necessary to put this stuff in the header. It’s best to keep the struct definition in the C file and only ever access it through the functions exposed in the header. If you’re not accessing any of the fields of DFNameMap (which you’re not, at least in the code currently in the repository), then the compiler simply needs to know that there exists a struct type called DFNameMap, without knowing what its fields actually are, as long as you only ever handle pointers to it. The following line in DFNameMap.h declares the typedef:

    typedef struct DFNameMap DFNameMap;

Everything you need to do with the name map can be achieved with the public functions, and in the event you find something that can’t be done, it’s better to add a new function to the header than to expose the struct. Though this shouldn’t be necessary; if you find such situations, let me know and I’ll explain how to do it with the existing functions :)

--
Dr Peter M. Kelly
pmke...@apache.org

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)