I'm currently using LibreOffice for word processing, but I also use Markdown, 
LaTeX and Scribus for related tasks.

As a former WordPerfect 5.1 hacker who continues to mourn the emergence of 
WYSIWYG, Windows and Microsoft Word, I found myself examining the internals of 
an .odt file.  

For reference, the OpenDocument v1.3 schema (Part 3) is specified here:
https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/OpenDocument-v1.3-os-part3-schema.pdf

I wondered whether it would be possible to reduce the blizzard of redundant 
tags and other noise in these files (see the extract below), something that 
most editors for lightweight markup (e.g. Markdown) achieve.

In particular, could the document editor elide empty tags, such as <text:p 
text:style-name="P10"/>?  Also, would it be possible to normalise contiguous 
text elements that share the same style by merging them?

I've considered writing an XSLT stylesheet to accomplish some of this, and 
perhaps using other XML tools to navigate the .odt format.
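As an alternative to XSLT, the two transformations asked about above can be 
sketched in Python with the standard library's xml.etree.ElementTree.  This 
is a minimal sketch, assuming content.xml has already been extracted and 
parsed, and that it is not pretty-printed (whitespace between two spans counts 
as text, so such spans are left alone).  The function names are hypothetical.  
Note that empty paragraphs are often the only thing providing vertical 
spacing, so dropping them can change the layout.

```python
import xml.etree.ElementTree as ET

# Namespace URI for ODF text:* elements and attributes.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
P = f"{{{TEXT_NS}}}p"
SPAN = f"{{{TEXT_NS}}}span"
STYLE_NAME = f"{{{TEXT_NS}}}style-name"


def drop_empty_paragraphs(root):
    """Remove <text:p> elements with no child elements and no visible text."""
    removed = 0
    for parent in list(root.iter()):  # snapshot, because we mutate the tree
        for child in list(parent):
            if child.tag == P and len(child) == 0 and not (child.text or "").strip():
                parent.remove(child)
                removed += 1
    return removed


def merge_adjacent_spans(root):
    """Merge sibling <text:span> elements that share a style name and have
    no text between them.  Only leaf spans (no nested elements) are merged,
    to keep the sketch simple."""
    merged = 0
    for parent in list(root.iter()):
        children = list(parent)
        i = 0
        while i < len(children) - 1:
            a, b = children[i], children[i + 1]
            if (a.tag == SPAN and b.tag == SPAN
                    and a.get(STYLE_NAME) == b.get(STYLE_NAME)
                    and not a.tail                      # nothing between the spans
                    and len(a) == 0 and len(b) == 0):
                a.text = (a.text or "") + (b.text or "")
                a.tail = b.tail                         # keep text that followed b
                parent.remove(b)
                del children[i + 1]
                merged += 1                             # retry a with its new neighbour
            else:
                i += 1
    return merged


if __name__ == "__main__":
    sample = (
        '<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
        f'xmlns:text="{TEXT_NS}">'
        '<text:p text:style-name="P44">The '
        '<text:span text:style-name="T131">as</text:span>'
        '<text:span text:style-name="T131">sa </text:span>dfd.</text:p>'
        '<text:p text:style-name="P10"/>'
        '</office:text>'
    )
    root = ET.fromstring(sample)
    print(drop_empty_paragraphs(root), merge_adjacent_spans(root))  # prints: 1 1
```

Merging only leaf spans is a deliberate simplification; spans containing 
<text:s/>, tabs or nested elements would need their children moved across too.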

I suspect that the editor's stalls and its incomprehensible formatting glitches 
(seemingly impervious to the "Clear Direct Formatting" command) are artifacts 
of this bloated document model's complexity.

I admit I'm undisciplined in my use of styles, but I feel this should not be a 
barrier to using LibreOffice.  I imagine some of the redundancy is related to 
the presence of "Undo" stacks, etc., and there may already be ways to 
accomplish some of these goals (such as "Save As"?), but I'd appreciate any 
advice.

I'm also interested in ways to manage font definitions so that analogous 
fonts aren't included in documents by accident.  This could involve some user 
intervention.  I suspect that copy/paste with styling from other sources (e.g. 
browsers) is the source of many of these issues.
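As a starting point for auditing fonts, ODF declares fonts as 
<style:font-face> elements inside <office:font-face-decls>, and styles refer 
to them by name through style:font-name (plus the -asian and -complex 
variants) on <style:text-properties>.  A minimal sketch, assuming both 
content.xml and styles.xml have been parsed, that lists which declared fonts 
are never referenced; font_report is a hypothetical helper, and a font could 
in principle be referenced from places this sketch does not scan.

```python
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FONT_FACE = f"{{{STYLE_NS}}}font-face"
NAME = f"{{{STYLE_NS}}}name"
# Attributes on <style:text-properties> that point at a font-face by name.
FONT_REFS = [f"{{{STYLE_NS}}}font-name",
             f"{{{STYLE_NS}}}font-name-asian",
             f"{{{STYLE_NS}}}font-name-complex"]


def font_report(*roots):
    """Return (declared, used) sets of font-face names across the given
    parsed parts (typically content.xml and styles.xml)."""
    declared, used = set(), set()
    for root in roots:
        for face in root.iter(FONT_FACE):
            declared.add(face.get(NAME))
        for el in root.iter():
            for attr in FONT_REFS:
                if attr in el.attrib:
                    used.add(el.attrib[attr])
    return declared, used
```

The difference declared - used gives candidates for removal, and scanning the 
declared names by eye is a quick way to spot near-duplicate families that 
arrived via pasted formatting.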

Finally, is there any documentation describing the indirect style scheme used 
in the content/style models, such as the 'P2' in <text:p text:style-name="P2"/>?
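As far as the ODF spec goes, names such as P2 and T52 refer to automatic 
styles: styles the application generates itself and stores in the 
<office:automatic-styles> element of content.xml.  Each one typically points 
at a named style via style:parent-style-name and carries the direct 
formatting applied on top of it.  The P/T prefixes are LibreOffice's naming 
convention (paragraph/text), not a spec requirement.  A minimal sketch that 
groups automatic styles whose definitions are identical, assuming that 
LibreOffice's revision-ID attributes (officeooo:rsid and 
officeooo:paragraph-rsid) are what make otherwise-equal styles differ; that 
namespace and the helper names are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
# Assumption: namespace of LibreOffice's revision-ID attributes.
OFFICEOOO_NS = "http://openoffice.org/2009/office"

AUTO_STYLES = f"{{{OFFICE_NS}}}automatic-styles"
STYLE = f"{{{STYLE_NS}}}style"
NAME = f"{{{STYLE_NS}}}name"
IGNORED = {NAME, f"{{{OFFICEOOO_NS}}}rsid", f"{{{OFFICEOOO_NS}}}paragraph-rsid"}


def signature(style):
    """Canonical description of a style: each element's tag and attributes,
    excluding the style's own name and the ignored rsid attributes."""
    return tuple(
        (el.tag, tuple(sorted((k, v) for k, v in el.attrib.items()
                              if k not in IGNORED)))
        for el in style.iter()
    )


def duplicate_automatic_styles(content_root):
    """Return lists of automatic style names with identical definitions."""
    groups = defaultdict(list)
    for auto in content_root.iter(AUTO_STYLES):
        for style in auto.findall(STYLE):
            groups[signature(style)].append(style.get(NAME))
    return [names for names in groups.values() if len(names) > 1]
```

style:family is part of the signature, so paragraph and text styles are never 
grouped together.  Actually collapsing a group would also mean rewriting every 
text:style-name reference in the body, which this sketch does not attempt.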

I guess what I'm after is a way to directly manage fonts and styles that 
defaults to an empty set and is then parsimonious in the creation and 
application of either.  It would be nice to have a way to manage styles using a 
configuration script, without manual interaction with the "Manage Styles" 
dialogue.
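Managing named styles from a script could look roughly like the following: a 
configuration dict is applied to the <office:styles> element of styles.xml, 
replacing any existing paragraph style of the same name.  This is a sketch 
under several assumptions: the CONFIG format, the style name MyBody and the 
function apply_paragraph_styles are hypothetical, and only fo:-namespaced 
properties (such as fo:font-size and fo:margin-bottom) are handled.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"


def s(local):
    return f"{{{STYLE_NS}}}{local}"


def fo(local):
    return f"{{{FO_NS}}}{local}"


# Hypothetical configuration format: one entry per named paragraph style.
CONFIG = {
    "MyBody": {"parent": "Standard",
               "paragraph": {"margin-bottom": "0.25cm"},
               "text": {"font-size": "11pt"}},
}


def apply_paragraph_styles(styles_root, config):
    """Create or replace named paragraph styles in <office:styles>."""
    office_styles = styles_root.find(f"{{{OFFICE_NS}}}styles")
    if office_styles is None:
        raise ValueError("no <office:styles> element found")
    for name, spec in config.items():
        # Drop any existing paragraph style with the same name.
        for old in office_styles.findall(s("style")):
            if old.get(s("name")) == name and old.get(s("family")) == "paragraph":
                office_styles.remove(old)
        new = ET.SubElement(office_styles, s("style"),
                            {s("name"): name, s("family"): "paragraph"})
        if "parent" in spec:
            new.set(s("parent-style-name"), spec["parent"])
        # Paragraph properties are written before text properties.
        if spec.get("paragraph"):
            ET.SubElement(new, s("paragraph-properties"),
                          {fo(k): v for k, v in spec["paragraph"].items()})
        if spec.get("text"):
            ET.SubElement(new, s("text-properties"),
                          {fo(k): v for k, v in spec["text"].items()})
    return office_styles
```

Keeping CONFIG in its own file would give a single, version-controllable 
source of truth for the styles a document is allowed to have.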

There may be Python-based solutions to these issues, but that's something I 
haven't explored.  I'm currently running LibreOffice as an AppImage on Ubuntu 
(I despise snap), and I'm not sure how to use LibreOffice's Python interpreter 
in that setup.
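For file-level edits like the ones above, LibreOffice's own interpreter may 
not be needed at all: an .odt file is a ZIP package (content.xml, styles.xml, 
META-INF/manifest.xml and so on), so ordinary system Python can read and 
rewrite it with the standard zipfile module.  A minimal sketch; read_part and 
replace_part are hypothetical helpers.  The one packaging rule it honours is 
that the mimetype entry should come first and be stored uncompressed.

```python
import zipfile


def read_part(odt, name="content.xml"):
    """Return the raw bytes of one part of an .odt package."""
    with zipfile.ZipFile(odt) as zf:
        return zf.read(name)


def replace_part(src, dst, new_data, name="content.xml"):
    """Copy package src to dst, substituting new_data for one part."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # Stable sort: "mimetype" first, everything else in original order.
        for info in sorted(zin.infolist(), key=lambda i: i.filename != "mimetype"):
            data = new_data if info.filename == name else zin.read(info.filename)
            # Fresh ZipInfo so the source archive's metadata is not mutated.
            out = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            out.external_attr = info.external_attr
            # ODF expects "mimetype" uncompressed; deflate the rest.
            kind = (zipfile.ZIP_STORED if info.filename == "mimetype"
                    else zipfile.ZIP_DEFLATED)
            zout.writestr(out, data, compress_type=kind)
```

Combined with ElementTree, a round trip would be: parse read_part("doc.odt"), 
transform the tree, then pass ET.tostring(root, encoding="UTF-8", 
xml_declaration=True) to replace_part, writing to a new file.  ElementTree may 
rename namespace prefixes and drop declarations it considers unused, which can 
matter where attribute values contain prefixed names, so it is safest to work 
on copies (lxml preserves namespace declarations more faithfully).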

Feedback on any of this would be appreciated!

Cheers,
Jono


Example fragment from content.xml:
====================================================
<text:p text:style-name="P119">
Gzxt Hbnse
<text:span text:style-name="T52">s</text:span>
</text:p>
<text:p text:style-name="P2"/>
<text:p text:style-name="P10">
_________________________________
<text:span text:style-name="T75">title</text:span>
</text:p>
<text:p text:style-name="P10"/>
<text:p text:style-name="P10"/>
<text:p text:style-name="P103"/>
<text:p text:style-name="P44">
Jklkh jghj pljkweing with vbv hbnses.
<text:s/>
The
<text:span text:style-name="T131">assa </text:span>
dfd jghj hjhj.
</text:p>
<text:p text:style-name="P9"/>
<text:p text:style-name="P101">
<text:soft-page-break/>
====================================================
