My view is that we should “just do it”. If we wait until DocFormats is “stable”, it’ll never get done: it’s going to take me somewhere between a few weeks and a few months (depending on how my other commitments go) to fully solve the separation of change detection and change application, and I have lots of work to do on the editor as well (specifically, getting it into a state where it can be used in a web browser or embedded webview, outside the context of UX Write).
Similarly, design-by-committee style discussions on how to handle feature mapping etc. are only going to delay things even further, and will be uninformed by implementation experience. If we start on the code now, we’ll at least have some useful progress (even if it needs to be replaced or significantly refactored later), and we’ll learn things from the experience that will greatly help us in decision making.

I suggest we start by following the current design of the Word filter. I want to document this design ASAP, but am trying to juggle this with other work. In particular, I think there are some key conceptual issues about the library that I haven’t adequately explained. Some relevant questions would give me good starting points for explanation/documentation.

What I’ll do as soon as I get the chance (hopefully next week) is lay out a skeleton of the code in the repository, and put TODO comments everywhere, with explanations and references to the corresponding code in the Word filter. This initial skeleton will give us the most basic form of conversion (text only, with no formatting or special objects), and will also illustrate how non-supported features are handled (no support for tables yet? no problem - the update code just doesn’t touch them). I’ll try to document both the Word filter and the general design approach incrementally as I go along, and will do that on the mailing list in the hope of prompting questions and discussions about improvements or alternative designs.

What I’d ultimately like to do - and this is a big thing - is to design a programming language aimed specifically at implementing bidirectional transformations for tree-structured data, combined with built-in parsing capability that handles both XML and other arbitrary formal grammars.
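
The two ideas above (update code that leaves unsupported features untouched, and bidirectional transformation of tree-structured data) can be illustrated with a toy sketch. This is Python, not the library's own code, and everything in it is an assumption made up for illustration: the Node class, the get/put function names and the "p"/"tbl" tags are hypothetical and are not the DocFormats API or the Word filter's actual design.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A node in a toy concrete document tree (hypothetical, not DocFormats)."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def get(concrete: Node) -> Dict[int, str]:
    """Forward direction: expose only the supported feature (paragraph text).

    Keys are child positions in the concrete tree, so that put() can find
    the matching node again.
    """
    return {i: child.text
            for i, child in enumerate(concrete.children)
            if child.tag == "p"}


def put(concrete: Node, edited: Dict[int, str]) -> Node:
    """Backward direction: write edited paragraph text back into the tree.

    Children that get() never exposed (e.g. tables) are passed through
    unchanged, so unsupported features survive the round trip.
    """
    new_children = []
    for i, child in enumerate(concrete.children):
        if child.tag == "p" and i in edited:
            new_children.append(Node("p", edited[i], child.children))
        else:
            new_children.append(child)
    return Node(concrete.tag, concrete.text, new_children)


doc = Node("body", children=[
    Node("p", "Hello"),
    Node("tbl", children=[Node("tr")]),  # unsupported: never exposed by get()
    Node("p", "World"),
])

view = get(doc)           # simplified, text-only view of the document
view[2] = "Corinthia"     # an edit made on the simplified view
updated = put(doc, view)  # the table node is carried over as-is
```

The design choice being illustrated is that put() takes the original concrete tree as an input rather than rebuilding the document from the simplified view alone; that is what lets anything the view does not model pass through unmodified.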
The parser for this language will be based on PEG and Packrat parsing, and for the transformation language I’m thinking of something along the lines of Stratego/XT (check it out if you haven’t seen it yet; it’s utterly brilliant) but with static typing, similar to Haskell. I anticipate this greatly simplifying the implementation of filters, but it’ll likely be a year or more before we have something usable, and at that time we can rewrite filters to fit within it if appropriate. Also, this is something which will require input from everyone on the project, which means lots of prototypes and discussion.

So let’s not wait for stability. That’s going to take too long, possibly postponing any new filter development indefinitely. The necessary pieces are already in place for us to begin work on the ODF filter, and I think it’ll be much more practical to deal with problems as we encounter them rather than trying to get a perfect “base” beforehand.

--
Dr. Peter M. Kelly
[email protected]

http://www.kellypmk.net/

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)

> On 12 Feb 2015, at 12:45 am, Dennis E. Hamilton <[email protected]> wrote:
>
> -- replying below to --
> From: Louis Suárez-Potts [mailto:[email protected]]
> Sent: Wednesday, February 11, 2015 06:04
> To: [email protected]; Dennis E. Hamilton
> Subject: Re: UnRTF Makes HTML
> [ ... ]
>
> Perhaps we need to return, however, to our roadmap ambition. For instance,
> what kind of plans do we have regarding ODF support? If we think it is time
> to return to roadmap discussions, let’s start a new thread on that subject
> and focus, yes?
>
> <orcmid> A start?
>
> I'm not so sure about plans, since it depends on where the developer effort
> comes from, but I can see some definition happening.
>
> 1. I don't think the code base around DocFormats and the HTML in and out is
> quite stable yet.
> Let's assume it is declared stable enough with acceptable
> interface/architectural boundaries.
>
> 2. Then we know that there needs to be an ODF access component and an ODF
> edit/create component.
>
> 3. With regard to feature support, there needs to be an agreement on how
> features not supported through Corinthia are to be dealt with. There are two
> cases - features that cannot round-trip safely through the HTML, and features
> that are not even mapped to or from the HTML. This is an iterative cycle.
>
> 4. Presumably, the feature set at the HTML level for editing of OOXML should
> be the target at any point for ODF also. So we know what the HTML case is
> and have the equivalent ODF features target those cases should be a feasible
> way of tracking with the evolution of Corinthia and DocFormats.
>
> 5. I don't know if there is any source-target capability intended. That is,
> ODF -> HTML -> OOXML and vice versa. That makes for nice testing cases
> though. It may serve the other Corinthia effort that is not being discussed,
> the profiling of document-format provisions in implemented documents.
>
> Is this enough to get the ball rolling?
>
> </orcmid>
