Re: [dev] Re: Grand Concept, splitting up the monolith, dynamic content

Yegor Jbanov Fri, 26 Sep 2008 22:23:04 -0700

Thanks, Mathias, for your time and such a detailed response. See my
comments below.

2008/9/26 Mathias Bauer <[EMAIL PROTECTED]>:
> Yegor Jbanov wrote:
>
>> Amen to that!
>>
>> Plus:
>>
>> 1. Documentation is the suckiest thing about OpenOffice.org SDKs and
>> APIs. If OOo has been modularized, then nobody ever noticed.
> Maybe because it's always easier to maintain prejudices than to
> actually check them against reality from time to time. I don't say that
> OOo is perfectly modularized, but it's also far away from being a
> monolith. As I explained in my reply to Rene IMHO people tend to think
> that just because one particular aspect of modularity is not visible
> there can't be any kind of it. And that's not true.

Maybe we use different definitions of "module". My definition is close
to the one from Wikipedia (http://en.wikipedia.org/wiki/Module). This
is also how Maven, IntelliJ IDEA and Eclipse define it (although
Eclipse calls it a "project"). You can already tell I'm a Java guy.
Anyway, the core elements are: a well-defined interface, well-defined
dependencies, interchangeability, and self-containment. I am not
disputing that the code might already be organized well enough to
look modularized. But at the minimum a module must have a name, a
separate source code tree, separate unit tests, separate
documentation, a separate download and a separate build script.
Modules that depend on other modules must explicitly declare their
dependencies.
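
As an illustration only, the minimum listed above could be captured in
a small per-module descriptor. The sketch below is in Python for
brevity; the ModuleDescriptor class, its field names and the example
module names and paths are all hypothetical and do not correspond to
anything in the OOo build system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleDescriptor:
    """Hypothetical descriptor listing the minimum a module must provide."""
    name: str
    source_dir: str
    test_dir: str
    docs_dir: str
    download: str
    build_script: str
    # Names of other modules this one needs: declared, never implied.
    dependencies: tuple = ()


def missing_parts(module):
    """Return the names of required fields that were left empty."""
    required = ("name", "source_dir", "test_dir", "docs_dir",
                "download", "build_script")
    return [part for part in required if not getattr(module, part)]


# Invented example modules, for illustration only.
odf_toolkit = ModuleDescriptor(
    name="odf-toolkit",
    source_dir="odf-toolkit/src",
    test_dir="odf-toolkit/test",
    docs_dir="odf-toolkit/docs",
    download="odf-toolkit.zip",
    build_script="odf-toolkit/build.xml",
)
docx_filter = ModuleDescriptor(
    name="docx-filter",
    source_dir="docx-filter/src",
    test_dir="",  # no separate unit tests yet
    docs_dir="docx-filter/docs",
    download="docx-filter.zip",
    build_script="docx-filter/build.xml",
    dependencies=("odf-toolkit",),
)

print(missing_parts(odf_toolkit))  # []
print(missing_parts(docx_filter))  # ['test_dir']
```

Anything that cannot fill in every field is, by this definition, well
organized code rather than a module.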

Pretty much any well-written source code in the object-oriented world
that does not have circular dependencies among classes and namespaces
(packages in Java) is implicitly "modularized", because
class/namespace references can be presented as a tree, and each
subtree can be viewed as a module. A subtree can be compiled and
unit-tested independently of the rest of the source code. This is
good, because it makes the code base more readable and easier to
maintain. As a result it is easier to achieve higher quality and a
richer feature set. For small projects this is usually enough. For
projects of such massive scale as OOo, however, it is definitely not
enough. I would say it is desperately not enough. Today office
automation faces much more diverse and complex requirements. A single
office suite does not meet these requirements. This is even true for
MS Office. That is why they supplement their suite with Sharepoint and
I am sure more is coming. OpenOffice.org is missing a server-side
component like that. And that is not the only example. The age of
cloud-computing is looming and OOo is not prepared for that (Google
Docs, Salesforce, Zoho, etc).
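
The "no circular dependencies" condition mentioned above can be
checked mechanically. Here is a minimal sketch, assuming the
class/namespace references have already been extracted into a plain
dictionary (strictly speaking the result is a directed acyclic graph
rather than a tree). The package names are made up for illustration.

```python
def find_cycle(references):
    """Return one circular reference chain as a list, or None if there is none.

    `references` maps each package name to the packages it refers to.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for target in references.get(node, ()):
            colour = state.get(target, WHITE)
            if colour == GREY:
                # target is already on the current path: we closed a loop.
                return path[path.index(target):] + [target]
            if colour == WHITE:
                cycle = visit(target)
                if cycle:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in references:
        if state.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical package graphs, invented for illustration.
layered = {"ui": ["core"], "filters": ["core"], "core": []}
tangled = {"ui": ["core"], "core": ["filters"], "filters": ["ui"]}

print(find_cycle(layered))  # None
print(find_cycle(tangled))  # ['ui', 'core', 'filters', 'ui']
```

In the layered graph, "core" (or "core" plus "filters") can be compiled
and tested on its own; in the tangled one, no subset can.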

>
>> Is there
>> a module dependency diagram anywhere? (This was a rhetorical question,
>> of course.) We need a central, comprehensive, well-organized and
>> up-to-date documentation web-site (take a look at how Google documents
>> their toolkits). The official documentation is so badly
>> organized/out-of-date/incorrect/incomplete that even Google fails to
>> find relevant information. My major source has been so far the mail
>> archive where people report their questions and sometimes get their
>> answers, yet nobody ever cares to update the documentation so that
>> others could find it easily.
>
> I agree that our documentation needs improvement (you could volunteer to
> help). But with some good will you can find a lot of interesting things
> in the Developer's Guide. It e.g. explains how the application framework
> of OOo works and you can indeed see this as a documentation of the
> modular structure of OOo.

This is where modularization could help a lot. Having documentation
for each module will make sure that we don't have any big holes. I
agree that you can find a lot of interesting things in the guide. This
does not mean, however, that we have documentation we can build on.
It has to cover every piece of functionality that is designed to be
used by extension developers, core developers, and developers of
applications importing a subset of OOo modules as external libraries.
Where something is not yet documented, the docs could at least point
into the source code, where a skilled programmer could understand the
concepts by reading the code. By the way, I find links to specs completely
useless and confusing. Most of these specs seem so far away from
reality. They look more like internal Sun documents that used to be
drafts of the functionality that was going to be implemented but was
implemented completely differently.

Let's try a little experiment. This page
(http://wiki.services.openoffice.org/wiki/Extensions_best_practices)
claims that UNO AWT has a very high priority. Now, try searching for
"UNO AWT" in Google and see what you get.

I develop extensions for OOo and have on occasion had to find things
out by trial and error, so I can probably help document some of them.

>
> There are parts of OOo that lack modularization, but even where the
> modularization is missing on package or library level there may be clear
> architecture on the code level.

Again, this helps the core developers to maintain code. But until you
can build a module as a standalone shared library or link it into your
own code base, this kind of "modularization" only accomplishes 5% of
what it could otherwise.

>
> The new chart component that we added in OOo2.3 is a good example for
> what is there and how it can be used. All three parts of this
> "application" (model, view and controller code) are in a separate
> library. And there are no dependencies of the Framework on any of these
> libraries, objects from these libraries are instantiated as UNO
> services. You can remove Chart from the installation without breaking
> anything - except sloppily written code that expects that "there always is
> a Chart". But this is not a problem of bad modularization or
> architecture, that's just a bug.

With proper modules, you wouldn't even have a chance to accidentally
introduce such a bug. The code wouldn't compile due to broken
dependencies.
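
A rough sketch of the kind of check meant here, written in Python for
brevity (in Java or C++ the compiler or linker would play this role).
The check_module function, the exception class and the module names
are hypothetical, and the list of used modules is assumed to come from
some earlier scan of the source.

```python
class UndeclaredDependencyError(Exception):
    """Raised when a module uses something it never declared."""


def check_module(name, declared, used):
    """Fail the (hypothetical) build if `used` modules are not all declared."""
    undeclared = sorted(set(used) - set(declared))
    if undeclared:
        raise UndeclaredDependencyError(
            f"{name} uses undeclared module(s): {', '.join(undeclared)}"
        )


# Invented example: some code silently assumes that Chart is always present.
try:
    check_module("writer-ui", declared=["framework"], used=["framework", "chart"])
except UndeclaredDependencyError as error:
    print(error)  # writer-ui uses undeclared module(s): chart
```

The point is that the mistake surfaces when the module is built, not
when a user removes Chart from an installation.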

>
> The same separation of application and framework BTW is true for Writer,
> Calc etc. also, thanks to the very modular and abstract design of the
> framework. But admittedly the *internal* structure of these "modules"
> lacks modularization. Currently only the dialogs of e.g. Writer are in a
> separate library. My idea is to extend that to the whole UI code
> somewhere in the future. But this is quite some work to do and we can't
> take ourselves out of the ongoing development for a year or so, so we
> have to work on modularization along the way.
>
> The biggest problem in this area still are our Drawing Layer, the
> EditEngine/Outliner and the forms layer that together totally undermine
> any attempt to implement a model/view/controller separation in Writer (I
> can't speak for Calc and Draw/Impress here). But I know that a very
> motivated developer is trying to fix that even if it costs him several
> years of his life time. ;-)

I hear you, and I understand your concern about the impact on the
development of the core functionality. I am trying to add another
dimension to the discussion of modularity: the point of view of an
external developer like myself. If only I had a chance to use these
libraries without having to build/install the whole office suite, I
could help a lot with the development. I am very interested in
improving the quality of OOo in general, but I can only help with a
small portion of it, and I only have maybe 10% of my time to dedicate
to this. Right now I need a month of full-time immersion to
understand what OOo is made of internally. Now think about hundreds
of other developers who wish to help but can't, then multiply by 10%.
You don't have to do the math to realize that that's a whole bunch
of extra enthusiastic man-hours that could be poured into OOo.

>
>> 2. OOo file filters must become a standalone project that could be
>> shared with KOffice, AbiWord and others. In general, having ability to
>> use filters outside OOo is a major advantage.
> Are you sure that you know how filters work? They don't convert from
> format A to format B, they convert from a format to an API or "core

I understand some of it. There are import filters that convert from
format A to OOo's internal DOM, which resembles ODF a bit but is not
ODF, and which usually lives in memory. Export filters do the reverse:
they convert the internal DOM to format B (where B may or may not be
equal to A).

> model". As long the applications don't share the core model and the API
> they never will be able to share the filter code.

Well, this is the law, isn't it? How do you expect me to use an
external library and not adapt to its core model and API? :) Instead,
I am eager to use your APIs and models, just give me a chance. And if
I find a bug I'll report it, or better, fix it and send you a patch.
What I really want is to take a file in any format and convert it
(or import it) into a single format I can rely on, and then apply my
own business logic to it. If that format is some core OOo object
model, I am ok with that. What I want after that is to take this model
and save it (export) to some format of choice. Or, if I am generating
reports and want to support multiple formats, I would create that
model in memory from some data and then export it using the OOo
filters module. All I want that module to do is bundle a set of
filters for the formats I want to support, provide type detection
capabilities and provide access to the DOM. I do not need any UI
around it.
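
The workflow described above (detect the type, import into one
in-memory model, apply business logic, export) might look roughly like
this from the caller's side. This is a toy sketch under heavy
assumptions: Document, TextFilter, HtmlFilter and FilterRegistry are
invented names, the "model" is just a list of paragraphs, and none of
it reflects the real OOo or UNO filter API.

```python
import html
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """Stand-in for the in-memory document model (hypothetical, text only)."""
    paragraphs: list = field(default_factory=list)


class TextFilter:
    format_name = "txt"

    def detect(self, data):
        return True  # toy fallback: treat anything else as plain text

    def import_document(self, data):
        return Document(data.decode("utf-8").splitlines())

    def export_document(self, document):
        return "\n".join(document.paragraphs).encode("utf-8")


class HtmlFilter:
    format_name = "html"

    def detect(self, data):
        return data.lstrip().lower().startswith(b"<html")

    def import_document(self, data):
        found = re.findall(r"<p>(.*?)</p>", data.decode("utf-8"), re.S)
        return Document([html.unescape(text) for text in found])

    def export_document(self, document):
        body = "".join(f"<p>{html.escape(p)}</p>" for p in document.paragraphs)
        return f"<html><body>{body}</body></html>".encode("utf-8")


class FilterRegistry:
    """Bundles filters, detects the input type, and exposes the model."""

    def __init__(self, filters):
        self._filters = list(filters)  # order matters: most specific first

    def load(self, data):
        for candidate in self._filters:
            if candidate.detect(data):
                return candidate.import_document(data)
        raise ValueError("no filter recognises this input")

    def save(self, document, format_name):
        for candidate in self._filters:
            if candidate.format_name == format_name:
                return candidate.export_document(document)
        raise ValueError(f"no filter for format {format_name!r}")


registry = FilterRegistry([HtmlFilter(), TextFilter()])
document = registry.load(b"quarterly report\nsales & marketing")
# "Business logic" applied to the model, independent of any file format.
document.paragraphs = [p.upper() for p in document.paragraphs]
print(registry.save(document, "html").decode("utf-8"))
# <html><body><p>QUARTERLY REPORT</p><p>SALES &amp; MARKETING</p></body></html>
```

Nothing in this code touches a window or a dialog, which is the whole
point: the caller only ever sees filters, type detection and the model.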

>
> Of course you can share parts of the filters, e.g. as in case of the
> libwpd that converts the imported format to a somewhat idealized model.
> But you always will need some code around it that adapts this to the
> concrete model of the application you want to import to.
>
> Here's an example: our new docx import filter consists of three
> components. One is the parser/tokenizer component that scans the file
> and generates kind of events that make up an idealized and very
> low-level model. Another component, the so called "domain mapper"
> converts this into API calls using the API of the document core model.
> The API builds up a still idealized but already very concrete model. The
> implementation of this API can be seen as the third part and it can
> adapt from the still idealized API view to the very bits and bytes of
> the C++ source code. As the three parts talk to each other through
> defined and stable interfaces basically each part can be exchanged by
> another implementation. How much more modularization do you want to have?
>
> By far the most code is in the latter part of the filter (the API
> implementation) and of course this one can't be shared with other
> applications as this would require that they use the same internal
> implementation. But even the next big component, the Domain Mapper, is
> not easily shareable, as this would require that the applications shared
> the component model (OOo uses UNO) and the API based on it. But the
> inability to share these parts is not caused by missing modularization,
> I hope this has become clear from my description.
>
> So what you can share is the scanner/tokenizer, if you are willing to
> plug it into the code of your application. This is only a small part of
> the filter, but it's possible. You can try. :-)
>
> I never investigated the code of the Word Perfect import filter, but
> IIRC the libwpd also can be seen as the scanner/tokenizer part of the
> filter that can be shared between applications.
>
>> There are so many
>> use-cases for filters other than opening Word files for editing in
>> OOo. Content management systems (Alfresco), reporting software
>> (JFreeReports), document intelligence (redaction), web-office suites
>> (Zoho, GDocs), etc, all need multi-format support.
> You are not talking about filters but about converters. A converter is a
> shortcut between an import and an export filter. It's not necessary to
> share converters with other applications on code or module level, they
> are standalone-applications as they communicate with other apps through
> files, not through code.

No, as explained above, I need filters.

>
> I assume that you want to have such a standalone-application based on
> the OOo filters. I agree that this would be fine.
>
> But even these converters will need to contain the core model of the
> application their filters are based on. And this will pull in a major
> part of OOo, regardless if it's modularized or not.

That is only if that application has any filters of its own. Maybe all
it wants to do is use OOo filters. Then again, I agree that those
projects that use OOo as an external dependency have to use the OOo
model.

>
> Admittedly currently this will pull in even some unnecessary code, as
> e.g. the UI code of Writer that surely isn't needed in a converter. And
> this is one reason why I would like to separate all UI code from the
> core model code so that we could create a converter that does not

I applaud that! Thank you very much! I believe this will bring a
lot of good. This is another step towards modularization.

> contain it. But I don't see any reason to go further with modularization
> and e.g. split up the core into modules as this would be effort with a
> small advantage but several disadvantages. One is a possible performance
> penalty caused by additional interface layers and another one is a more
> basic consideration.

Interfaces between modules that are linked together are resolved by
the linker, which should avoid most performance penalties. I do not
suggest that when you build OOo as a complete office suite you should
keep modules separate in the output. Feel free to link them together
to maximize the performance. I am saying that modules should ALSO be
distributed separately for others to use. Also, let's remember that
premature optimization can actually worsen the performance,
especially when dealing with such a complex project as OOo.

>
> IMHO all code that is needed to work with the application's feature set
> is mandatory code and it must be part of even the smallest possible
> converter or any other application you want to build on OOo's
> capabilities (that are themselves based on ODF).

You are being too restrictive here. Proper separation of concerns will
easily reveal orthogonal features and allow standalone usage.

>
> Whether this mandatory code is modularized internally or not is
> completely irrelevant for the converter - it may be advantageous to make
> larger, rarely used features loadable on demand (e.g. to speed up the
> startup of OOo), but at least they must be part of the whole set. I
> never would like to see any code associated with OOo that would not be
> able to deal with ODF in its completeness.

I am not suggesting that any ODF functionality (or internal DOM
functionality) should be split into modules, but at least it should
not include any UI dependencies and should itself be a module. Here is
an example of how modules could be organized:

"Type Detection Module" uses one-or-more "Filter Modules" uses "ODF
(DOM) Toolkit"

So, here "ODF (DOM) Toolkit" has no dependencies on filters or type
detection. It can therefore be a module. I should be able to build it
(better download it) and use it alone to manipulate ODF documents.
"Filter Modules" bring support for other formats by converting them
to/from ODF. "Filter Modules" depend on "ODF (DOM) Toolkit". If I only
ever need to support one format, or can detect formats myself, I will
download a set of "Filter Modules" with "ODF (DOM) Toolkit" and use
them. Now, if I am dealing with random files (as OOo does) and I need
an advanced type detection system, I will download the "Type Detection
Module" with everything else and use it. So many choices, and I still
don't need anything else that the full OOo suite might have. I have
tons of deployment options: bundled with another GUI program, deployed
to the server, etc. Moreover, I can test these modules separately and
report/fix bugs without worrying about the rest of the OOo modules.
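
To show how the download choices above fall out of declared
dependencies, here is a small sketch that computes what has to be
fetched for a given starting module. The module names are shorthand
for the three modules in the example, and the download_set helper is
hypothetical.

```python
# Hypothetical module names mirroring the chain described above.
DEPENDS_ON = {
    "type-detection": ["filters"],
    "filters": ["odf-toolkit"],
    "odf-toolkit": [],
}


def download_set(wanted, depends_on=DEPENDS_ON):
    """Return the wanted modules plus everything they need, dependencies first.

    Assumes the dependency graph has no cycles.
    """
    ordered = []

    def add(module):
        if module in ordered:
            return
        for dependency in depends_on[module]:
            add(dependency)
        ordered.append(module)

    for module in wanted:
        add(module)
    return ordered


print(download_set(["odf-toolkit"]))     # ['odf-toolkit']
print(download_set(["filters"]))         # ['odf-toolkit', 'filters']
print(download_set(["type-detection"]))  # ['odf-toolkit', 'filters', 'type-detection']
```

In none of the three cases does anything from the rest of the suite
get pulled in.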

>
> Ciao,
> Mathias
>
> --
> Mathias Bauer (mba) - Project Lead OpenOffice.org Writer
> OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
> Please don't reply to "[EMAIL PROTECTED]".
> I use it for the OOo lists and only rarely read other mails sent to it.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

Thanks,

Yegor

