Hi Sam, *,

please forward this also to the apache-list where I'm not subscribed
(I suggest only Sam does, in order to prevent 50 people forwarding the
very same mail :-D)

On Sun, Jun 5, 2011 at 1:32 AM, Sam Ruby <ru...@apache.org> wrote:
> On Sat, Jun 4, 2011 at 7:03 PM, Christian Lohmaier
> <lohmaier+ooofut...@googlemail.com> wrote:
>> As far as I know, there is only the "intent" of Oracle to
>> donate it unter the Apache License, but no clear statement has been
>> made as to what exact sourcecode this will cover.
> The ASF has a signed software grant with a specific list of source files.
>> It's not even clear whether it will be the current codebase or some
>> older version IBM is basing their version on.
> It is the codebase on openoffice.org.  The intent is to move the full
> version history.  The mechanics of this have yet to be worked out.

As on the apache list, a link to that "list of source files" has been
provided, and there have been claims that this list is covering the
whole source, I had a deeper look myself.

1st of all: It doesn't include any history data/Mercurial database files, so
how this point is covered is not clear to me at all. But on to my
analysis of the Oracle-provided file list that was made available here:

1st observation: Some file paths are split. The lines are broken
at various lengths, and not at "word limits" like the dot before the
filename extension or the slash that delimits directories, but in the
middle of the string. See http://libreoffice.pastebin.ca/2075460 for a
patch to fix those.

2nd observation: The file is not sorted alphabetically (at least it
differs from sort's output, which is what the comm tool used later
expects), so sort it:
sort openoffice.files.txt > sorted_ooo.lst
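
A side note on the sort step: comm expects both inputs to be sorted in the collation order of the locale it runs in, so it can be safer to pin the locale for every sort and comm call. A minimal sketch, assuming standard coreutils; the helper name sort_list is made up for illustration:

```shell
# Sort a file list bytewise so that sort and comm agree on the order.
# sort_list is a hypothetical helper name, not part of any tool.
sort_list() {
    LC_ALL=C sort "$1"
}

# Example use:
#   sort_list openoffice.files.txt > sorted_ooo.lst
#   LC_ALL=C sort -c sorted_ooo.lst && echo "sorted"   # verify the order
#   LC_ALL=C comm -1 -3 repo.lst sorted_ooo.lst
```

If the locale is pinned for one list, it has to be pinned for the other list and for comm as well, otherwise comm may complain that its input is not in sorted order.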

In order to do the comparison, clone the current repo
hg clone http://hg.services.openoffice.org/DEV300/

and create a filelist, excluding the repository's data
find DEV300/ -type f -not -path 'DEV300/.hg/*' | cut -c 8- | sort > repo.lst
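
The same list can be produced without hard-coding the prefix length for cut. This is only a sketch of an alternative, not what was used for the numbers in this mail; list_repo_files is a hypothetical helper name:

```shell
# List all files below a checkout, relative to its root, skipping .hg.
# list_repo_files is a hypothetical helper name.
list_repo_files() {
    ( cd "$1" && find . -path ./.hg -prune -o -type f -print ) |
        sed 's|^\./||' | LC_ALL=C sort
}

# Example use:
#   list_repo_files DEV300 > repo.lst
```

Pruning ./.hg keeps find from descending into the repository data at all, and stripping the leading "./" gives paths in the same form as the cut variant.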

raw numbers:
wc -l repo.lst sorted_ooo.lst
 69076 repo.lst
 39616 sorted_ooo.lst

So calling this "seems to include the full repo", and doing so
twice, shows either malicious intent or no clue. Christian
Lippka really should know better, but stated this at least twice.
Close to 30000 files gone, but who cares, "source seems complete"...

Now to interesting numbers:
Files in the Oracle's list, but not in the repo-list (= files most
likely moved by refactoring the code (gbuildification of modules and
similar) = indication of when the snapshot was taken):
$ comm -1 -3 repo.lst sorted_ooo.lst | wc -l
455
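
For readers not used to comm, here is how the column switches map to the two lists. It is a toy example with made-up file names (oracle.lst stands in for sorted_ooo.lst), not real data:

```shell
# Toy data: two sorted lists that share one entry.
tmp=$(mktemp -d)
printf 'a.cxx\nb.cxx\n' > "$tmp/repo.lst"
printf 'b.cxx\nc.cxx\n' > "$tmp/oracle.lst"

# Column 1 = only in the first file, 2 = only in the second, 3 = in both.
# Each -N switch suppresses that column.
only_in_repo=$(comm -2 -3 "$tmp/repo.lst" "$tmp/oracle.lst")
only_in_oracle=$(comm -1 -3 "$tmp/repo.lst" "$tmp/oracle.lst")
in_both=$(comm -1 -2 "$tmp/repo.lst" "$tmp/oracle.lst")

echo "only in repo:   $only_in_repo"     # a.cxx
echo "only in oracle: $only_in_oracle"   # c.cxx
echo "in both:        $in_both"          # b.cxx
```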

Digging in hg's history shows that the snapshot of the sources must
have been taken before 2011-03-21, as those files were [re]moved in the
following CWSs:
276288  2011-03-21      CWS-TOOLING: integrate CWS dr78
276552  2011-03-29      CWS-TOOLING: integrate CWS ka102
276583  2011-03-29      CWS-TOOLING: integrate CWS vcl2gnumake
276711  2011-04-01      CWS-TOOLING: integrate CWS solaris11
276673  2011-04-01      CWS-TOOLING: integrate CWS calcvba
276692  2011-04-01      CWS-TOOLING: integrate CWS mav60

So one can clearly say that those changes are not part of the sources,
and hence the code is at most in the state of m103 (which of course
doesn't exclude that the codebase is older than that). The changes of at
least 27 CWSs (+3 masterfix ones) that have been integrated into OOo
code in the meantime are definitely missing.

Files in repo, but not in Oracle's list:
$ comm -2 -3 repo.lst sorted_ooo.lst  |wc -l
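
As a cross-check, the size of this set can be derived from the counts already given above. This is plain arithmetic and assumes that neither list contains duplicate lines, which was not verified here:

```shell
repo=69076        # lines in repo.lst
oracle=39616      # lines in sorted_ooo.lst
only_oracle=455   # result of comm -1 -3 above

# Assumption: no duplicate lines in either list.
common=$((oracle - only_oracle))   # entries present in both lists
only_repo=$((repo - common))       # entries only in the repo

echo "common: $common, only in repo: $only_repo"
```

Under that assumption the two lists share 39161 entries, and 29915 files are in the repo but not in Oracle's list.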

sdf files = translation files: Those are not included in either list.
The sdf files that are in the repo are for testcases/gsicheck; the translations
have been split off into a separate repository.

So those don't even account for the difference!
$ grep -c sdf$ repo.lst sorted_ooo.lst

Image files = binary files
egrep -c '(bmp|png|gif|jpe?g)$' repo.lst sorted_ooo.lst
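
Note that the pattern above is case-sensitive and does not require a dot before the extension. A slightly stricter variant could look like this; it is only a sketch, count_images is a hypothetical helper name, and the numbers in this mail were not produced with it:

```shell
# Count image files by extension: require the dot, ignore case.
# count_images is a hypothetical helper name.
count_images() {
    grep -Eic '\.(bmp|png|gif|jpe?g)$' "$1"
}

# Example use:
#   count_images repo.lst
#   count_images sorted_ooo.lst
```

With this variant a path like res/B.PNG is counted, while a name that merely ends in the letters "png" without a dot is not.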

So this is one big chunk, all toolbar icons for the different themes,
cursors, artwork for the installers, etc.

But what are the remaining 17563 files? shell-fu will give a hint:
$ comm -2 -3 repo.lst sorted_ooo.lst | egrep -v '(bmp|png|gif|jpe?g)$' |
  sed -n -e 's/.*\.\([^./]*\)$/\1/p' | sort | uniq -c | sort -rn | head
  1716 ott
  1329 xml
  1140 xlb
   813 xcu
   749 cfg
   710 csv
   588 txt
   555 h
   472 css
   459 java

OK, the user will not get any templates either, too bad, but the next
ones are interesting: no configuration schemes, no configuration data.
Let's have a closer look:
$ comm -2 -3 repo.lst sorted_ooo.lst | grep xcu$ |
  awk -F/ '{print $1}' | sort | uniq -c
    32 dictionaries
     4 extensions
   716 filter
     3 lingucomponent
     2 mysqlc
    21 odk
    16 officecfg
     1 pyuno
     3 scripting
     7 sdext
     5 sfx2
     3 testautomation

Want to load documents? Too bad, Apache won't know about the filters.
Want to save? Hah, that's a good one, Apache-OOo doesn't know about
export filters either.

Spellchecking? ha, dream on… (but that is understandable, as
dictionaries are mostly third-party stuff, so that one is excused)

Leaving aside the other binary files (various OOo documents, also some
MS-Office documents, the palettes, icon/wav files for the gallery), the
interesting ones include:

Tons of xml
$ comm -2 -3 repo.lst sorted_ooo.lst | grep xml$ | awk -F/ '{print $1}' |
  sort | uniq -c | sort -nr | head
   235 sw
   201 i18npool
   154 sc
   129 sd
   112 testautomation
    64 dictionaries
    51 toolkit
    45 desktop
    34 scripting
    29 svx
I didn't look into that more closely, but:
$ comm -2 -3 repo.lst sorted_ooo.lst  | grep xml$ | grep toolbar |wc -l

So want to use toolbar buttons? Too bad, the corresponding definitions
are not included, you won't get any/most toolbars. Good luck starting
from scratch defining your own.

But leaving aside that boring "non-code" stuff:
134 patches are missing (for the external modules). OK, that's arguable,
as the external modules won't be part of Apache-OOo in the long run.
You want to actually build this thing? Well, too bad: the build.lst
files that define the inter-module & directory dependencies, and the
d.lst files that list the module's files to be exported for use by
other modules, are not included either:

$ grep -c d.lst repo.lst sorted_ooo.lst
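
The unescaped pattern d.lst matches both build.lst and d.lst, which covers the two kinds of files discussed here in one count. To count them separately and match the file name exactly, something like the following could be used; it is only a sketch, and count_named is a hypothetical helper name:

```shell
# Count entries in a file list whose last path component equals a name.
# count_named is a hypothetical helper; $1 = list file, $2 = file name.
count_named() {
    awk -F/ -v name="$2" '$NF == name { n++ } END { print n + 0 }' "$1"
}

# Example use:
#   count_named repo.lst build.lst
#   count_named repo.lst d.lst
#   count_named sorted_ooo.lst d.lst
```

Comparing the last path component avoids regular expressions altogether, so a dot in the name cannot match other characters.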

similar: 302 *.mk files that are only in the repo, amongst them the
solenv/inc/_tg_*.mk ones, the templates that define the very basic
target rules used throughout the build (and that are expanded by
mkunroll to produce the makefiles that are then included by the actual

So with this snapshot, Apache-OOo is far from being able to deliver
something that is even close to OOo as it is now. It is missing all
translations, all artwork, build-dependency definitions that are
absolutely needed for doing a build, no toolbar-definitions, no
Apart from the systematic omission of images, random source-files are
missing as well, probably because they don't carry the default copyright
header, for example binfilter/inc/bf_svx/svxslots.hxx

So calling this list "complete" or stating something along the lines
of "looks like a straight dump from hg" is a joke.

So Oracle definitely needs to revise that list, and include at least
the translations, the artwork, the configuration data/xml-files, the
randomly omitted files, etc. And while they're at it, they could base
their list on the current m106 milestone.

