I suppose a course estimate is also a coarse one [;<).
If we need to go beyond analyzing what is revealed in reports to the project,
there remains the prospect of instrumentation.
I am not certain that we have the resources to do that, so this is a
thought experiment.
INSTRUMENTATION
There is no instrumentation of Apache OpenOffice at present. There is an
existing path to doing so: the check for updates. It already provides a crude
measurement, if its requests are captured. With adjustment of the software,
more useful data could be obtained. Producing and capturing the information
involves development work. Any collection of such data must keep its sources
anonymous, while still recognizing data from the same installation.
Instrumentation can require considerable work and large databases for the
captured information. There might not be sufficient capacity to undertake any
degree of instrumentation in the face of higher-priority needs.
The following note is a bit over-engineered. It is simpler if we do not need
to differentiate data sources at all, but that might not get us what we need.
How important is knowing what we could find out about usage patterns this way?
1. Privacy of Data Collection
It is possible to instrument the software to collect certain data, such as
the numbers and formats of files opened and saved-as since the previous
collection of data from a source. This requires additions to the software to
accumulate such information, and to the servers that receive the requests and
capture the information.
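As a rough illustration of the accumulation described above, here is a
minimal Python sketch. The class name UsageCounters, its method names, and
the layout of the report are all hypothetical; nothing like this exists in
Apache OpenOffice today, and a real implementation would live in the
application code rather than in Python.

```python
from collections import Counter


class UsageCounters:
    """Hypothetical per-installation tally of file formats opened and saved-as."""

    def __init__(self):
        self.opened = Counter()
        self.saved_as = Counter()

    def record_open(self, fmt):
        # Normalize case so "ODT" and "odt" are counted together.
        self.opened[fmt.lower()] += 1

    def record_save_as(self, fmt):
        self.saved_as[fmt.lower()] += 1

    def collect(self):
        """Return the counts since the previous collection, then reset them."""
        report = {"opened": dict(self.opened), "saved_as": dict(self.saved_as)}
        self.opened.clear()
        self.saved_as.clear()
        return report


counters = UsageCounters()
counters.record_open("odt")
counters.record_open("docx")
counters.record_open("ODT")
counters.record_save_as("docx")
first = counters.collect()   # counts accumulated so far
second = counters.collect()  # empty, because nothing happened in between
```

Resetting on each collection is what makes the report cover only the
interval "since the previous collection" rather than a lifetime total.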
Some data might need to be longitudinal, with data captured at different
times from the same source recognized and combined in some way. This allows
quite different patterns of usage to be distinguished and not lumped together
in a single mass, if that becomes important.
This means that the source of the data must be anonymized in some manner
that still allows data from the same copy of Apache OpenOffice to be
recognized, but without recording anything that allows the captured data to
be traced back to the originating source.
All of this involves substantial, careful development. The means of
preventing identification of sources must be carefully managed. It must also
be possible to protect the data collection procedure from exploits and
denial-of-service attacks.
2. Update Checking as a Data Source
When installed copies of Apache OpenOffice conduct an automatic or manual
check for updates, that is a source of information. Unqualified, it is an
indication that an installed copy of the software is being used in some
manner.
Update checks are only useful, however, if pings estimated to be from the
same installation are distinguishable. The crudest measure is simply the date
and time of the latest ping from the same (estimated) source, along with the
version of Apache OpenOffice being used. This could be captured without any
modification of the existing software package.
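A minimal Python sketch of that crudest measure follows. It assumes each
ping has already been reduced to some opaque key for its (estimated) source.
The names latest_ping and record_ping, and the sample keys and dates, are
made up for illustration.

```python
from datetime import datetime, timezone

# Hypothetical in-memory store: opaque source key -> (latest ping time, AOO version).
latest_ping = {}


def record_ping(source_key, version, when):
    """Keep only the most recent ping seen for each (estimated) source."""
    previous = latest_ping.get(source_key)
    if previous is None or when > previous[0]:
        latest_ping[source_key] = (when, version)


record_ping("source-a", "4.1.1", datetime(2015, 11, 1, 9, 0, tzinfo=timezone.utc))
record_ping("source-a", "4.1.2", datetime(2015, 11, 20, 9, 0, tzinfo=timezone.utc))
record_ping("source-b", "4.1.1", datetime(2015, 11, 5, 12, 0, tzinfo=timezone.utc))

# Tally how many distinct sources were last seen on each version.
versions = {}
for _, version in latest_ping.values():
    versions[version] = versions.get(version, 0) + 1
```

With one record per source, the number of keys estimates active
installations, and the per-version tally shows how far updates have spread.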
To distinguish sources, it may be necessary to keep a database with up to
50-100 million records that identifies information from each source without
revealing that source. The same principle is needed if additional data is
provided as part of check-for-update requests from the software.
Information in the currently-implemented HTTP request can be used to
estimate when requests are from the same source. To preserve anonymity of the
source, that information can be transformed into a cryptographic hash that
cannot be used to determine the original source but can be used to determine a
match with a previously-captured ping. This is a coarse arrangement.
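To make the hashing idea concrete, here is a small Python sketch. The note
above says only "cryptographic hash". The use of a keyed hash (HMAC-SHA-256)
with a server-side secret is my assumption, made because a plain hash over a
small input space such as IPv4 addresses can be reversed by enumeration.
Which request details to use is also an assumption: the remote address and
User-Agent here are placeholders, and source_key and SERVER_SECRET are
hypothetical names.

```python
import hashlib
import hmac

# Assumption: a secret held only by the collection server.
SERVER_SECRET = b"replace-with-a-long-random-secret"


def source_key(remote_addr, user_agent, secret=SERVER_SECRET):
    """Derive an opaque key from request details without storing them."""
    material = "\n".join([remote_addr, user_agent]).encode("utf-8")
    return hmac.new(secret, material, hashlib.sha256).hexdigest()


# Two pings with the same details map to the same key; a different
# address gives an unrelated key.
a = source_key("192.0.2.10", "ExampleAgent/4.1.1")
b = source_key("192.0.2.10", "ExampleAgent/4.1.1")
c = source_key("192.0.2.11", "ExampleAgent/4.1.1")
```

At 32 bytes per SHA-256 digest, 100 million keys come to about 3.2 GB
before timestamps, versions, and index overhead are added.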
3. Specific Instrumentation
If future releases were modified to collect and report usage data (with
appropriate opt-in as part of the configuration set-up), that data could be
attached to checks-for-updates when allowed. To accumulate patterns over
time, accumulation of data is best tied to user profiles. A
statistically-unique, cryptographically-random identifier, generated as part
of each user profile when it is initialized, can be used to recognize
instrumentation data from the same profile. When the data is collected, the
identifier is used in making the cryptographic hash in (2) and then
discarded.
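The profile identifier could be sketched as follows in Python. The 128-bit
size, the function names, and the keyed hash with a server-side secret are
assumptions for illustration, not a description of anything implemented.

```python
import hashlib
import hmac
import secrets


def new_profile_id():
    """Generated once when a user profile is initialized (128 random bits)."""
    return secrets.token_hex(16)


def hashed_source(profile_id, secret):
    """Server side: keep only the hash; the raw identifier is not stored."""
    return hmac.new(secret, profile_id.encode("ascii"), hashlib.sha256).hexdigest()


secret = b"replace-with-a-long-random-secret"  # assumption: known only to the server
profile_a = new_profile_id()
profile_b = new_profile_id()

# Repeated reports from one profile hash to the same key, so they can be
# combined over time; a different profile gives an unrelated key.
key_a1 = hashed_source(profile_a, secret)
key_a2 = hashed_source(profile_a, secret)
key_b = hashed_source(profile_b, secret)
```

Because the identifier is random rather than derived from the machine or
user, it carries no information about the source even before hashing.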
> -----Original Message-----
> From: Dennis E. Hamilton [mailto:[email protected]]
> Sent: Sunday, November 22, 2015 11:50
> To: [email protected]
> Subject: [QUESTIONS] How Is Apache OpenOffice Used (was Apache
> OpenOffice ODF in the Marketplace ...)
>
> I have changed the topic because Marketplace is misleading -- the AOO
> Project is not so much a participant in a market system. Yet it is
> useful to determine who our public community is and what the adopters of
> Apache OpenOffice are doing with it.
>
> We have the statistics below as a course estimate of the size of the
> active AOO community, our public.
>
> The original question was, how important is ODF to those adopters?
>
> That's an answer that is more likely to be found by asking "What are the
> adopters doing with their copies of Apache OpenOffice? In particular,
> what document formats are they using and to what relative degree?"
>
> We have no way to know that directly at the moment.
>
> There is one immediately-available source.
>
> REPORTS TO US
>
> What we know the most about what folks are doing with Apache OpenOffice
> comes from what the patterns of complaints are. These can arise in
> questions to lists dev@ and users@, in filing of Bugzilla reports (or
> commenting on existing ones), and in comments on the Community Forums.
>
> We can use those to determine more narrowly which users on which
> platforms are reporting and what they are reporting about. This
> provides evidence of what is found to be important enough to make the
> effort to report. That is important all by itself. It is a clue to
> what others may be experiencing and do not choose or know to report.
>
> A subset of these reports may hinge on particular document formats and
> interchange/interoperability experiences with document formats. My
> unqualified impression is that interchange via Microsoft Office formats
> will dominate, just as Microsoft Windows users are predominant among the
> population of AOO adopters. It will be interesting to identify the ODF-
> related matters that also come up and what the balance is.
>
> It is not easy to analyze this source mechanically but it is possible to
> do some manual "analytics" of various kinds.
>
> Is this worth doing?
>
> Of what value would digging this information out at an initial level of
> detail be?
>
> We could probably look at a couple of months' data for clues and then
> examine a longer period if it seems profitable.
>
>
> > -----Original Message-----
> > From: Dennis E. Hamilton [mailto:[email protected]]
> > Sent: Sunday, November 8, 2015 22:19
> > To: [email protected]
> > Subject: [REPORT] Apache OpenOffice ODF in the Marketplace - AOO 4.1.1
> > downloads
> >
> > Here are updates of the downloads for Apache OpenOffice 4.1.1, now that
> > 4.1.2 is being distributed by the mirror system.
> >
> > From Sourceforge,
> >
> > <http://sourceforge.net/projects/openofficeorg.mirror/files/4.1.1/stats/os?dates=2014-08-01+to+2015-11-08>
> >
> > Just shy of 50,000,000 downloads. This number will be exceeded as older
> > versions will still continue downloading, although at an ever-decreasing
> > rate.
> >
> > 87.7% for Windows
> > 9.0% for Macintosh (0.1% small drop from end of August)
> > 3.3% for everything else, including Linux
> >
> > For the different countries in the same period (53.6 million for all
> > distributions, not just 4.1.1), the breakdown can be found here:
> >
> > <http://sourceforge.net/projects/openofficeorg.mirror/files/stats/map?dates=2014-08-01+to+2015-11-09>.
> >
> > It is cool that there were 3 to Antarctica: 2 for Windows, 1 for
> > Macintosh.
> >
> > - Dennis
> >
> >
> >
> > > -----Original Message-----
> > > From: Dennis E. Hamilton [mailto:[email protected]]
> > > Sent: Wednesday, September 23, 2015 18:38
> > > To: [email protected]
> > > Subject: RE: [DISCUSS] Apache OpenOffice ODF in the Marketplace -
> > > Downloading
> > >
> [ ... ]
> > > What is more difficult to determine is what folks are actually doing
> > > with Apache OpenOffice. There may be ways to learn more.
> > >
> > > - Dennis
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Dennis E. Hamilton [mailto:[email protected]]
> > > Sent: Friday, September 4, 2015 20:01
> > > To: [email protected]
> > > Subject: [DISCUSS] Apache OpenOffice ODF in the Marketplace
> > >
> > > I had not encountered the topic of "ODF in the market place" with regard
> > > to status of Apache OpenOffice. Perhaps I have not been paying
> > > attention.
> > >
> > > I am curious how we might characterize how support for ODF matters to
> > > Apache OpenOffice users and various institutions that value support for
> > > ODF in their reliance on Apache OpenOffice and related software.
> > >
> > > How can we determine what the influence of ODF is with respect to Apache
> > > OpenOffice?
> > >
> > > It strikes me there are two parts to this question.
> > >
> > > 1. Who are the users of Apache OpenOffice?
> > >
> > > 2. What are the ways ODF is (comparatively) significant to those
> > users?
> > >
> > > [ ... ]
> > >
> > > WHO ARE THE USERS?
> > >
> > > Although there are now over 150 million downloads of Apache OpenOffice,
> > > that does not tell us how many individual users are involved.
> > >
> > > Perhaps the download counts just for AOO 4.1.1 would be a representative
> > > sample of a particularly-active segment of the user base, even though
> > > that would be underestimated a couple of ways. But that, and the
> > > average weekly rate would be useful as "at least" figures.
> > >
> > > The mix of platforms for those downloads is also important, reflecting
> > > the context in which those installed downloads are used by new users and
> > > those who are keeping their configurations current.
> > >
> > >
> > > [ ... ]
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]