On Sat, Jul 08, 2017 at 11:04AM, Nikita Ivanov wrote:
> Cos,
> Based on my experience, having it off by default negates the entire
> purpose... We need a statistically meaningful data set to make any
> inferences from it. Moreover, if we are going to ask folks to turn it on, it
> will significantly skew the resulting data set anyway and will not show the
> full picture. I think "on" by default is the better option if we are to
> collect usage stats to begin with.

Yes, sure. But having this "on" by default is likely to expose us to another
shit-storm down the road. An interesting dilemma to have indeed. In my
experience, whenever I install something like a browser or an operating
system, it asks if I want to make that piece of software better
by sending back some anonymized stats. Basically, I am given a way to
explicitly opt out if I wish.

Turning the feature "on" by default is like saying: "we'll be collecting
some stats, but if you don't want to you can go here and there and disable the
collection. Oh, and by the way - you need to go and figure out the exact steps
to disable it."
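
To make the two defaults concrete, here is a minimal Java sketch of how a
single system property could gate the collection. The property name
`ignite.usage.stats.enabled` and the class `UsageStatsFlag` are made up for
illustration; nothing in this thread fixes the actual name. The only
difference between opt-in and opt-out is the value passed as `shippedDefault`.

```java
import java.util.Properties;

// Hypothetical helper: neither this class nor the property name exists in Ignite.
final class UsageStatsFlag {
    // Assumed property name, for illustration only.
    static final String PROP = "ignite.usage.stats.enabled";

    private UsageStatsFlag() {}

    // Returns the user's explicit choice if the property is set,
    // otherwise falls back to the default the project ships with.
    static boolean isEnabled(Properties props, boolean shippedDefault) {
        String raw = props.getProperty(PROP);
        if (raw == null) {
            return shippedDefault;
        }
        return Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        // Opt-in (off by default): nothing is sent unless the user sets the property.
        System.out.println("opt-in:  " + isEnabled(System.getProperties(), false));
        // Opt-out (on by default): stats are sent unless the user sets it to false.
        System.out.println("opt-out: " + isEnabled(System.getProperties(), true));
    }
}
```

With a flag like this, a user would state their choice with something like
`-Dignite.usage.stats.enabled=false` on the JVM command line (again, an assumed
name).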

> Also, I want to re-iterate it again to avoid misunderstanding: there is no
> proposal nor will there be a technical way to attribute collected data back
> to a certain company. That's not what this is all about. We should only be
> interested in aggregated stats (community size, geo information, language
> information, components usage).

Yes, I think it is clear, but it never hurts to re-iterate.

Cos

> Thoughts?
> 
> --
> Nikita Ivanov
> Founder & CTO
> GridGain Systems
> 
> On Fri, Jul 7, 2017 at 8:17 PM, Konstantin Boudnik <c...@apache.org> wrote:
> 
> > Actually, that should be OFF by default. It sounds like this would reduce
> > the amount of data collected, but it would address the concerns of
> > companies like Roman's. I know for sure that a few of my clients would sue
> > my ass out of existence if I gave them a platform collecting their
> > data-center info.
> >
> > Let's have it, set it off by default, and document an easy way to turn it
> > on.
> > Then start making rounds asking our user base to share _some_ of the stats
> > with the community, so we can track the growth of the install base, etc.
> >
> > Cos
> >
> > On Thu, Jul 06, 2017 at 08:20AM, Nikita Ivanov wrote:
> > > The idea so far is to have a single system property in configuration that
> > > turns this off completely. I envision that this will be prominently
> > > featured on the Ignite website so that everyone who would like to disable
> > > it can do so in seconds.
> > >
> > > Thoughts?
> > >
> > > --
> > > Nikita Ivanov
> > > Founder & CTO
> > > GridGain Systems
> > >
> > > On Wed, Jul 5, 2017 at 9:27 PM, Roman Shtykh <rsht...@yahoo.com> wrote:
> > >
> > > > Nikita,
> > > >
> > > > Sending and storing (somewhere the company cannot securely handle) any
> > > > information (OS version, IP addresses, etc.) that can be used to
> > > > compromise the services would be unacceptable.
> > > > Turning it off might be ok (possibly through the cluster settings, not
> > > > via a globally accessible site), but the risk that some
> > > > information can leak outside (for any reason, starting from a human
> > > > mistake) is scary.
> > > >
> > > > -- Roman
> > > >
> > > >
> > > >
> > > >
> > > > On Thursday, July 6, 2017 12:38 PM, Nikita Ivanov <niva...@gridgain.com> wrote:
> > > >
> > > >
> > > > Roman,
> > > > Thanks for the feedback. What are those questions specifically? Are IP
> > > > addresses and OS what is causing them?
> > > >
> > > > Thanks!
> > > >
> > > > --
> > > > Nikita Ivanov
> > > > Founder & CTO
> > > > GridGain Systems
> > > >
> > > > On Wed, Jul 5, 2017 at 6:15 PM, Roman Shtykh <rsht...@yahoo.com.invalid> wrote:
> > > >
> > > > Nikita,
> > > >
> > > > While this will help improve Ignite, it will prevent its adoption by many
> > > > projects -- sending and retaining IP addresses, OS versions, etc. raises
> > > > tons of questions when considering using Ignite. Even if one can opt
> > > > out.
> > > > -- Roman
> > > >
> > > >
> > > > On Thursday, July 6, 2017 5:38 AM, Nikita Ivanov <nivano...@gmail.com> wrote:
> > > >
> > > >
> > > > Igniters,
> > > > I would like to kick off the discussion on the idea of collecting Ignite
> > > > usage statistics. The basic idea is to gather general, anonymous Ignite
> > > > usage information so that we can better calibrate community efforts in
> > > > developing new features, improving existing ones, delivering better
> > > > documentation - and in every other way make our project a better software
> > > > solution.
> > > >
> > > > Although such instrumentation is standard practice in commercially
> > > > developed software, for an ASF project this could be a sensitive issue.
> > > > Therefore I would like to initiate a full community discussion on how best
> > > > to implement such a practice for the benefit of the project while ensuring
> > > > the privacy protection of Ignite users.
> > > >
> > > > To ignite (pun intended) the discussion I'll outline below some of the
> > > > basic thoughts that I have on this subject. They are here only to give an
> > > > idea of what such instrumentation may potentially look like so that we can
> > > > discuss the merits of this idea in a tangible context.
> > > >
> > > > Overview
> > > > -------------
> > > > Upon start and every hour thereafter each Ignite node will collect, encrypt
> > > > and send usage statistics over HTTPS to the ASF-hosted server. That server
> > > > will accept such HTTPS packets, decrypt them and store them in a
> > > > time-series DB. A web interface will be provided to view the usage
> > > > information.
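
The "upon start and every hour thereafter" schedule described above can be
sketched with the JDK's `ScheduledExecutorService`. This is only an
illustration under stated assumptions: the class name `UsageStatsReporter` is
hypothetical, and the actual collection, encryption and HTTPS send are left to
a caller-supplied `Runnable` rather than implemented here.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical reporter: the class name and the `send` callback are illustrative.
final class UsageStatsReporter implements AutoCloseable {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "usage-stats-reporter");
            t.setDaemon(true); // never keep the node's JVM alive just for stats
            return t;
        });

    // Runs `send` once immediately, then again at a fixed rate of one run per `period`.
    UsageStatsReporter(Runnable send, long period, TimeUnit unit) {
        timer.scheduleAtFixedRate(() -> {
            try {
                send.run();
            } catch (RuntimeException e) {
                // Swallow failures: an uncaught exception would cancel all
                // later runs, and a stats error must not affect the node.
            }
        }, 0, period, unit);
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

For the hourly schedule in the proposal, a node would create it as
`new UsageStatsReporter(sender, 1, TimeUnit.HOURS)` and close it on shutdown.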
> > > >
> > > > Opt-In or Opt-out
> > > > -------------------------
> > > > Opt-out. The Ignite website will offer simple instructions (system
> > > > property) on how to disable this instrumentation.
> > > >
> > > > Code, Infra, Access
> > > > ---------------------------
> > > > Ignite instrumentation will be part of the Ignite code base. The collection
> > > > server will be a separate module in the Ignite code base (released
> > > > separately from Ignite). The collection server will be hosted by ASF Infra.
> > > >
> > > > Usage statistics will be publicly accessible by anyone in the community.
> > > >
> > > > Private, Personal Data
> > > > ------------------------------
> > > > No private or personal data will ever be transferred. No emails, usernames,
> > > > company names, grid names, etc.
> > > >
> > > > Data Retention
> > > > --------------------
> > > > All data will be retained for 1 year and deleted permanently thereafter.
> > > >
> > > > Usage Data
> > > > ----------------
> > > > The following data will be collected in each packet sent to the collection
> > > > server:
> > > > - GRID_SIZE (to align our testing environment with the most common
> > > > cluster sizes)
> > > > - IP_ADDR (for general geo-tracking as well as to know what documentation
> > > > language should be a priority)
> > > > - SES_ID (to track continuous uptime vs. re-starts)
> > > > - USERNAME_TYPE (privileged username vs. standard, to track production vs.
> > > > dev/testing usage; note - this is not an actual username)
> > > > - OS_NAME
> > > > - OS_VER
> > > > - OS_ARCH
> > > > - JAVA_VER
> > > > - JAVA_VENDOR
> > > > - COMP_SQL (whether or not this feature was used)
> > > > - COMP_COMPUTE (whether or not this feature was used)
> > > > - COMP_DATAGRID (whether or not this feature was used)
> > > > - COMP_STREAMING (whether or not this feature was used)
> > > > - COMP_IGFS (whether or not this feature was used)
> > > > - COMP_SERVICE (whether or not this feature was used)
> > > > - COMP_PERSISTENCE (whether or not this feature was used)
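
The field list above could be assembled into a packet along these lines. This
is a minimal sketch: the class name `UsagePacket`, the `build` signature and
the use of a random UUID for SES_ID are assumptions made for illustration, not
part of the proposal. The OS and Java values are read from the standard JVM
system properties (`os.name`, `os.version`, `os.arch`, `java.version`,
`java.vendor`). IP_ADDR and USERNAME_TYPE are left out to keep the sketch
short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical builder for one usage packet; not an existing Ignite class.
final class UsagePacket {
    // Assumption: SES_ID is a random ID generated once per JVM start, so a
    // restart shows up as a new session without identifying the user.
    private static final String SESSION_ID = UUID.randomUUID().toString();

    // Component suffixes taken from the proposal's COMP_* fields.
    private static final String[] COMPONENTS = {
        "SQL", "COMPUTE", "DATAGRID", "STREAMING", "IGFS", "SERVICE", "PERSISTENCE"
    };

    private UsagePacket() {}

    // gridSize and usedComponents are supplied by the caller in this sketch;
    // usedComponents holds the suffixes ("SQL", "IGFS", ...) of used features.
    static Map<String, String> build(int gridSize, Set<String> usedComponents) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("GRID_SIZE", Integer.toString(gridSize));
        p.put("SES_ID", SESSION_ID);
        p.put("OS_NAME", System.getProperty("os.name"));
        p.put("OS_VER", System.getProperty("os.version"));
        p.put("OS_ARCH", System.getProperty("os.arch"));
        p.put("JAVA_VER", System.getProperty("java.version"));
        p.put("JAVA_VENDOR", System.getProperty("java.vendor"));
        for (String c : COMPONENTS) {
            // Each COMP_* field is a plain used / not used flag.
            p.put("COMP_" + c, Boolean.toString(usedComponents.contains(c)));
        }
        return p;
    }
}
```

Note that nothing in such a packet names a user, company or grid; the only
per-node identifier is the random session ID.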
> > > >
> > > > Please let's discuss this idea. Everyone's comments and suggestions are
> > > > *extremely* welcome.
> > > >
> > > > Thanks,
> > > > Nikita Ivanov.
> > > >
> >
