Expanding on what I think Dan is referring to re: goals: addressing this issue would allow EEVS to access the data needed to generate metric breakdowns by method/target site (mobile, desktop, apps).
On Aug 13, 2014, at 1:40 PM, Dan Andreescu <[email protected]> wrote:
> Kevin, for what it's worth I don't think that bug that Sean is asking for is
> that challenging. The relevant part we'd have to change is really just a few
> lines [1]. I respect your decision of course, but I just wanted to point out
> that this issue does drive towards some of our goals, as we talked a bit
> about getting EventLogging data to be usable by Wikimetrics, and this is the
> first step.
>
> [1] - https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FEventLogging/4d917e1594e6f09784ab0e0bffccc144f87a11b3/server%2Feventlogging%2Fjrm.py#L167
>
> On Wed, Aug 13, 2014 at 4:19 PM, Aaron Halfaker <[email protected]> wrote:
> OK. Sounds reasonable. Sorry to seem as though I am pushing on you & the
> devs. In fact, specifying that you won't have the bandwidth to even consider
> the bug until next quarter gives me the power to push on others. >:)
>
> Thanks!
> -Aaron
>
> On Wed, Aug 13, 2014 at 8:56 PM, Kevin Leduc <[email protected]> wrote:
> Hi Aaron,
>
> I was not planning on prioritizing any EventLogging work for the rest of this
> quarter. The analytics dev team has a goal to get an EEVS dashboard running
> and I want to keep them focused otherwise we will not reach this goal.
>
> I'm tempted to ask what springle and YuviPanda can accomplish without the
> help of the analytics devs, but even that will imply discussions and
> distractions from our goals.
>
> In September I am planning on looking at what goals we can set for the next
> quarter and look at what we want to accomplish with EventLogging. I was
> going to prioritize it at that point.
>
> On Wed, Aug 13, 2014 at 10:28 AM, Aaron Halfaker <[email protected]> wrote:
> Excellent. Kevin, can you work to get that bug[1] prioritized and let us
> know? I can start working with R&D on a proposal to bring to legal.
>
> 1.
> https://bugzilla.wikimedia.org/show_bug.cgi?id=67450
>
> > It stands to reason that you would be interested in the capsule too as it
> > holds the timestamp and wiki project the event applies to, but I imagine we
> > can make fields public selectively.
>
> Fair enough. I think we can drop that one column from the capsule and be
> quite happy with the rest. No need to purge EventLogging.
>
> -Aaron
>
> On Wed, Aug 13, 2014 at 6:08 PM, Nuria Ruiz <[email protected]> wrote:
> > Re. (2), I didn't say anything about that being related to public/private.
> > This is a request from springle -- that if we are going to start pushing
> > Events to LabsDB, he'd like us to do so more efficiently. That bug is
> > about efficiently batching inserts.
> ah, my mistake. Kevin can do prioritization as needed.
>
> > If you are concerned about UserAgents as the sanitization page you linked to
> > suggests, then we should talk about the EventLogging capsule, not the
> > event.
> If you want to be so precise, sure, that is correct. Note that currently
> there is no distinction in storage as to the event and the capsule, they are
> stored together in the same record. Capsule data is only identified by a
> prefix on the column name. It stands to reason that you would be interested
> in the capsule too as it holds the timestamp and wiki project the event
> applies to, but I imagine we can make fields public selectively.
>
> On Wed, Aug 13, 2014 at 6:47 PM, Aaron Halfaker <[email protected]> wrote:
> Re. (2), I didn't say anything about that being related to public/private.
> This is a request from springle -- that if we are going to start pushing
> Events to LabsDB, he'd like us to do so more efficiently. That bug is about
> efficiently batching inserts.
>
> I don't know what you are talking about re. 90-day purges. I'm talking about
> 100% public EventLogging events -- E.g.
> https://meta.wikimedia.org/wiki/Schema:PageMove
> Also, we do *not* need to purge EventLogging event data at 90 days. We need
> to purge PII at 90 days. We generally do not store PII in EventLogging
> events, but when we do, we organize 90-day purges as we have recently for
> the anonymous editor experiments. If you are concerned about UserAgents as
> the sanitization page you linked to suggests, then we should talk about the
> EventLogging capsule, not the event.
>
> Re. (1), we are already performing this review internally in order to
> determine what does and does not conform to the Data Retention Guidelines.
> It seems clear that a robust process could also identify non-sensitive
> Schemas that could be published in labs.
>
> -Aaron
>
> On Wed, Aug 13, 2014 at 5:00 PM, Nuria Ruiz <[email protected]> wrote:
> Aaron,
>
> > (2) https://bugzilla.wikimedia.org/show_bug.cgi?id=67450
> The bug does not have to do with making data public. It has to do with how
> data is inserted into EL from the consumers, so it deals with the 'system',
> not the 'data'. The raw data as inserted cannot be replicated directly to be
> made public so whether inserts are more efficient does not affect the
> public/private discussion.
>
> > (1) there needs to be a good review process in place to make sure that the
> > data we surface isn't sensitive
> There is a bunch of work involved on this item. For example: per our privacy
> policy some of this data should be discarded after 90 days and currently it
> is not. Also, you are aware of the discussions under sanitization:
> https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
>
> Basically to make EL data public it needs to be aggregated with a level of
> anonymization we think is acceptable.
> There is quite a bit of work in this regard; here are some bugs that were
> filed a while back:
>
> https://bugzilla.wikimedia.org/show_bug.cgi?id=62978
>
> https://bugzilla.wikimedia.org/show_bug.cgi?id=59832
>
> On Wed, Aug 13, 2014 at 3:39 PM, Aaron Halfaker <[email protected]> wrote:
> Hey folks,
>
> We've been discussing ways to make more Wikimedia data public. One of our
> sources for data is EventLogging (EL)[1], a system that lets us track events
> on both the client and server-side. Recently, YuviPanda and springle have
> been working with us to figure out what issues need to be resolved in order
> to begin loading EL events that contain public data[2] into LabsDB for public
> consumption and for use in WikiMetrics.
>
> It looks like there are three major concerns about directing EL to LabsDB.
> (1) there needs to be a good review process in place to make sure that the
> data we surface isn't sensitive, (2)
> https://bugzilla.wikimedia.org/show_bug.cgi?id=67450 will need to be
> addressed to make sure that we don't over-utilize labs infrastructure and (3)
> we'll need signoff from legal.
>
> It looks like (2) can be taken care of independently from (1) and (3). Is
> this bug already prioritized, and if not, could it be?
>
> 1. https://www.mediawiki.org/wiki/Extension:EventLogging
> 2. Eventually, we'll want a means to sanitize and surface events that contain
> sensitive information, but I'd argue that is a second step that we should
> address later since it will likely require more substantial technical work.
>
> -Aaron
_______________________________________________ Analytics mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/analytics
