Expanding on what I think Dan is referring to re: goals: addressing this 
issue would give EEVS access to the data needed to break metrics down 
by method/target site (mobile, desktop, apps).

On Aug 13, 2014, at 1:40 PM, Dan Andreescu <[email protected]> wrote:

> Kevin, for what it's worth, I don't think the bug fix that Sean is asking for 
> is that challenging.  The relevant part we'd have to change is really just a few 
> lines [1].  I respect your decision of course, but I just wanted to point out 
> that this issue does drive towards some of our goals, as we talked a bit 
> about getting EventLogging data to be usable by Wikimetrics, and this is the 
> first step.
> 
> 
> [1] - 
> https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FEventLogging/4d917e1594e6f09784ab0e0bffccc144f87a11b3/server%2Feventlogging%2Fjrm.py#L167
> 
> 
> On Wed, Aug 13, 2014 at 4:19 PM, Aaron Halfaker <[email protected]> 
> wrote:
> OK.  Sounds reasonable.  Sorry to seem as though I am pushing on you & the 
> devs.  In fact, specifying that you won't have the bandwidth to even consider 
> the bug until next quarter gives me the power to push on others.  >:)
> 
> Thanks!
> -Aaron
> 
> 
> On Wed, Aug 13, 2014 at 8:56 PM, Kevin Leduc <[email protected]> wrote:
> Hi Aaron,
> 
> I was not planning on prioritizing any EventLogging work for the rest of this 
> quarter.  The analytics dev team has a goal to get an EEVS dashboard running, 
> and I want to keep them focused; otherwise we will not reach this goal.
> 
> I'm tempted to ask what springle and YuviPanda can accomplish without the 
> help of the analytics devs, but even that will imply discussions and 
> distractions from our goals.
> 
> In September I am planning on looking at what goals we can set for the next 
> quarter and at what we want to accomplish with EventLogging.  I was 
> going to prioritize it at that point.
> 
> 
> 
> 
> On Wed, Aug 13, 2014 at 10:28 AM, Aaron Halfaker <[email protected]> 
> wrote:
> Excellent.  Kevin, can you work to get that bug[1] prioritized and let us 
> know?   I can start working with R&D on a proposal to bring to legal.  
> 
> 1. https://bugzilla.wikimedia.org/show_bug.cgi?id=67450
> 
> > It stands to reason that you would be interested in the capsule too, as it 
> > holds the timestamp and wiki project the event applies to, but I imagine we 
> > can make fields public selectively.
> 
> Fair enough.  I think we can drop that one column from the capsule and be 
> quite happy with the rest.  No need to purge EventLogging.   
> 
> -Aaron
> 
> 
> On Wed, Aug 13, 2014 at 6:08 PM, Nuria Ruiz <[email protected]> wrote:
> > Re. (2), I didn't say anything about that being related to public/private.  
> > This is a request from springle -- that if we are going to start pushing 
> > Events to LabsDB, he'd like us to do so more efficiently.  That bug is 
> > about efficiently batching inserts.
> ah, my mistake. Kevin can do prioritization as needed.
> 
> > If you are concerned about UserAgents as the sanitization page you linked to 
> > suggests, then we should talk about the EventLogging capsule, not the 
> > event.  
> If you want to be so precise, sure, that is correct. Note that currently 
> storage makes no distinction between the event and the capsule: they are 
> stored together in the same record, and capsule data is identified only by a 
> prefix on the column name. It stands to reason that you would be interested 
> in the capsule too, as it holds the timestamp and wiki project the event 
> applies to, but I imagine we can make fields public selectively.
> 
> 
> 
> 
> 
> On Wed, Aug 13, 2014 at 6:47 PM, Aaron Halfaker <[email protected]> 
> wrote:
> Re. (2), I didn't say anything about that being related to public/private.  
> This is a request from springle -- that if we are going to start pushing 
> Events to LabsDB, he'd like us to do so more efficiently.  That bug is about 
> efficiently batching inserts. 
> 
> I don't know what you are talking about re. 90-day purges.  I'm talking about 
> 100% public EventLogging events -- e.g. 
> https://meta.wikimedia.org/wiki/Schema:PageMove   Also, we do *not* need to 
> purge EventLogging event data at 90 days.  We need to purge PII at 90 days.  
> We generally do not store PII in EventLogging events, but when we do, we 
> organize 90-day purges, as we have recently for the anonymous editor 
> experiments.  If you are concerned about UserAgents as the sanitization page 
> you linked to suggests, then we should talk about the EventLogging capsule, 
> not the event. 
> 
> Re. (1), we are already performing this review internally in order to 
> determine what does and does not conform to the Data Retention Guidelines.  
> It seems clear that a robust process could also identify non-sensitive 
> Schemas that could be published in labs.
> 
> -Aaron
> 
> 
> On Wed, Aug 13, 2014 at 5:00 PM, Nuria Ruiz <[email protected]> wrote:
> Aaron, 
> 
> >(2) https://bugzilla.wikimedia.org/show_bug.cgi?id=67450
> The bug does not have to do with making data public. It has to do with how 
> data is inserted into EL from the consumers, so it deals with the 'system', 
> not the 'data'. The raw data as inserted cannot be replicated directly to be 
> made public, so whether inserts are more efficient does not affect the 
> public/private discussion.
> 
> 
> >(1) there needs to be a good review process in place to make sure that the 
> >data we surface isn't sensitive
> There is a bunch of work involved in this item. For example, per our privacy 
> policy some of this data should be discarded after 90 days, and currently it 
> is not. Also, you are aware of the discussions under sanitization: 
> https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
> 
> Basically, to make EL data public it needs to be aggregated with a level of 
> anonymization we think is acceptable. There is quite a bit of work in this 
> regard; here are some bugs that were filed a while back:
> 
> https://bugzilla.wikimedia.org/show_bug.cgi?id=62978
> 
> https://bugzilla.wikimedia.org/show_bug.cgi?id=59832
> 
> 
> 
> 
> 
> 
> 
> On Wed, Aug 13, 2014 at 3:39 PM, Aaron Halfaker <[email protected]> 
> wrote:
> Hey folks,
> 
> We've been discussing ways to make more Wikimedia data public.  One of our 
> sources for data is EventLogging (EL)[1], a system that lets us track events 
> on both the client and server-side.  Recently, YuviPanda and springle have 
> been working with us to figure out what issues need to be resolved in order 
> to begin loading EL events that contain public data[2] into LabsDB for public 
> consumption and for use in WikiMetrics.
> 
> It looks like there are three major concerns about directing EL to LabsDB.  
> (1) there needs to be a good review process in place to make sure that the 
> data we surface isn't sensitive, (2) 
> https://bugzilla.wikimedia.org/show_bug.cgi?id=67450 will need to be 
> addressed to make sure that we don't over-utilize labs infrastructure and (3) 
> we'll need signoff from legal. 
> 
> It looks like (2) can be taken care of independently from (1) and (3).  Is 
> this bug already prioritized, and if not, could it be?
> 
> 1. https://www.mediawiki.org/wiki/Extension:EventLogging
> 2. Eventually, we'll want a means to sanitize and surface events that contain 
> sensitive information, but I'd argue that is a second step that we should 
> address later since it will likely require more substantial technical work.
> 
> -Aaron
> 
> 
> _______________________________________________
> Analytics mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/analytics

