Adding analytics@, a public e-mail list where you can post questions such as this one.
>that doesn’t tell us how often entities are accessed through Special:EntityData or wbgetclaims
>Does this data already exist, even in the form of raw access logs?

Is this data always requested via HTTP from an API endpoint that will hit a Varnish cache? (Daniel can probably answer this.)

From what I see in our data we have requests like the following:

www.wikidata.org /w/api.php ?action=wbgetclaims&format=json&entity=Q633155
www.wikidata.org /w/api.php ?callback=jQuery11130020702992017004984_1465195743367&format=json&action=wbgetclaims&property=P373&entity=Q5296&_=1465195743368
www.wikidata.org /w/api.php ?action=wbgetclaims&format=json&entity=Q573612
www.wikidata.org /w/api.php ?action=wbgetclaims&format=json&entity=Q472729
www.wikidata.org /w/api.php ?action=wbgetclaims&format=json&entity=Q349797
www.wikidata.org /w/api.php ?action=compare&torev=344163911&fromrev=344163907&format=json
www.wikidata.org /w/api.php ?action=wbgetentities&format=xml&ids=Q2356135
www.wikidata.org /w/api.php ?action=wbgetentities&format=xml&ids=Q2355988
www.wikidata.org /w/api.php ?action=compare&torev=344164023&fromrev=344163948&format=json

If the data you are interested in can be inferred from these requests, there is no additional data gathering needed. (A rough sketch of that kind of counting is included below the quoted thread.)

>If not, what effort would be required to gather this data? For the purposes of my proposal to the U.S. Census Bureau I am estimating around
>six weeks of effort for this for one person working full-time. If it will take more time I will need to know.

I think I have mentioned this before on an e-mail thread, but without knowing the details of what you want to do we cannot give you a time estimate. What are the exact metrics you are interested in? Is the project described anywhere on Meta?

Thanks,

Nuria


On Thu, Jun 30, 2016 at 11:45 AM, James Hare <[email protected]> wrote:

> Copying Lydia Pintscher and Daniel Kinzler (with whom I’ve discussed this
> very topic).
>
> I am interested in metrics that describe how Wikidata is used. While we do
> have views on individual pages, that doesn’t tell us how often entities are
> accessed through Special:EntityData or wbgetclaims. Nor does it tell us how
> often statements/RDF triples show up in the Wikidata Query Service. Does
> this data already exist, even in the form of raw access logs? If not, what
> effort would be required to gather this data? For the purposes of my
> proposal to the U.S. Census Bureau I am estimating around six weeks of
> effort for this for one person working full-time. If it will take more time
> I will need to know.
>
> Thank you,
> James Hare
>
> On Thursday, June 2, 2016 at 2:18 PM, Nuria Ruiz wrote:
>
> James:
>
> >My current operating assumption is that it would take one person,
> >working on a full time basis, around six weeks to go from raw access logs
> >to a functioning API that would provide information on how many times a
> >Wikidata entity was accessed through the various APIs and the query
> >service. Do you believe this to be an accurate level of effort estimation
> >based on your experience with past projects of this nature?
>
> You are starting from the assumption that we do have the data you are
> interested in in the logs, which I am not sure is the case. Have you done
> your checks in this regard with the Wikidata developers?
>
> Analytics 'automagically' collects data from logs about *page* requests;
> any other request collection (and it seems that yours fits this scenario)
> needs to be instrumented. I would send an e-mail to the analytics@
> public list and the Wikidata folks to ask about how to harvest the data
> you are interested in; it doesn't sound like it is being collected at this
> time, so your project scope might be quite a bit bigger than you think.
>
> Thanks,
>
> Nuria
>
> On Thu, Jun 2, 2016 at 5:06 AM, James Hare <[email protected]> wrote:
>
> Hello Nuria,
>
> I am currently developing a proposal for the U.S. Census Bureau to
> integrate their datasets with Wikidata. As part of this, I am interested in
> getting Wikidata usage metrics beyond the page view data currently
> available. My concern is that the page views API gives you information only
> on how many times a *page* is accessed – but Wikidata is not really used
> in this way. More often it is the case that Wikidata’s information is
> accessed through the API endpoints (wbgetclaims etc.), through
> Special:EntityData, and the Wikidata Query Service. If we have information
> on usage through those mechanisms, that would give me much better
> information on Wikidata’s usage.
>
> To the extent these metrics are important to my prospective client, I am
> willing to provide in-kind support to the analytics team to make this
> information available, including expenses associated with the NDA process
> (I understand that such a person may need to deal with raw access logs that
> include PII). My current operating assumption is that it would take one
> person, working on a full time basis, around six weeks to go from raw
> access logs to a functioning API that would provide information on how many
> times a Wikidata entity was accessed through the various APIs and the query
> service. Do you believe this to be an accurate level of effort estimation
> based on your experience with past projects of this nature?
>
> Please let me know if you have any questions. I am happy to discuss my
> idea with you further.
>
> Regards,
> James Hare
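
To make the "can it be inferred" question concrete: here is a minimal sketch of the kind of per-entity counting that would be possible if request lines like the ones above are all that is needed. It assumes the logs can be pulled out as plain-text lines in the host / path / query layout shown; the input file name is made up for illustration, and this is a sketch, not a description of any existing pipeline:

    from collections import Counter
    from urllib.parse import parse_qs

    # Tally per-entity access counts from webrequest-style lines of the form
    # "host path query", e.g.:
    #   www.wikidata.org /w/api.php ?action=wbgetclaims&format=json&entity=Q633155
    counts = Counter()
    with open("sample_requests.txt") as f:  # hypothetical extract of the logs
        for line in f:
            parts = line.split()
            if len(parts) != 3 or parts[0] != "www.wikidata.org":
                continue
            _host, path, query = parts
            if path == "/w/api.php":
                params = parse_qs(query.lstrip("?"))
                action = params.get("action", [""])[0]
                if action in ("wbgetclaims", "wbgetentities"):
                    # wbgetclaims takes entity=..., wbgetentities takes
                    # ids=... (possibly pipe-separated)
                    for value in params.get("entity", []) + params.get("ids", []):
                        for qid in value.split("|"):
                            counts[qid] += 1
            elif path.startswith("/wiki/Special:EntityData/"):
                # assumes the usual /wiki/Special:EntityData/Q64.json URL shape
                counts[path.rsplit("/", 1)[-1].split(".")[0]] += 1

    for qid, n in counts.most_common(10):
        print(qid, n)

Whether the stored request data actually keeps the full query string, and at what sampling rate, is exactly the kind of thing that would need to be confirmed before giving any effort estimate.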
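For context on the access paths being discussed (as opposed to regular page views), this is roughly what the consuming side looks like. Both requests below hit api.php or Special:EntityData rather than the entity's own page, which is why they do not register as views of that page. The item and property (Q5296, P373) are simply taken from the sample request lines above:

    import requests

    # Claims for one entity via the action API (the wbgetclaims path)
    r = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbgetclaims",
            "entity": "Q5296",
            "property": "P373",
            "format": "json",
        },
    )
    print(r.json().get("claims", {}))

    # The same entity's full record via Special:EntityData
    r = requests.get("https://www.wikidata.org/wiki/Special:EntityData/Q5296.json")
    print(r.json()["entities"]["Q5296"].get("labels", {}).get("en"))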
