>POST requests are more tricky, I suppose.
FYI, we do not have POST data, nor the responses to either GET or POST
requests; we just store URLs and HTTP status codes for both GET and POST. So
the body of a POST is not available either.
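
To make that concrete: since we do keep full URLs, the "regex magic" Daniel
mentions below should work for GET requests to the query service, roughly like
this (a quick sketch only; the example URL shape is made up and real logged
URLs may differ):

    import re
    from urllib.parse import unquote

    # A logged SPARQL GET request URL might look roughly like this
    # (illustrative only):
    logged_url = ("https://query.wikidata.org/sparql?query="
                  "SELECT%20%3Fitem%20WHERE%20%7B%20%3Fitem%20wdt%3AP31%20wd%3AQ5%20%7D")

    # Decode the query string and pull out any Wikidata item IDs (Q-ids).
    decoded = unquote(logged_url)
    entity_ids = re.findall(r"\bQ\d+\b", decoded)
    print(entity_ids)  # ['Q5']

For POST requests the query lives in the request body, which as said above we
do not keep, so this only covers GETs.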

>However, I don't think we are logging the contents of responses at all. I
>suppose that would have to be built into BlazeGraph somehow.
You can instrument the code to report responses into the cluster just like the
search team does; depending on how easy it is to fit in the instrumentation
code, that can be a little or a lot of work. The MediaWiki API also does
similar "custom" reporting.


James:
I think that before asking for a time estimate we would need more detail on
your end as to what metrics you are interested in measuring. If you could
describe your project on meta, that would be best. In case you are not
familiar with meta, this is an example of how research projects are
described:
https://meta.wikimedia.org/wiki/Research:HTTPS_Transition_and_Article_Censorship



On Fri, Jul 1, 2016 at 1:33 AM, Daniel Kinzler <[email protected]>
wrote:

> On 01.07.2016 at 01:42, Nuria Ruiz wrote:
> > Is this data always requested via http from an api endpoint that will
> hit a
> > varnish cache? (Daniel can probably answer this)
>
> Yes. Special:EntityData is a regular special page, and
> action=wbgetentities is a
> regular MW web API request, as your example shows.
>
> > If the data you are interested in can be inferred from these requests
> there is
> > no additional data gathering needed.
>
> Yay!
>
> > Nor does it tell us how
> >     often statements/RDF triples show up in the Wikidata Query Service.
>
> I'm no expert on the query service, adding Stas for that. As far as I know,
> SPARQL queries go through Varnish directly to BlazeGraph. In any case,
> they are
> not processed by MediaWiki at all. Tracking how often an entity is
> mentioned in
> a GET request to the SPARQL service should be possible based on the varnish
> request logs, with a bit of regex magic. POST requests are more tricky, I
> suppose.
>
> However, I don't think we are logging the contents of responses at all. I
> suppose that would have to be built into BlazeGraph somehow. And even if
> we did
> that, that would only tell us which entities were present in a result, not
> which entities were used to answer a query. E.g. if you list all instances
> of a
> class (including subclasses), the entities representing the classes are
> essential to answering the query, but they are not present in the result
> (and
> only the top-most class is present in the query).
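
[To make the point above concrete, a rough sketch against the public endpoint
(query and result handling are illustrative only): the class wd:Q5 and the
properties wdt:P31/wdt:P279 appear only in the query text, never in the result
rows.]

    import requests

    # List instances of "human" (Q5), following subclass-of (P279) links.
    query = """
    SELECT ?item WHERE {
      ?item wdt:P31/wdt:P279* wd:Q5 .
    } LIMIT 5
    """
    resp = requests.get("https://query.wikidata.org/sparql",
                        params={"query": query, "format": "json"})
    for row in resp.json()["results"]["bindings"]:
        print(row["item"]["value"])  # e.g. http://www.wikidata.org/entity/Q...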
>
>
> --
> Daniel Kinzler
> Senior Software Developer
>
> Wikimedia Deutschland
> Gesellschaft zur Förderung Freien Wissens e.V.
>
_______________________________________________
Analytics mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/analytics
