Yep, so that's the best data I know of as well.  The table that backs the
public API is documented here
<https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,wmf.mediarequest,PROD)/Schema?is_lineage_mode=false&schemaFilter=>.
And we have a visualization of this in Wikistats, where you can filter down
to just audio files
<https://stats.wikimedia.org/#/all-projects/content/total-mediarequests/normal|bar|2-year|media_type~audio|monthly>.

Happy to help slice and dice the data; you can post questions here or
ping me.
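
For anyone who wants the 2022 total as a single number, here is a minimal
Python sketch built around the aggregate endpoint quoted further down in
this thread. The helper names (build_url, total_requests, fetch) and the
User-Agent string are hypothetical, and the sketch assumes the response is
a JSON object with an "items" list whose entries each carry a "requests"
count, so please check that against the AQS Mediarequests documentation
before relying on it.

```python
import json
import urllib.request

# Base path taken from the aggregate URL quoted in this thread.
BASE = "https://wikimedia.org/api/rest_v1/metrics/mediarequests/aggregate"


def build_url(media_type="audio", start="20220101", end="20221231",
              referer="all-referers", agent="all-agents",
              granularity="monthly"):
    """Assemble the aggregate URL (hypothetical helper)."""
    return f"{BASE}/{referer}/{media_type}/{agent}/{granularity}/{start}/{end}"


def total_requests(payload):
    """Sum the per-period counts.

    Assumes payload looks like {"items": [{"requests": <int>, ...}, ...]}.
    """
    return sum(item["requests"] for item in payload.get("items", []))


def fetch(url, user_agent="audio-stats-example/0.1 (you@example.org)"):
    """Fetch and decode the JSON response; the User-Agent is a placeholder."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Performs a live network request when run as a script.
    print(total_requests(fetch(build_url())))
```

Note that this counts requests only. As Andrew says below, bytes
transferred are not exposed by the public API, so that part would still
need a Hive query.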

On Thu, Feb 2, 2023 at 1:25 PM Andrew Otto <o...@wikimedia.org> wrote:

> Hi Willy,
> (Forwarding your question to the public analytics list for others who
> might know more.)
>
> > Do you have any data that shows how many times audio files were
> downloaded in 2022?
>
> I think your best bet is the Mediacounts dataset
> <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Mediacounts>,
> which is available in a public API
> <https://wikitech.wikimedia.org/wiki/Analytics/AQS/Mediarequests>.  E.g.,
> to get the number of audio file requests in 2022:
>
> https://wikimedia.org/api/rest_v1/metrics/mediarequests/aggregate/all-referers/audio/all-agents/monthly/20220101/20221231
>
> However, it doesn't look like data transfer details are available in the
> Public API.  The backing dataset in Hive does have a total_response_size field
> so you could probably get this info more specifically by querying for it in
> Hive.
>
> Good luck!
>
> On Wed, Feb 1, 2023 at 7:11 PM Willy Pao <w...@wikimedia.org> wrote:
>
>> Hey Andrew - hope all is going well.  I've been working on gathering some
>> data for Wikimedia's Annual Sustainability Report, and there was a question
>> that Deb sent over regarding the usage of Audio files.  With Jaime's help
>> from Data Persistence SRE, we were able to figure out some of the numbers
>> around storage and energy consumption.  There was one part I was hoping you
>> (or someone from your team) might be able to help with though.  Do you have
>> any data that shows how many times audio files were downloaded in 2022?
>> Much appreciated in advance.
>>
>> Thanks,
>> Willy
>>
>> ---------- Forwarded message ---------
>> From: Deb Tankersley <dtankers...@wikimedia.org>
>> Date: Mon, Jan 30, 2023 at 1:41 PM
>> Subject: energy used to store
>> To: Willy Pao <w...@wikimedia.org>, Erin Morris <emor...@wikimedia.org>,
>> Cassie Casares <ccasa...@wikimedia.org>
>>
>>
>> Hey Willy!
>>
>> I got an interesting question (bolded below) from Wikimedia Sweden on the
>> energy that we use to store and serve audio files. Here's their full
>> comment / question:
>>
>> *"As part of my yearly planning for 2023, we are conducting a study
>>> regarding digitization of audio tapes, which climate footprints the various
>>> stages in the process generate and whether some of these can be made more
>>> energy efficient. We have limited the study to audio tapes, because it is a
>>> prioritized material category and a very data-intensive business, and
>>> because the limitation hopefully gives us relatively accurate numbers.
>>> Since we have been publishing digital audio originally from audio tapes on
>>> Wikimedia Commons for the past few years, I was wondering if there are any
>>> statistics related to energy consumption and carbon dioxide emissions
>>> available?*
>>>
>>>
>>> *What we would like to know is how much energy is required in the year
>>> 2022 to store our total amount of uploaded audio files (with the exception
>>> of Karl Tirén's phonograph recordings), how many times they have been
>>> downloaded and how large a total amount of data is involved. We suspect
>>> that downloading the high-resolution audio files is also relatively data
>>> intensive. As mentioned, the goal is not to stop this activity, or even
>>> reduce it without seeing how it looks and then investigating whether there
>>> are any links in the chain that can be tweaked to possibly reduce the
>>> climate impact. If numbers cannot be obtained, this is also valuable
>>> information."*
>>>
>>
>>
>> I'm not sure if we can narrow down this enough to get them a decent /
>> solid answer. What are your thoughts?
>>
>>
>> Thanks,
>>
>>
>> Deb
>>
>> --
>>
>> deb tankersley (she/her)
>>
>> senior program manager, engineering
>>
>> Wikimedia Foundation
>>
>>
>>
>>
>> _______________________________________________
> Analytics mailing list -- analytics@lists.wikimedia.org
> To unsubscribe send an email to analytics-le...@lists.wikimedia.org
>
