Yep, that's the best data I know of as well. The table that backs the public API is documented here <https://datahub.wikimedia.org/dataset/urn:li:dataset:(urn:li:dataPlatform:hive,wmf.mediarequest,PROD)/Schema?is_lineage_mode=false&schemaFilter=>. We also have a visualization of this in Wikistats, where you can filter down to just audio files <https://stats.wikimedia.org/#/all-projects/content/total-mediarequests/normal|bar|2-year|media_type~audio|monthly>.
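In case it helps, here is a minimal Python sketch of totalling the monthly audio request counts for 2022 from the public API URL Andrew gives in the quoted message. The function names are mine, the User-Agent string is a placeholder, and I'm assuming the response is JSON with an `items` list whose entries each carry a `requests` count; please check that against the AQS documentation before relying on it.

```python
import json
import urllib.request

# Base path of the public mediarequests aggregate endpoint (from the quoted URL).
BASE = "https://wikimedia.org/api/rest_v1/metrics/mediarequests/aggregate"


def build_url(start, end, media_type="audio", referer="all-referers",
              agent="all-agents", granularity="monthly"):
    """Build the aggregate URL; start and end are YYYYMMDD strings."""
    return f"{BASE}/{referer}/{media_type}/{agent}/{granularity}/{start}/{end}"


def total_requests(payload):
    """Sum the per-period counts.

    Assumption: payload looks like {"items": [{"requests": <int>, ...}, ...]}.
    """
    return sum(item["requests"] for item in payload.get("items", []))


def fetch_total(start, end):
    """Fetch the monthly series over the network and return its total."""
    request = urllib.request.Request(
        build_url(start, end),
        # Placeholder User-Agent; replace with something that identifies you.
        headers={"User-Agent": "audio-requests-sketch/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request) as response:
        return total_requests(json.load(response))


# Usage (makes a network call):
#   print(fetch_total("20220101", "20221231"))
```

This only covers request counts. As noted in the quoted message, data transfer volume is not in the public API, so that part would still need a query against the backing Hive table.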
Happy to help slice and dice through the data, you can post questions here or ping me.

On Thu, Feb 2, 2023 at 1:25 PM Andrew Otto <o...@wikimedia.org> wrote:

> Hi Willy,
> (Forwarding your question to the public analytics list for others who
> might know more.)
>
> > Do you have any data that shows how many times audio files were
> downloaded in 2022?
>
> I think your best bet is the Mediacounts dataset
> <https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Mediacounts>,
> which is available in a public API
> <https://wikitech.wikimedia.org/wiki/Analytics/AQS/Mediarequests>. E.g.,
> to get # requested of audio downloads in 2022:
>
> https://wikimedia.org/api/rest_v1/metrics/mediarequests/aggregate/all-referers/audio/all-agents/monthly/20220101/20221231
>
> However, it doesn't look like data transfer details are available in the
> Public API. The backing dataset in Hive does have a total_response_size field
> so you could probably get this info more specifically by querying for it in
> Hive.
>
> Good luck!
>
> On Wed, Feb 1, 2023 at 7:11 PM Willy Pao <w...@wikimedia.org> wrote:
>
>> Hey Andrew - hope all is going well. I've been working on gathering some
>> data for Wikimedia's Annual Sustainability Report, and there was a question
>> that Deb sent over regarding the usage of Audio files. With Jaime's help
>> from Data Persistence SRE, we were able to figure out some of the numbers
>> around storage and energy consumption. There was one part I was hoping you
>> (or someone from your team) might be able to help with though. Do you have
>> any data that shows how many times audio files were downloaded in 2022?
>> Much appreciated in advance.
>>
>> Thanks,
>> Willy
>>
>> ---------- Forwarded message ---------
>> From: Deb Tankersley <dtankers...@wikimedia.org>
>> Date: Mon, Jan 30, 2023 at 1:41 PM
>> Subject: energy used to store
>> To: Willy Pao <w...@wikimedia.org>, Erin Morris <emor...@wikimedia.org>,
>> Cassie Casares <ccasa...@wikimedia.org>
>>
>>
>> Hey Willy!
>>
>> I got an interesting question (bolded below) from Wikimedia Sweden on the
>> energy that we use to store and serve audio files. Here's their full
>> comment / question:
>>
>> *"As part of my yearly planning for 2023, we are conducting a study
>>> regarding digitization of audio tapes, which climate footprints the various
>>> stages in the process generate and whether some of these can be made more
>>> energy efficient. We have limited the study to audio tapes, because it is a
>>> prioritized material category and a very data-intensive business, and
>>> because the limitation hopefully gives us relatively accurate numbers.
>>> Since we have been publishing digital audio originally from audio tapes on
>>> Wikimedia Commons for the past few years, I was wondering if there are any
>>> statistics related to energy consumption and carbon dioxide emissions
>>> available?*
>>>
>>>
>>> *What we would like to know is how much energy is required in the year
>>> 2022 to store our total amount of uploaded audio files (with the exception
>>> of Karl Tirén's phonograph recordings), how many times they have been
>>> downloaded and how large a total amount of data is involved. We suspect
>>> that downloading the high-resolution audio files is also relatively data
>>> intensive. As mentioned, the goal is not to stop this activity, or even
>>> reduce it without seeing how it looks and then investigating whether there
>>> are any links in the chain that can be tweaked to possibly reduce the
>>> climate impact. If numbers cannot be obtained, this is also valuable
>>> information."*
>>>
>>
>>
>> I'm not sure if we can narrow down this enough to get them a decent /
>> solid answer. What are your thoughts?
>>
>>
>> Thanks,
>>
>> Deb
>>
>> --
>>
>> deb tankersley (she/her)
>>
>> senior program manager, engineering
>>
>> Wikimedia Foundation
>>
>> _______________________________________________
> Analytics mailing list -- analytics@lists.wikimedia.org
> To unsubscribe send an email to analytics-le...@lists.wikimedia.org
>