Re: Aw: Re: Community renewal and project obsolescence
On 2023-12-30 21:40:03 -0500 (-0500), Mo Zhou wrote:
[...]
> How can one download the Debian public mailing list dumps?
[...]

I think you'd have to scrape the HTML (MHonArc) archives. The last update I remember is that the listmasters are intentionally not providing raw archives, though perhaps that 15-year-old decision could be revisited if there are new compelling reasons:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=161440#39

Alternatively, I suppose a DD with access to the raw archive data on the server could (perhaps after some discussion with the listmasters) perform LLM training on it, but would probably need to sanitize it and weed out the spam when doing so.

--
Jeremy Stanley
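For anyone who wants to try the scraping route, here is a minimal sketch of the first step: pulling the per-message links out of one archive index page. It assumes that individual messages are linked as msgNNNNN.html from the index page; that naming, the sample HTML, and the names MessageLinkParser and extract_message_links are illustrative assumptions, not something the listmasters have documented here. It does no network access; fetching the pages politely is left out on purpose.

```python
import re
from html.parser import HTMLParser

# Assumption: individual messages are linked as msgNNNNN.html from an index page.
MSG_LINK = re.compile(r"^msg\d+\.html$")


class MessageLinkParser(HTMLParser):
    """Collect hrefs that look like links to individual archived messages."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and MSG_LINK.match(value):
                self.links.append(value)


def extract_message_links(index_html):
    """Return message links in page order, without duplicates."""
    parser = MessageLinkParser()
    parser.feed(index_html)
    parser.close()
    return list(dict.fromkeys(parser.links))


# Hypothetical index fragment, only for illustration.
SAMPLE_INDEX = """
<ul>
<li><a href="msg00001.html">Re: Community renewal</a></li>
<li><a href="msg00002.html">Re: Community renewal</a></li>
<li><a href="threads.html">Thread index</a></li>
<li><a href="msg00001.html">duplicate link</a></li>
</ul>
"""

links = extract_message_links(SAMPLE_INDEX)
print(links)
```

Each extracted link would then be fetched and stripped of HTML, and the result would still need the sanitizing and spam removal mentioned above.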
Re: Aw: Re: Community renewal and project obsolescence
On 2023-12-31 05:22, Mo Zhou wrote:
> > > I am not able to develop DebGPT and confess I am not investing my time in learning to do it. But can we attract the people who want to tinker in this direction?
> >
> > Debian funds should be able to cover the hardware requirement and training expenses even if they are slightly expensive. The more expensive thing is the time of domain experts. I can train such a model but clearly I do not have bandwidth for that.
>
> No. I changed my mind.
>
> I can actually quickly wrap some Debian-specific prompts around an existing chat LLM. This is easy and does not need expensive hardware (although it may still require 1~2 GPUs with 24 GB of memory for inference), nor any training procedure.
>
> The project repo is created here: https://salsa.debian.org/deeplearning-team/debgpt

An alternative to fine-tuning would be to use RAG (retrieval-augmented generation), with LangChain for example.
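To make the RAG idea concrete, here is a rough sketch of the retrieval half: pick the stored snippet closest to the question and paste it into the prompt. Note that this is not LangChain and not an embedding model; it swaps in a plain bag-of-words cosine similarity from the standard library so that it runs anywhere. The snippets in DOCS, the prompt wording, and the function names are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Put the retrieved context in front of the question."""
    context = "\n\n".join(retrieve(question, documents, k))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical snippets standing in for policy text or man pages.
DOCS = [
    "The debian/copyright file records the license and copyright holders of a package.",
    "DebConf is the annual conference of the Debian project.",
    "debhelper provides dh commands used in debian/rules to build packages.",
]

question = "Which file records the license of a package?"
top = retrieve(question, DOCS, k=1)
prompt = build_prompt(question, DOCS, k=1)
print(prompt)
```

The resulting prompt string is what would be sent to whichever chat LLM is in use; no training step is involved, which is the appeal compared to fine-tuning.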
Re: Aw: Re: Community renewal and project obsolescence
On 12/30/23 21:40, Mo Zhou wrote:
> > I am not able to develop DebGPT and confess I am not investing my time in learning to do it. But can we attract the people who want to tinker in this direction?
>
> Debian funds should be able to cover the hardware requirement and training expenses even if they are slightly expensive. The more expensive thing is the time of domain experts. I can train such a model but clearly I do not have bandwidth for that.

No. I changed my mind.

I can actually quickly wrap some Debian-specific prompts around an existing chat LLM. This is easy and does not need expensive hardware (although it may still require 1~2 GPUs with 24 GB of memory for inference), nor any training procedure.

The project repo is created here: https://salsa.debian.org/deeplearning-team/debgpt

I have enabled issues, so people interested in this can redirect the detailed discussions to the repo issues.

I'm sure it is already possible to let an LLM read the long policy document, or the debhelper man pages, for us and provide some suggestions or patches. The things I'm uncertain about are (1) how well a smaller LLM, like the 7B or 13B ones, can do compared to proprietary LLMs in this case; (2) how well a smaller LLM performs when it is quantized to int8 or even int4 for laptops.

Oh, BTW, not all of the dependencies needed by the project are available in the Debian archive.
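As a back-of-the-envelope check on the hardware remark, the memory needed just to hold the weights is roughly (number of parameters) x (bits per weight) / 8 bytes. The sketch below works this out for the 7B and 13B sizes at fp16, int8 and int4. It is an estimate under stated assumptions: weights only, with no activations, KV cache, or quantization metadata, so real usage will be higher. The function name weight_memory_gib is made up for this example.

```python
def weight_memory_gib(n_params, bits_per_weight):
    """Approximate memory for the model weights alone, in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Model sizes mentioned above; parameter counts are the nominal round figures.
MODELS = (("7B", 7e9), ("13B", 13e9))
PRECISIONS = (("fp16", 16), ("int8", 8), ("int4", 4))

for model_name, n_params in MODELS:
    for precision, bits in PRECISIONS:
        gib = weight_memory_gib(n_params, bits)
        print(f"{model_name} at {precision}: about {gib:.1f} GiB for weights")
```

Under these assumptions a 13B model in fp16 already exceeds a single 24 GB card, while int4 brings a 7B model into laptop range, which is why the quality loss from quantization is the interesting open question.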
Re: Aw: Re: Community renewal and project obsolescence
On 12/30/23 15:06, Charles Plessy wrote:
> On Fri, Dec 29, 2023 at 01:14:29PM +0100, Steffen Möller wrote:
> > What hypotheses do we have on what influences the number of active individuals?
>
> When I was a kid I was playing with a lot of pirated copies of Amiga and then PC games, and I had a bit of melancholy thinking that what appeared to be golden days took place when I was still busy learning to walk and speak. I wondered if I was born too late. Then I was introduced to Linux and Debian.

If you don't mind sharing more of your story -- how were you introduced to Linux and Debian? Can we reproduce it? For me this is not reproducible.

The beginning of my story is similar to yours. The difference is that, at that time, Windows was the only PC operating system I was aware of. And I suffered a lot from it and its ecosystem: aggressive reboots, aggressive pop-up windows and ads completely out of my control, enormous difficulty learning and understanding its internals given a very limited budget for books, and enormous difficulty learning the C programming language on it. Visual Studio did a great job of confusing me with a huge amount of irrelevant detail and a complicated user interface when I wanted to try the code from the K&R C book as a newbie (without any educational resource available or affordable). I forget why I chose this book, but it was the right one to buy.

One day, out of curiosity, I searched for "free of charge operating systems" so that I could get rid of Windows. Then I got Ubuntu 11.10. Its frequent "internal errors" drove me to try other Linux distros in VirtualBox, including Debian squeeze and Fedora. While squeeze was the ugliest of them all in terms of desktop environment, it crashed significantly less than the rest.

I was happy with my choice. Linux does not reboot unless I decide to do so. It does not pop up ads, because the malware (useful as it may be) is not available under Linux.
It does not prevent me from trying to understand how it works, even if I can hardly grasp the source code. And `gcc hello-world.c` is ridiculously easy for learning programming compared to using Visual Studio.

I was confused again -- why is all of this free of charge? I tried to learn more, until the Debian Social Contract, the DFSG and the material written by the FSF (mostly Stallman) completely blew my mind. With the source code within my reach, I am able to really tame my computer. The day I realized that is the day I added "becoming a DD" to my dream list.

> That was a big thing, a big challenge for me to learn it, and a big reward to be part of it. At that time I never imagined that the next big thing was diversity, inclusion and justice, but being part of Debian unexpectedly connected me to it. Now when I look back I do not worry about being born too late. I would like to say to young people that joining a thriving community is the best way to journey beyond one's imagination.

Ideally yes, but people's minds are also affected by the economy. In developing countries, where most people are still struggling to survive and feed a family, unpaid volunteer work is respected most of the time, but seldom well understood. One needs to build up a very strong motivation before taking action to overcome the barrier of societal bias. That is one of the reasons why there are so few Chinese DDs even though China has a very large population. In contrast, most DDs are from developed countries.

I like the interpretation of how human society works in the book "Sapiens: A Brief History of Humankind". Basically, what connects people all over the world and forms this community is a commonly believed simple story -- we want to build a free and universal operating system. (I'm sad to see this sentence being removed from debian.org.) The common belief is the ground on which we build trust and start collaboration.
So, essentially, renewing the community means spreading that simple story to the young people who seek something that Debian/FOSS can provide. I don't know how to achieve it. But I do know that my story is completely unreproducible.

> Of course, we need to show how we are thriving. On my wishlist for 2024, there is of course AI.

In case people interested in this topic do not know, we have a dedicated mailing list for that: https://lists.debian.org/debian-ai/

The key word GPT successfully toggled my "write-a-long-response" button. Here we go.

> Can we have a DebGPT that will allow us to interact with our mailing list archives using natural language?

I have tried asking ChatGPT Debian-related questions. While ChatGPT is very good at general Linux questions, it turns out that its training data does not contain much Debian-specific knowledge. The quality of the training data really matters for an LLM's performance, especially the amount of book-quality data. The Debian mailing lists are too noisy compared to Wikipedia dumps and books. While the training set of the
Re: Aw: Re: Community renewal and project obsolescence
On Fri, Dec 29, 2023 at 01:14:29PM +0100, Steffen Möller wrote:
>
> What hypotheses do we have on what influences the number of active individuals?

When I was a kid I was playing with a lot of pirated copies of Amiga and then PC games, and I had a bit of melancholy thinking that what appeared to be golden days took place when I was still busy learning to walk and speak. I wondered if I was born too late. Then I was introduced to Linux and Debian. That was a big thing, a big challenge for me to learn it, and a big reward to be part of it. At that time I never imagined that the next big thing was diversity, inclusion and justice, but being part of Debian unexpectedly connected me to it. Now when I look back I do not worry about being born too late. I would like to say to young people that joining a thriving community is the best way to journey beyond one's imagination.

Of course, we need to show how we are thriving. On my wishlist for 2024, there is of course AI. Can we have a DebGPT that will allow us to interact with our mailing list archives using natural language? Can that DebGPT produce code that we know derives from a training set that only includes works for which people really consented that their copyrights and licenses will be dissolved? Can it be the single entry point for our whole infrastructure? I wish I could say "DebGPT, please accept all these loongarch64 patches and upload the packages now", or "DebGPT, update debian/copyright now and show me the diff".

I am not able to develop DebGPT and confess I am not investing my time in learning to do it. But can we attract the people who want to tinker in this direction? Not because we are the best AI team, but because we are one of the hearts of software freedom, and that freedom is deeply connected to everybody's future.

Well, it is too late for invoking Santa Claus, but this said, best wishes for 2024!
Charles

--
Charles Plessy, Nagahama, Yomitan, Okinawa, Japan
Debian Med packaging team http://www.debian.org/devel/debian-med
Tooting from work, https://fediscience.org/@charles_plessy
Tooting from home, https://framapiaf.org/@charles_plessy
Re: Aw: Re: Community renewal and project obsolescence
On 2023-12-29 04:14, Steffen Möller wrote:
> What hypotheses do we have on what influences the number of active individuals?
>
> Positive factors
> * Location of DebConf (with many or not so many devs affording to attend)
> * Popular platforms like the Raspberry Pi working with Debian derivatives
> * Debian packaging teams on salsa
> * self-education
> * Impression the DD status makes on outsiders/your next employer
> * Pleasant interactions on mailing lists with current or past team members
> * Team building with other DDs on projects of interest
>
> Negative factors
> * Advent of homebrew+conda
> * Containers
> * Increasing workloads as one ages and does not give packages up
> * Work-life-balance
> * Migrating to upstream
> * Delay between what upstream releases and what is available in our distro
> * Unpleasant interactions on mailing lists with current or past team members
>
> Do you have a better list?
>
> I keep thinking about what the last significant change in Debian may have
> been - salsa.debian.org came to mind. Am I missing anything?
>
> And I think the change I would like to see the most is a variant of
> brew/salsa for Debian, preferably in some mostly automated way, so we have
> some way to install the very latest with Debian all the time.
>
> Best,
> Steffen

As someone who would like to participate more in the development of Debian, my personal experience is that making contributions is like dropping a message in a bottle into the sea. It feels like a complete crapshoot whether I'll even receive a comment on any code contribution (including debian-devel RFS, salsa MR, or BTS patch).

If something as low-stakes as looking at a patch to fix something broken can be ignored so easily, the idea of asking someone to sign a PGP key or vouch for me in the NM process seems entirely out of the question.

I've assumed this is due to current DDs being overburdened.
If there were a single thing that could be done, in my mind it would be to have someone make sure that contributions do not go entirely ignored. Even just telling someone "hey, none of the stuff you're submitting is really good enough for Debian" would be helpful, because they could either work on improving or stop trying to contribute.

Antonio
Aw: Re: Community renewal and project obsolescence
> Sent: Thursday, 28 December 2023 at 20:02
> From: "Mo Zhou"
> To: debian-project@lists.debian.org
> Subject: Re: Community renewal and project obsolescence
>
> On 12/28/23 10:34, Rafael Laboissière wrote:
>
> > * M. Zhou [2023-12-27 19:00]:
> >
> > Thanks for the code and the figure. Indeed, the trend is confirmed by
> > fitting a linear model count ~ year to the new members list. The
> > coefficient is -1.39 members/year, which is significantly different
> > from zero (F[1,22] = 11.8, p < 0.01). Even when we take out the data
> > from year 2001, which could be interpreted as an outlier, the trend is
> > still significant, with a drop of 0.98 members/year (F[1,21] = 8.48, p
> > < 0.01).
>
> I thought about using some models for population statistics, so we can
> get the data about DD birth rate and DD retire/leave rate, as well as a
> prediction. But since the descendants of DDs are not naturally new DDs,
> the typical population models are not likely going to work well. The
> birth of a DD is more like a mutation, sort of.
>
> Anyway, we do not need sophisticated math models to draw the conclusion
> that Debian is an aging community. And yet, we don't seem to have a good
> way to reshape the curve using Debian's funds. -- this is one of the key
> problems behind the data.

What hypotheses do we have on what influences the number of active individuals?
Positive factors
 * Location of DebConf (with many or not so many devs affording to attend)
 * Popular platforms like the Raspberry Pi working with Debian derivatives
 * Debian packaging teams on salsa
 * self-education
 * Impression the DD status makes on outsiders/your next employer
 * Pleasant interactions on mailing lists with current or past team members
 * Team building with other DDs on projects of interest

Negative factors
 * Advent of homebrew+conda
 * Containers
 * Increasing workloads as one ages and does not give packages up
 * Work-life-balance
 * Migrating to upstream
 * Delay between what upstream releases and what is available in our distro
 * Unpleasant interactions on mailing lists with current or past team members

Do you have a better list?

I keep thinking about what the last significant change in Debian may have been - salsa.debian.org came to mind. Am I missing anything?

And I think the change I would like to see the most is a variant of brew/salsa for Debian, preferably in some mostly automated way, so we have some way to install the very latest with Debian all the time.

Best,
Steffen
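For readers who want to redo the linear fit quoted at the top of this message (count ~ year), here is a small sketch of an ordinary least-squares fit with its F statistic, using only the standard library. The yearly counts below are invented for illustration and are not the real new-member numbers; the function name fit_line is also made up. The F statistic of a simple regression has (1, n - 2) degrees of freedom, so the quoted F[1,22] presumably corresponds to 24 yearly data points.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, f_stat), where f_stat has
    (1, n - 2) degrees of freedom.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and explained sums of squares.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ssr = sum((intercept + slope * x - mean_y) ** 2 for x in xs)
    f_stat = ssr / (sse / (n - 2))
    return slope, intercept, f_stat


# Made-up data, NOT the actual Debian new-member counts.
years = [2000, 2001, 2002, 2003, 2004, 2005]
counts = [30, 27, 29, 24, 25, 21]

slope, intercept, f_stat = fit_line(years, counts)
print(f"slope = {slope:.2f} members/year, F[1,{len(years) - 2}] = {f_stat:.2f}")
```

Turning the F value into a p-value needs the F distribution, which the standard library does not provide, so that step is left to a statistics package.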