Re: Aw: Re: Community renewal and project obsolescence

2023-12-31 Thread Jeremy Stanley
On 2023-12-30 21:40:03 -0500 (-0500), Mo Zhou wrote:
[...]
> How can one download the Debian public mailing list dumps?
[...]

I think you'd have to scrape the HTML (MHonArc) archives. The last
update I remember is that the listmasters are intentionally not
providing raw archives, though perhaps that 15-year-old decision
could be revisited if there are new compelling reasons:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=161440#39

Alternatively, I suppose a DD with access to the raw archive data on
the server could (perhaps after some discussion with the
listmasters) perform LLM training on those, but would probably need
to sanitize it and weed out the spam when doing so.
-- 
Jeremy Stanley




Re: Aw: Re: Community renewal and project obsolescence

2023-12-31 Thread Vincent Bernat

On 2023-12-31 05:22, Mo Zhou wrote:

> > > I am not
> > > able to develop DebGPT and confess I am not investing my time in
> > > learning to do it.  But can we attract the people who want to tinker in
> > > this direction?
> >
> > Debian funds should be able to cover the hardware requirement and
> > training expenses even if they are slightly expensive. The more
> > expensive thing is the time of domain experts. I can train such a
> > model but clearly I do not have bandwidth for that.
>
> No. I changed my mind.
>
> I can actually quickly wrap some Debian-specific prompts around an
> existing chat LLM. This is easy and does not need expensive hardware
> (although it may still require 1-2 GPUs with 24 GB of memory for
> inference), nor any training procedure.
>
> The project repo has been created here:
> https://salsa.debian.org/deeplearning-team/debgpt

An alternative to fine-tuning would be to use RAG (retrieval-augmented 
generation), with LangChain for example.
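
To make the RAG idea concrete, here is a minimal sketch of the retrieval step followed by prompt assembly. It does not use LangChain: a bag-of-words cosine similarity from the Python standard library stands in for a real embedding index. The function names and the three toy snippets in DOCS are made up for illustration, and the resulting prompt would still have to be sent to whatever LLM is chosen.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, documents, k=1):
    """Prepend the retrieved context to the question (hypothetical prompt layout)."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Toy stand-ins for chunks of real documentation (illustrative only).
DOCS = [
    "debhelper provides the dh command sequencer used in debian/rules.",
    "The Debian Social Contract includes the Debian Free Software Guidelines.",
    "salsa.debian.org hosts Git repositories for Debian packaging teams.",
]

if __name__ == "__main__":
    print(build_prompt("Which tool provides the dh command?", DOCS))
```

The point of the design is that no model weights are changed: only the text placed in front of the question differs from one query to the next, which is why this approach needs no training run.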




Re: Aw: Re: Community renewal and project obsolescence

2023-12-30 Thread Mo Zhou

On 12/30/23 21:40, Mo Zhou wrote:


> > I am not
> > able to develop DebGPT and confess I am not investing my time in
> > learning to do it.  But can we attract the people who want to tinker in
> > this direction?
>
> Debian funds should be able to cover the hardware requirement and
> training expenses even if they are slightly expensive. The more
> expensive thing is the time of domain experts. I can train such a
> model but clearly I do not have bandwidth for that.



No. I changed my mind.

I can actually quickly wrap some Debian-specific prompts around an 
existing chat LLM. This is easy and does not need expensive hardware 
(although it may still require 1-2 GPUs with 24 GB of memory for 
inference), nor any training procedure.


The project repo has been created here:
https://salsa.debian.org/deeplearning-team/debgpt


I have enabled issues. And maybe people interested in this can redirect 
the detailed discussions to the repo issues.


I'm sure it is already possible to let an LLM read the long policy 
document, or the debhelper man pages, for us and provide some suggestions 
or patches. The things I'm uncertain about are (1) how well a smaller LLM, 
such as a 7B or 13B one, can do compared to proprietary LLMs in this case; 
(2) how well a smaller LLM performs when it is quantized to int8 or even 
int4 for laptops.
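
As a rough illustration of why quantization matters for laptops, the sketch below estimates the memory needed just to store the weights of 7B and 13B models at fp16, int8 and int4. It is back-of-the-envelope arithmetic only: the helper names are made up, 1 GB is taken as 1e9 bytes, and activations, the KV cache and quantization metadata are ignored, so real usage will be higher.

```python
# Bits used to store one weight in each format.
BITS_PER_WEIGHT = {"fp16": 16, "int8": 8, "int4": 4}

def weight_memory_gb(n_params: float, bits: int) -> float:
    """Gigabytes (1 GB = 1e9 bytes) needed to store n_params weights."""
    return n_params * bits / 8 / 1e9

def memory_table(sizes=(7e9, 13e9)):
    """Map (parameter count, format name) to estimated weight memory in GB."""
    return {
        (n, name): weight_memory_gb(n, bits)
        for n in sizes
        for name, bits in BITS_PER_WEIGHT.items()
    }

if __name__ == "__main__":
    for (n, name), gb in memory_table().items():
        print(f"{n / 1e9:.0f}B @ {name}: {gb:.1f} GB")
```

By this estimate a 7B model needs about 14 GB at fp16 but only about 3.5 GB at int4, while a 13B model at fp16 (about 26 GB) no longer fits on a single 24 GB GPU. How much answer quality is lost at int8 or int4 is the open question raised above, and this arithmetic says nothing about it.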


Oh, BTW, not all of the dependencies needed by the project are available 
in the Debian archive.




Re: Aw: Re: Community renewal and project obsolescence

2023-12-30 Thread Mo Zhou

On 12/30/23 15:06, Charles Plessy wrote:


> On Fri, Dec 29, 2023 at 01:14:29PM +0100, Steffen Möller wrote:
> > What hypotheses do we have on what influences the number of active individuals?
>
> When I was a kid I was playing with a lot of pirated copies of Amiga and
> then PC games, and I had a bit of melancholy thinking that what appeared
> to be golden days took place when I was still busy learning to walk and
> speak.  I wondered if I was born too late.  Then I was introduced to
> Linux and Debian.


If you don't mind sharing more of your story -- how were you introduced 
to Linux and Debian? Can we reproduce it?


For me this is not reproducible. The beginning of my story is similar to 
yours. The difference is that, at that time, Windows was the only PC 
operating system I was aware of. And I suffered a lot from it and its 
ecosystem: aggressive reboots, aggressive pop-up windows and ads 
completely out of my control, enormous difficulty learning and 
understanding its internals given a very limited budget for books, and 
enormous difficulty learning the C programming language on it. Visual 
Studio did a great job of confusing me with a huge amount of irrelevant 
details and a complicated user interface when I wanted to try the code 
from the K&R C book as a newbie (without any educational resource 
available or affordable). I forgot why I chose this book but it was the 
right one to buy.


One day, out of curiosity, I searched for "free of charge operating 
systems" so that I could get rid of Windows. Then I got Ubuntu 11.10. Its 
frequent "internal errors" drove me to try other Linux distros in 
VirtualBox, including Debian squeeze and Fedora. While squeeze was the 
ugliest among them all in terms of desktop environment, it crashed 
significantly less than the rest. I was happy with my choice. Linux does 
not reboot unless I decide to do so. It does not pop up ads because the 
malware (useful as it may be) is not available under Linux. It does not 
prevent me from trying to understand how it works, even if I can hardly 
grasp the source code. And `gcc hello-world.c` is ridiculously easy for 
learning programming compared to using Visual Studio.


I was confused again -- why is all of that free of charge? I tried to 
learn more, until the Debian Social Contract, the DFSG and the material 
written by the FSF (mostly Stallman) completely blew my mind. With the 
source code within my reach, I am able to really tame my computer. The 
day I realized that was the day I added "becoming a DD" to my dream list.



> That was a big thing, a big challenge for me to learn
> it, and a big reward to be part of it.  At that time I never imagined
> that the next big thing was diversity, inclusion and justice, but being
> part of Debian unexpectedly connected me to it.  Now when I look back I
> do not worry about being born too late.  I would like to say to young people
> that joining a thriving community is the best way to journey beyond
> one's imagination.


Ideally yes, but people's minds are also affected by the economy.

In developing countries where most people are still struggling to 
survive and feed a family, unpaid volunteer work is respected most of 
the time, but seldom well understood. One needs to build up a very 
strong motivation before taking action to overcome the barrier of 
societal bias.


That is one of the reasons why Chinese DDs are so scarce even though 
China has a very large population. In contrast, most DDs are from 
developed countries.


I like the interpretation of how human society works in the book 
"Sapiens: A Brief History of Humankind". Basically, what connects people 
all over the world and forms this community is a commonly believed, simple 
story -- we want to build a free and universal operating system. (I'm 
sad to see this sentence being removed from debian.org.) The common 
belief is the ground on which we build trust and start collaboration.


So, essentially, renewing the community means spreading that simple story 
to the young people who seek something that Debian/FOSS can provide. 
I don't know how to achieve it. But I do know that my story is 
completely unreproducible.



> Of course, we need to show how we are thriving.  On my wishlist for
> 2024, there is of course AI.


In case people interested in this topic do not know, we have a 
dedicated mailing list for that:


https://lists.debian.org/debian-ai/


The key word GPT successfully toggled my "write-a-long-response" button. 
Here we go.



> Can we have a DebGPT that will allow us to
> interact with our mailing list archives using natural language?


I have tried asking ChatGPT Debian-related questions. While 
ChatGPT is very good at general Linux questions, it turns out that its 
training data does not contain much Debian-specific knowledge. The 
quality of training data really matters for an LLM's performance, 
especially the amount of book-quality data. The Debian mailing lists are 
too noisy compared to the Wikipedia dump and books.


While the training set of the 

Re: Aw: Re: Community renewal and project obsolescence

2023-12-30 Thread Charles Plessy
On Fri, Dec 29, 2023 at 01:14:29PM +0100, Steffen Möller wrote:
> 
> What hypotheses do we have on what influences the number of active individuals?

When I was a kid I was playing with a lot of pirated copies of Amiga and
then PC games, and I had a bit of melancholy thinking that what appeared
to be golden days took place when I was still busy learning to walk and
speak.  I wondered if I was born too late.  Then I was introduced to
Linux and Debian.  That was a big thing, a big challenge for me to learn
it, and a big reward to be part of it.  At that time I never imagined
that the next big thing was diversity, inclusion and justice, but being
part of Debian unexpectedly connected me to it.  Now when I look back I
do not worry about being born too late.  I would like to say to young people
that joining a thriving community is the best way to journey beyond
one's imagination. 

Of course, we need to show how we are thriving.  On my wishlist for
2024, there is of course AI.  Can we have a DebGPT that will allow us to
interact with our mailing list archives using natural language?  Can
that DebGPT produce code that we know derives from a training set that
only includes works for which people really consented that their
copyrights and licenses will be dissolved?  Can it be the single entry
point for our whole infrastructure?  I wish I could say "DebGPT, please
accept all these loongarch64 patches and upload the packages now", or
"DebGPT, update debian/copyright now and show me the diff".  I am not
able to develop DebGPT and confess I am not investing my time in
learning to do it.  But can we attract the people who want to tinker in
this direction?  Not because we are the best AI team, but because we are
one of the hearts of software freedom, and that freedom is deeply
connected to everybody's future.

Well, it is too late for invoking Santa Claus, but this said, best
wishes for 2024!

Charles

-- 
Charles Plessy Nagahama, Yomitan, Okinawa, Japan
Debian Med packaging team http://www.debian.org/devel/debian-med
Tooting from work,   https://fediscience.org/@charles_plessy
Tooting from home, https://framapiaf.org/@charles_plessy



Re: Aw: Re: Community renewal and project obsolescence

2023-12-29 Thread Antonio Russo


On 2023-12-29 04:14, Steffen Möller wrote:
> What hypotheses do we have on what influences the number of active individuals?
> 
> Positive factors
> * Location of DebConf (with many or not so many devs affording to attend)
> * Popular platforms like the Raspberry Pi working with Debian derivatives
> * Debian packaging teams on salsa
> * self-education
> * Impression the DD status makes on outsiders/your next employer
> * Pleasant interactions on mailing lists with current or past team members
> * Team building with other DDs on projects of interest
> 
> Negative factors
> * Advent of homebrew+conda
> * Containers
> * Increasing workloads as one ages and does not give packages up
> * Work-life-balance
> * Migrating to upstream
> * Delay between what upstream releases and what is available in our distro
> * Unpleasant interactions on mailing lists with current or past team members
> 
> Do you have a better list?
> I keep thinking about what the last significant change in Debian may have 
> been - salsa.debian.org came to mind. Am I missing anything?
> And I think the change I would like to see the most is a variant of 
> brew/salsa for Debian, preferably in some mostly automated way, so we have 
> some way to install the very latest with Debian all the time.
> 
> Best,
> Steffen

As someone who would like to participate more in the development of Debian,
my personal experience is that making contributions is like dropping a
message in a bottle into the sea.  It feels like a complete crapshoot
whether I'll even receive a comment on any code contribution (including
debian-devel RFS, salsa MR, or BTS patch).

If something as low stakes as looking at a patch to fix something broken
can be ignored so easily, the idea of asking someone to sign a PGP key or
vouch for me in the NM process seems entirely out of the question.

I've assumed this is due to current DDs being overburdened.

If there were a single thing that could be done, in my mind it would be to
have someone make sure that contributions do not go entirely ignored.  Even
just telling someone "hey, none of the stuff you're submitting is really
good enough for Debian" would be helpful because they could either work on
improving, or stop trying to contribute.

Antonio



Aw: Re: Community renewal and project obsolescence

2023-12-29 Thread Steffen Möller



> Sent: Thursday, 28 December 2023 at 20:02
> From: "Mo Zhou" 
> To: debian-project@lists.debian.org
> Subject: Re: Community renewal and project obsolescence
>
> On 12/28/23 10:34, Rafael Laboissière wrote:
> 
> > * M. Zhou  [2023-12-27 19:00]:
> >
> > Thanks for the code and the figure. Indeed, the trend is confirmed by 
> > fitting a linear model count ~ year to the new members list. The 
> > coefficient is -1.39 member/year, which is significantly different 
> > from zero (F[1,22] = 11.8, p < 0.01). Even when we take out the data 
> > from year 2001, that could be interpreted as an outlier, the trend is 
> > still significant, with a drop of 0.98 member/year (F[1,21] = 8.48, p 
> > < 0.01).
> 
> I thought about using some models from population statistics, so we can 
> get the data about DD birth rate and DD retire/leave rate, as well as a 
> prediction. But since the descendants of DDs are not naturally new DDs, 
> the typical population models are not likely going to work well. The 
> birth of a DD is more like a mutation, sort of.
> 
> Anyway, we do not need sophisticated math models to draw the conclusion 
> that Debian is an aging community. And yet, we don't seem to have a good 
> way to reshape the curve using Debian's funds -- this is one of the key 
> problems behind the data.
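
For anyone who wants to redo the kind of fit quoted above (a linear model count ~ year), here is a minimal ordinary least squares sketch in plain Python. The years and counts in it are made-up placeholder values, not the real new-member data, and the function name is hypothetical. It computes only the slope and intercept, not the F-test reported in the quote.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of ys ~ xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Placeholder data for illustration only (NOT the actual Debian numbers).
years = [2000, 2001, 2002, 2003, 2004]
counts = [60, 58, 57, 55, 54]

slope, intercept = fit_line(years, counts)

if __name__ == "__main__":
    print(f"trend: {slope:.2f} new members per year")
```

With the real yearly counts substituted for the placeholder list, the slope is the quantity reported above as the coefficient in members per year.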

What hypotheses do we have on what influences the number of active individuals?

Positive factors
* Location of DebConf (with many or not so many devs affording to attend)
* Popular platforms like the Raspberry Pi working with Debian derivatives
* Debian packaging teams on salsa
* self-education
* Impression the DD status makes on outsiders/your next employer
* Pleasant interactions on mailing lists with current or past team members
* Team building with other DDs on projects of interest

Negative factors
* Advent of homebrew+conda
* Containers
* Increasing workloads as one ages and does not give packages up
* Work-life-balance
* Migrating to upstream
* Delay between what upstream releases and what is available in our distro
* Unpleasant interactions on mailing lists with current or past team members

Do you have a better list?
I keep thinking about what the last significant change in Debian may have been 
- salsa.debian.org came to mind. Am I missing anything?
And I think the change I would like to see the most is a variant of brew/salsa 
for Debian, preferably in some mostly automated way, so we have some way to 
install the very latest with Debian all the time.

Best,
Steffen