jman <[email protected]> writes:

> Ihor Radchenko <[email protected]> writes:
>
>>> - how they affect labor: people tagging datasets for training
>>
>> Could you elaborate?
>> I did not see this particular argument so far.
> https://stephvee.ca/blog/artificial%20intelligence/generative-ai-is-built-on-the-exploitation-of-the-global-south/
>
> https://www.youtube.com/watch?v=QH654YPxvEE

Oh, this. I did not know, but I am not surprised. I also have to say
that the article conveniently puts the whole focus on AI companies, only
touching briefly on the other aspects of this industry.

"Data labeling" is a pre-LLM thing.

- How do you think automatic porn/abusive content detection works on
  YouTube and other video hosting sites? That's also done by neural
  network classifiers trained on human-labeled data.
- Same thing with content moderation (required by law on social
  networks). Before LLMs, the huge volume of moderation on, for example,
  Twitter was done by humans. Where do you think they were hired? And
  how much were they paid? Similar for other platforms like
  Facebook. LLMs are actually improving the situation in this area.
- More generally, genuine customer service operators are often hired in
  poor countries. That's also not exactly a high-paying job. And
  customer service is not exactly the most emotionally rewarding work.
- And, of course, the shadow industry of porn chats that the article
  directly mentions.

It can also be much worse. E.g. see
https://www.youtube.com/watch?v=_4f52yoExJE
Some people even do the above, or similar work, under conditions of
forced labor. See https://www.youtube.com/watch?v=1QhMqoTNSl8

My initial thinking about this is the following:

- I certainly do not see the labeling industry as something that should
  thrive
- I am hopeful though - EU laws that enforce disclosure of the training
  data help somewhat. I recall that an incomplete initial report by
  OpenAI listed "training data specially generated by humans for
  training purposes", which smells similar to what you described. But
  they will have to disclose more details
- Even with disclosure, the industry you describe is not illegal. So, it
  is even more important to promote libre LLMs
- Unlike the other similar industries, LLMs can actually be less of an
  evil. Yes, a lot of unfair practices are used to train existing
  LLMs. But we are getting closer and closer to the point where data
  labeling is done by LLMs themselves (that's a standard practice for
  LLM fine-tuning nowadays). This greatly reduces the burden on human
  labelers over time, as LLMs get better at labeling
- LLMs also reduce the need for human verifiers in some areas, like
  moderation. There are many specialized moderator LLMs already. That
  means this work no longer has to be done by humans.

>> And one more consideration - GNU software guidelines suggest staying
>> away from politics.
>
> I just ... cannot comment on this.

It is easy to understand. Just recall what happened with RMS and how
things not directly related to software affected the FSF's reputation.
Commenting on hot topics in politics, especially as a popular
organization, can damage the main goals of FSF and GNU.

For Org mode, taking a specific position on LLMs can also be risky in
the sense that the topic is rather polarized. If we take an extreme
position, part of the users and contributors may be alienated. That's
not very helpful.

Another consideration is that LLMs are staring us right in the face
now. So, we are more or less forced to say something, at least on a
technical level.

> I hope that whatever decision you will take, you are open to listen to
> and take into account the community and that others will chime in with
> opinions.

We are having this very conversation because I am looking for community
input.

-- 
Ihor Radchenko // yantar92,
Org mode maintainer,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
