jman <[email protected]> writes: > Ihor Radchenko <[email protected]> writes: > >>> - how they affect labor: people tagging datasets for training >> >> Could you elaborate? >> I did not see this particular argument so far. > https://stephvee.ca/blog/artificial%20intelligence/generative-ai-is-built-on-the-exploitation-of-the-global-south/ > > https://www.youtube.com/watch?v=QH654YPxvEE
Oh, this. I did not know, but I am not surprised.

I also have to say that the article conveniently puts the whole focus on
AI companies, only touching a bit on the other aspects of this industry.
"Data labeling" is a pre-LLM thing.

- How do you think automatic porn/abusive content detection works on
  YouTube and other video hostings? That is also done by neural network
  classifiers trained on human-labeled data.

- Same thing with content moderation (required by law on social
  networks). Before LLMs, the huge volume of moderation on, for example,
  Twitter was done by humans. Where do you think they were hired? And
  how much were they paid? Similar for other platforms like Facebook.
  LLMs are actually improving the situation in this area.

- More generally, genuine customer service operators are often hired in
  poor countries. Those are also not exactly high-paying jobs. And
  customer service is not exactly the most emotionally rewarding work.

- And, of course, there is the shadow industry with porn chats that the
  article directly mentions.

It can also be much worse. E.g. see
https://www.youtube.com/watch?v=_4f52yoExJE
Some people even do the above, or similar, in forced labor. See
https://www.youtube.com/watch?v=1QhMqoTNSl8

My initial thinking about this is the following:

- I certainly do not see the labeling industry as something that should
  thrive.

- I am hopeful, though. EU laws that enforce disclosure of the training
  data help somewhat. I recall that incomplete initial reporting by
  OpenAI listed "training data specially generated by humans for
  training purposes", which smells similar to what you described. But
  they will have to disclose more details.

- Even with disclosure, the industry you describe is not illegal. So it
  is even more important to promote libre LLMs.

- Unlike other similar industries, LLMs can actually be less of an evil.
  Yes, a lot of unfair practices are used to train existing LLMs. But we
  are getting closer and closer to the point where data labeling is done
  by LLMs themselves (that is now standard practice for LLM
  fine-tuning). This greatly reduces the burden on human labelers over
  time, as LLMs get better at labeling.

- LLMs also reduce the need for human verifiers in some areas, like
  moderation. There are many specialized moderator LLMs already. That
  means this work no longer needs to be done by humans.

>> And one more consideration - GNU software guidelines suggest staying
>> away from politics.
>
> I just ... cannot comment on this.

It is easy to understand. Just recall what happened with RMS and how
things not directly related to software affected the FSF's reputation.
Commenting on hot political topics, especially as a popular
organization, can damage the main goals of the FSF and GNU.

For Org mode, taking a specific position on LLMs can also be risky in
the sense that the topic is rather polarized. If we take an extreme
position, part of the users and contributors may be alienated. That is
not very helpful.

Another consideration is that LLMs specifically are staring us right in
the face now. So we are more or less forced to say something, at least
on the technical level.

> I hope that whatever decision you will take, you are open to listen to
> and take into account the community, and that others will chime in
> with opinions.

We are having this very conversation because I am looking for community
inputs.

--
Ihor Radchenko // yantar92,
Org mode maintainer,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
