I wonder if emergent understanding of the limitations of 'generative AI' and in particular LLM-based approaches will undermine one of the very silly premises of the last 15-20 years.

The 'big data' spruikers put the proposition that data quality no longer matters if you have enough data. And a lot of data analysis activity seems to have proceeded on that assumption.

When you're drawing inferences from data, and using those inferences to make decisions, and then implementing the decisions, the key question isn't "Was that data generated by a human, an 'unintelligent' artefact, or an 'intelligent' artefact?". It's whether the data is of adequate quality for that use.

Data quality requires that many criteria be satisfied, but the common element is the reliability of the association that the data has with any real-world phenomenon that it purports to represent.

Overlaid on that is the cluster of issues that underlie misinformation, such as selective quotation and incompleteness / acontextuality; and bias and discrimination arising from dominance of some value-sets over others.

Inability to pick whether something is 'AI-generated' is an issue, yes. But beneath that are far bigger issues, including the laziness, ignorance and recklessness of failing to exercise humans' capacity for critical thinking, and of failing either to assure adequate quality of source-data or to qualify the conclusions reached in a manner commensurate with the quality factors.

_________________


On 29/7/23 12:23 am, Stephen Loosley wrote:
OpenAI just admitted it can't identify AI-generated text.

That's bad for the internet and it could be really bad for AI models.


By Alistair Barr Jul 28, 2023 https://www.businessinsider.com/openai-cant-identify-ai-generated-text-bad-for-internet-models-2023-7


Large language models and AI chatbots are beginning to flood the internet with auto-generated text.

It's becoming hard to distinguish AI-generated text from human writing.
OpenAI launched a system to spot AI text, but just shut it down because it didn't work.


Beep beep boop. Did a machine write that, or did I?

As the generative AI race picks up, this will be one of the most important questions the technology industry must answer.

ChatGPT, GPT-4, Google Bard, and other new AI services can create convincing and useful written content. Like all technology, this is being used for good and bad things. It can make writing software code faster and easier, but also churn out factual errors and lies.

So, developing a way to spot what is AI text versus human text is foundational.


OpenAI, the creator of ChatGPT and GPT-4, realized this a while ago. In January, it unveiled a "classifier to distinguish between text written by a human and text written by AIs from a variety of providers."

The company warned that it's impossible to reliably detect all AI-written text.

However, OpenAI said good classifiers are important for tackling several problematic situations. Those include false claims that AI-generated text was written by a human, running automated misinformation campaigns, and using AI tools to cheat on homework.

Less than seven months later, the project was scrapped.

"As of July 20, 2023, the AI classifier is no longer available due to its low rate of accuracy," OpenAI wrote in a recent blog. "We are working to incorporate feedback and are currently researching more effective provenance techniques for text."


The implications

If OpenAI can't spot AI writing, how can anyone else? Others are working on this challenge, including a startup called GPTZero. But OpenAI, with Microsoft's backing, is considered the best at this AI stuff.

Once we can't tell the difference between AI and human text, the world of online information becomes more problematic.

There are already spammy websites churning out automated content using new AI models. Some of them have been generating ad revenue, along with lies such as "Biden dead. Harris acting President, address 9 a.m." according to Bloomberg.

This is a very journalistic way of looking at the world. I get it. Not everyone is obsessed with making sure information is accurate. So here's a more worrying possibility for the AI industry:

If tech companies use AI-produced data inadvertently to train new models, some researchers worry those models will get worse. They will feed on their own automated content and fold in on themselves in what's being called an AI "Model Collapse."

A group of AI researchers from fancy universities including Oxford, Cambridge and Toronto has been studying what happens when text produced by a GPT-style AI model (like GPT-4) forms most of the training dataset for the next models.

"We find that use of model-generated content in training causes irreversible defects in the resulting models," they concluded in a recent research paper.

After seeing what could go wrong, the authors issued a plea and made an interesting prediction.

"It has to be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web," they wrote.

"Indeed, the value of data collected about genuine human interactions with systems will be increasingly valuable in the presence of content generated by LLMs in data crawled from the Internet."

We can't begin to tackle this existential problem if we can't tell whether a human or a machine wrote something online. I emailed OpenAI to ask about their failed AI text classifier and the implications, including Model Collapse. A spokesperson responded with this statement: "We have nothing to add outside of the update outlined in our blog post."

I wrote back, just to check if the spokesperson was a human. "Hahaha, yes I am very much a human, appreciate you for checking in though!" they replied.

--
_______________________________________________
Link mailing list
[email protected]
https://mailman.anu.edu.au/mailman/listinfo/link


--
Roger Clarke                            mailto:[email protected]
T: +61 2 6288 6916   http://www.xamax.com.au  http://www.rogerclarke.com

Xamax Consultancy Pty Ltd 78 Sidaway St, Chapman ACT 2611 AUSTRALIA
Visiting Professor in the Faculty of Law            University of N.S.W.
Visiting Professor in Computer Science    Australian National University

