<https://www.washingtonpost.com/technology/2023/09/08/gmail-instagram-facebook-trains-ai/>

It’s your Gmail. It’s also Google’s artificial intelligence factory.

Unless you turn it off, Google uses your Gmail to train an AI to finish other 
people’s sentences. It does that by analyzing how you respond to its 
suggestions. And when you opt in to using a new Gmail function called Help Me 
Write, Google uses what you type into it to improve its AI writing, too. You 
can’t say no.


Your email is just the start. Meta, the owner of Facebook, took a billion 
Instagram posts from public accounts to train an AI, and didn’t ask permission. 
Microsoft uses your chats with Bing to coach the AI bot to better answer 
questions, and you can’t stop it.

Increasingly, tech companies are taking your conversations, photos and 
documents to teach their AI how to write, paint and pretend to be human. You 
might be accustomed to them selling your data or using it to target you with 
ads. But now they’re using it to create lucrative new technologies that could 
upend the economy — and make Big Tech even bigger.

We don’t yet understand the risk that this behavior poses to your privacy, 
reputation or work. But there’s not much you can do about it.

Sometimes the companies handle your data with care. Other times, their behavior 
is out of sync with common expectations for what happens with your information, 
including stuff you thought was supposed to be private.
How your data trains Big Tech’s AI
- Meta says it can use the contents of photos and videos shared to "public" on 
  its social networks to train its AI products. You can make your Instagram 
  account private or change the audience for your Facebook posts.
- Gmail, by default in the U.S., uses how you respond to its Smart Compose 
  suggestions to train the AI to better finish people's sentences. You can opt 
  out.
- Microsoft uses your conversations with its Bing chatbot to "fine-tune" the 
  AI and shares them with its partner OpenAI. There is no way to opt out as a 
  consumer.
- Google learns from your conversations with its Bard chatbot, including 
  having some reviewed by humans. You can ask Google to delete your chat 
  history, but it will still hold on to chats for up to 72 hours.
- Google uses what you type and other "interactions" with its Workspace Labs 
  AI in Gmail, Docs, Slides and Sheets to help its AI become a better creative 
  coach. You cannot opt out if you want to use these functions.
- Google uses your private text or voice conversations with its Assistant to 
  "fine-tune" the responses of Assistant or Bard. You can opt out by adjusting 
  your Google privacy settings to not save your activity.
- Google says it can use "publicly available information" to train its AI, 
  including the contents of YouTube videos and Google Docs that have been 
  published to the Web.

Zoom set off alarms last month by claiming it could use the private contents of 
video chats to improve its AI products, before reversing course. Earlier this 
summer, Google updated its privacy policy to say it can use any “publicly 
available information” to train AI. (Google didn’t say why it thinks it has 
that right. But it says that’s not a new policy and it just wanted to be clear 
it applies to its Bard chatbot.)

If you’re using pretty much any of Big Tech’s buzzy new generative AI products, 
you’ve likely been compelled to agree to help make their AI smarter, sometimes 
including having humans review what you do with them.

Lost in the data grab: Most people have no way to make truly informed decisions 
about how their data is being used to train AI. That can feel like a privacy 
violation — or just like theft.

“AI represents a once-in-a-generation leap forward,” says Nicholas Piachaud, a 
director at the open source nonprofit Mozilla Foundation. “This is an 
appropriate moment to step back and think: What’s at stake here? Are we willing 
just to give away our right to privacy, our personal data to these big 
companies? Or should privacy be the default?”
New privacy risks

It isn’t new for tech companies to use your data to train AI products. Netflix 
uses what you watch and rate to generate recommendations. Meta uses what you 
like, comment on and even spend time looking at to train its AI how to order 
your news feed and show you ads.

Yet generative AI is different. Today’s AI arms race needs lots and lots of 
data. Elon Musk, chief executive of Tesla, recently bragged to his biographer 
that he had access to 160 billion video frames per day shot from the cameras 
built into people’s cars to fuel his AI ambitions.

“Everybody is sort of acting as if there is this manifest destiny of 
technological tools built with people’s data,” says Ben Winters, a senior 
counsel at the Electronic Privacy Information Center (EPIC), who has been 
studying the harms of generative AI. “With the increasing use of AI tools comes 
this skewed incentive to collect as much data as you can upfront.”

All of this brings some unique privacy risks. Training an AI to learn 
everything about the world means it also ends up learning intimate things about 
individuals, from financial and medical details to people’s photos and writing.

Some tech companies even acknowledge that in their fine print. When you sign up 
to use Google’s new Workspace Labs AI writing and image-generation helpers for 
Gmail, Docs, Sheets and Slides, the company warns: “don’t include personal, 
confidential, or sensitive information.”

The actual process of training AI can be a bit creepy. Companies employ humans 
to review some of how we use products such as Google’s new AI-fueled search 
called SGE. In its fine print for Workspace Labs, Google warns it may hold your 
data seen by human reviewers for up to four years in a manner not directly 
associated with your account.

Even worse for your privacy, AI sometimes leaks data back out. Generative AI 
is notoriously hard to control, and it can regurgitate personal information in 
response to a new, sometimes unforeseen prompt.

It has even happened to a tech company. Samsung employees were reportedly 
using ChatGPT and discovered, on three separate occasions, that the chatbot 
spat company secrets back out. The company then banned the use of AI chatbots 
at work. Apple, Spotify, Verizon and many banks have done the same.

The Big Tech companies told me they take pains to prevent leaks. Microsoft says 
it de-identifies user data entered in Bing chat. Google says it automatically 
removes personally identifiable information from training data. Meta says it 
will train generative AI not to reveal private information — so it might share 
the birthday of a celebrity, but not regular people.

Okay, but how effective are these measures? That’s among the questions the 
companies don’t give straight answers to. “While our filters are at the cutting 
edge in the industry, we’re continuing to improve them,” says Google. And how 
often do they leak? “We believe it’s very limited,” it says.

It’s great to know Google’s AI only sometimes leaks our information. “It’s 
really difficult for them to say, with a straight face, ‘we don’t have any 
sensitive data,’” says Winters of EPIC.

Perhaps privacy isn’t even the right word for this mess. It’s also about 
control. Who’d ever have imagined a vacation photo they posted in 2009 would be 
used by a megacorporation in 2023 to teach an AI to make art, put a 
photographer out of a job, or identify someone’s face to police? When they take 
your information to train AI, companies can ignore your original intent in 
creating or sharing it in the first place.

There’s a thin line between “making products better” and theft, and tech 
companies think they get to draw it.
Your data, their rules

Which data of ours is and isn’t off limits? Much of the answer is wrapped up in 
lawsuits, investigations and hopefully some new laws. But meanwhile, Big Tech 
is making up its own rules.

I asked Google, Meta and Microsoft to tell me exactly when they take user data 
from products that are core to modern life to make their new generative AI 
products smarter. Getting straight answers was like chasing a squirrel through 
a funhouse.

They told me they hadn’t used nonpublic user information in their largest AI 
models without permission. But those very carefully chosen words leave open 
plenty of occasions when they are, in fact, building lucrative AI businesses 
with our digital lives.

Not all AI uses for data are the same, or even problematic. But as users, we 
practically need a degree in computer science to understand what’s going on.

Google is a great example. It tells me its “foundational” AI models — the 
software behind things like Bard, its answer-anything chatbot — come primarily 
from “publicly available data from the internet.” Our private Gmail didn’t 
contribute to that, the company says.

However, Google does still use Gmail to train other AI products, like Smart 
Compose (which finishes sentences for you) and the new creative coach Help Me 
Write that’s part of its Workspace Labs. Those uses are fundamentally different 
from “foundational” AI, Google says, because it’s using data from a product to 
improve that product. The Smart Compose AI, it says, anonymizes and aggregates 
our information and improves the AI “without exposing the actual content in 
question.” It says the Help Me Write AI learns from your “interactions, 
user-initiated feedback, and usage metrics.” How are you supposed to know 
what’s actually going on?

Perhaps there’s no way to create something like Smart Compose without data 
about how you use your email. But that doesn’t mean Google should just switch 
it on by default. In Europe, where there are stricter data laws, Smart Compose 
is off by default. Nor should access to your data be a requirement to use its 
latest and greatest products, even if Google calls them “experiments.”

Meta told me it didn’t train its biggest generative AI model, called Llama 2, 
on user data — public or private. However, it has trained other AI, like an 
image-identification system called SEER, on people’s public Instagram accounts. 
To avoid that, you’d have to have set your account to private, or quit 
Instagram.

And Meta wouldn’t answer my questions about how it’s using our personal data to 
train generative AI products it is expected to unveil soon. After I pushed 
back, the company said it would “not train our generative AI models on people’s 
messages with their friends and families.” At least it agreed to draw some kind 
of red line.

Microsoft updated its service agreement this summer with broad language about 
user data, and it didn’t make any assurances to me about limiting the use of 
our data to train its AI products. Microsoft tells me it does not use our data 
from Word or other Microsoft 365 programs to “train underlying foundational 
models,” but that’s not the question I was asking.

The consumer advocates at Mozilla also launched a campaign calling on Microsoft 
to come clean. “If nine experts in privacy can’t understand what Microsoft does 
with your data, what chance does the average person have?” Mozilla says.

It doesn’t have to be this way. Microsoft has lots of assurances for lucrative 
corporate customers, including those chatting with the enterprise version of 
Bing, about keeping their data private. “Data always remains within the 
customer’s tenant and is never used for other purposes,” says a spokesman.

Why do companies have more of a right to privacy than all of us?
 
_______________________________________________
nexa mailing list
[email protected]
https://server-nexa.polito.it/cgi-bin/mailman/listinfo/nexa