Nice assignment!

Just curious whether re-tweets were removed from the data. I am not sure how 
much that will affect the results, but it seems to me that tweets from popular 
tweeters are often re-tweeted by other users. If re-tweets make up a 
significant portion of the unpopular tweeters' tweets and are not removed, the 
two top-terms lists will end up looking more similar than they should.
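
A minimal sketch of the kind of filter I have in mind. It assumes re-tweets
are marked only in the text itself, with a leading "RT @user" or a trailing
"via @user"; the function names are made up for illustration, and the real
data may well mark re-tweets differently.

```python
import re

# Assumption: a re-tweet is recognizable from its text alone, either as
# "RT @user ..." at the start or "... via @user" at the end.
RETWEET_PATTERN = re.compile(r"^\s*RT\s+@\w+|\bvia\s+@\w+\s*$", re.IGNORECASE)


def is_retweet(text):
    """Return True if the tweet text looks like a re-tweet."""
    return RETWEET_PATTERN.search(text) is not None


def drop_retweets(texts):
    """Keep only the tweet texts that do not look like re-tweets."""
    return [t for t in texts if not is_retweet(t)]
```

Running the term counts once with and once without this filter would show how
much the re-tweets actually matter.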

I am guessing there are far fewer popular tweeters than unpopular ones. That 
will (I think) most likely translate to the top-terms list for popular 
tweeters being topically less diverse than that of the unpopular tweeters. 
How to evaluate this hypothesis is not obvious. Perhaps word clustering using 
WordNet senses is one way to start -- but Twitter vocabulary will certainly 
challenge WordNet's coverage.
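
As a much cruder first check than WordNet clustering, one could compare the
Shannon entropy of the two term-frequency distributions. This is only a sketch
of a stand-in measure: it captures how evenly the counts are spread over
surface terms, not topical diversity as such, and the function name is
hypothetical.

```python
import math
from collections import Counter


def term_entropy(term_counts):
    """Shannon entropy (in bits) of a term-frequency distribution.

    Higher values mean the counts are spread more evenly over more terms.
    """
    total = sum(term_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in term_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy
```

Since the two groups probably differ a lot in size, the comparison would be
fairer on equal-sized samples of tweets from each group.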

Thanks,

-Mahesh


________________________________
From: Ted Pedersen <duluth...@gmail.com>
To: nlpatumd@yahoogroups.com
Sent: Sunday, April 10, 2011 11:12 AM
Subject: Re: [nlpatumd] a twitter puzzle


  
In my computer architecture class this semester (Spring 2011) we've
been focusing on using Hadoop on a big cluster down at the Minnesota
SuperComputing Institute. We recently did an assignment based on
Twitter data, where I collected about 100 million tweets sent between
March 8 - 31. I posed the assignment more or less in these terms...

There are popular people and unpopular people on Twitter. The popular
ones have more followers than they follow. Unpopular people follow
more people than follow them. Find out which are the most popular
terms as used by popular and unpopular people, and we'll compare those
lists to see what makes a tweeter popular or not...Terms were selected
based on frequency, and had to be at least 6 characters long.
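
The task as stated can be sketched on a single machine as below. This is not
the Hadoop solution from the assignment, just an illustration; the record
fields ("text", "followers", "following") and the tokenization are
assumptions, and users with equal counts fall into neither group.

```python
import re
from collections import Counter

MIN_TERM_LENGTH = 6  # terms had to be 6 characters or more


def top_terms(tweets, n=10):
    """Return the n most frequent terms for popular and unpopular tweeters.

    Each tweet is a dict with hypothetical keys "text", "followers"
    and "following".
    """
    popular, unpopular = Counter(), Counter()
    for tweet in tweets:
        # Assumption: a term is a lowercased run of word characters.
        terms = [w for w in re.findall(r"\w+", tweet["text"].lower())
                 if len(w) >= MIN_TERM_LENGTH]
        if tweet["followers"] > tweet["following"]:
            popular.update(terms)      # more followers than they follow
        elif tweet["following"] > tweet["followers"]:
            unpopular.update(terms)    # follow more people than follow them
    return popular.most_common(n), unpopular.most_common(n)
```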

My question to you is simple. Would you expect any differences in the
most frequent terms used by popular and unpopular people? If so, what
might they be?

I'll report back in a few days.

Enjoy,
Ted

PS Here's the formal assignment statement...

https://sites.google.com/site/duluthted/cs-5621-computer-architecture---spring-2011/programming-assignment-4---due-friday-april-8-by-5pm-to-the-webdrop

-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse

 
