At 6:16 PM -0400 20/11/2000, Nick Arnett wrote:
>Don't know. I know of some work done 10-12 years ago, using Verity's
>Topic, that categorized people's writing as liberal/conservative. It was
>something like 70 percent accurate, IIRC.
That IS very interesting. I'd love to see if any kind of report was
published about it. That suggests a lot to me about -- forgive the
academicspeak but it's the easiest way to say what I mean -- the ways in
which we internalize discourses. Sorta like what I suggested to you about
the specific terminology of news reports probably being correlated with the
specific terminology used in popular discussions of news, such as in
newsgroups or mailing lists -- something about which I am curious. :) If
that's true to any degree, then I am back in Kristeva-land and will have to
think myself out of one hell of a box.
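(Just to make that concrete: if I wanted to test the correlation, I imagine
something like the following crude sketch -- comparing term frequencies in a
news corpus against those in a newsgroup corpus. The sample texts are made
up, purely to show the shape of the idea.)

    import math
    import string
    from collections import Counter

    def term_vector(text):
        # Crude tokenizer: lowercase, split on whitespace, strip punctuation.
        words = (w.strip(string.punctuation) for w in text.lower().split())
        return Counter(w for w in words if w)

    def cosine_similarity(a, b):
        # Cosine of the angle between two term-frequency vectors:
        # 1.0 means identical proportions, 0.0 means nothing shared.
        dot = sum(a[w] * b[w] for w in set(a) & set(b))
        norm = math.sqrt(sum(v * v for v in a.values())) \
             * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    news = term_vector("the recount continues amid legal challenges")
    posts = term_vector("anyone following the recount? what a legal mess")
    print(cosine_similarity(news, posts))  # higher = more shared terminology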
>But back to your point. We don't care if you are a voter or not; we care
>about how likely you are to influence others with your opinion. Come to
>think of it, there's something very, very interesting there. Could we rank
>other nations according to how influential their citizens are on U.S.
>elections? Hmm....
Ha. That's scary. I can see illicit packets of Quebec maple syrup crossing
the borders, arriving in Florida. *grin* But seriously,
>However, if you were reviewing a movie, I can tell you that we could
>determine, with about 80 percent accuracy, whether you liked the movie or
>not. That's a lot better than we had expected, to be honest. Working on
>this, we have perhaps the top person in the world in text pattern
>recognition, so if anybody could do it...
Right. And if you used some kind of discursive analysis, maybe that could
help. I mean to say that in various camps, the terminology used to
construct the discourse and to stabilize it tends to be, to some degree,
standardized. For example, you can almost bet that anyone who uses the word
"discursive" is a left-wing humanities or social sciences academic
(according to the categories that I imagine are commonly understood, which
are not necessarily categories in terms of which I like to think). I
suspect that there are certain vocabularies that are specialized not only
in terms of profession, but also in terms of other affiliations, including
ideological (political, religious, whatever) affiliations. After all, such
positions are only really expressible publicly in terms of
words, and since most people *come* to those positions, they learn how to
*be* a leftist or a conservative or a communist or whatever. I *know* that
this in part entails learning not only a set of ideas but the language that
is used to express those ideas in shorthand and longhand. Anyway, it's an
intriguing possibility, to my mind. But it may be too difficult to
implement, I suppose, or may be completely off-base.
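To show what I mean in mechanical terms, here's a toy sketch -- score a
text against small vocabularies associated with different camps and pick
the best match. The marker-word lists are invented for illustration, not
drawn from any real study.

    import string

    # Invented marker-word lists -- purely illustrative.
    MARKERS = {
        "academic-left": {"discursive", "hegemony", "problematize", "discourse"},
        "libertarian": {"statist", "coercion", "deregulate", "voluntary"},
    }

    def classify(text):
        # Score a text by how many of each camp's marker words it uses.
        words = {w.strip(string.punctuation) for w in text.lower().split()}
        scores = {camp: len(words & markers) for camp, markers in MARKERS.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else "unknown"

    print(classify("We must problematize the hegemony of this discourse."))
    # -> "academic-left" (on these made-up lists)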
>Two important points -- *people* disagree on the ground truth, so 70-80
>percent is better than you might imagine. That is, give two (or more)
>people a set of movie review messages or political comments and they'll
>disagree 10 percent or more of the time.
Tell me about it! Try sitting in on one of my litcrit classes. :) People
seem able to disagree on the most fundamental points of interpretation.
However, there is often a remedy to this -- context. It's quite feasible
to advance an argument that Chaucer is a proto-feminist who is using the
Wife of Bath to trumpet women's equality and power and so on. Until you
look at all the surrounding context, including the fact that she is suspiciously
similar to the old hag in several other Medieval works such as _The Romance
of the Rose_, and that she is misquoting scripture and "twisting" it to her
own interpretation in a culture where glossation is critical, and so on.
More context makes it harder and harder to go terribly wrong, and computers
have the advantage of not starting out with an overt agenda such as
"finding feminist precursors in the Middle Ages". I suppose for a computer
it would entail a huge sample of the person's writing, which probably is
harder to get for most average Webcitizens than for some of us bigmouths.
:)
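A quick back-of-the-envelope (with invented labels) on why that 10 percent
human disagreement matters: if two people only agree, say, 80-90 percent of
the time, then a program scored against one of them can't look much better
than that, no matter how good it actually is.

    # Two human judges labeling the same ten movie reviews
    # (labels invented for illustration). Raw agreement sets a
    # ceiling on how "accurate" any classifier can look.
    judge_a = ["pos", "pos", "neg", "pos", "neg", "neg", "pos", "neg", "pos", "pos"]
    judge_b = ["pos", "neg", "neg", "pos", "neg", "pos", "pos", "neg", "pos", "pos"]

    agree = sum(a == b for a, b in zip(judge_a, judge_b)) / len(judge_a)
    print(f"human-human agreement: {agree:.0%}")  # 80% on this toy data

    # A classifier scored against judge_a alone can't meaningfully
    # beat the rate at which judge_b matches judge_a.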
This also makes me wonder what the overt agenda for a computer program doing
this *is*. I guess it is to categorize, which can be just as much of a
mistake sometimes. Hmm.
>Second, when the computer is
>wrong, it can be incredibly, laughably wrong. Software has a hard time
>identifying sarcasm that might be obvious to you and me, for example.
Yeah. Probably you'd have to train it to read literally and cross-reference
against a profile of the person developed from a large sample or something,
since it can't really have a "sense" of the person in any other way and it
can't grasp scale all that well. Maybe it would just have to note such
things as apparent self-contradictions, and as soon as a statement
contradicts the established profile, it could be slotted as sarcasm? *grin*
That's how I
learned to read Darryl. :p
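Half-seriously, the mechanics might look something like this -- score the
apparent sentiment of a new statement and flag it when it reverses the
writer's established profile. The word-list sentiment scoring is a crude
stand-in, and the profile is just a label I'm assuming we already have.

    import string

    # Crude word-list sentiment -- a stand-in, purely illustrative.
    POSITIVE = {"love", "great", "wonderful", "brilliant"}
    NEGATIVE = {"hate", "awful", "terrible", "dreadful"}

    def sentiment(text):
        words = [w.strip(string.punctuation) for w in text.lower().split()]
        score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        return "pos" if score > 0 else "neg" if score < 0 else "neutral"

    def maybe_sarcastic(profile_sentiment, statement):
        # Flag a statement whose apparent sentiment reverses the
        # writer's established profile on the topic.
        s = sentiment(statement)
        return s != "neutral" and s != profile_sentiment

    # Suppose the profile says this person consistently dislikes the topic:
    print(maybe_sarcastic("neg", "Oh, I just love waiting in line."))  # True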
Just thoughts and reactions.
Gord