\layout Chapter

\begin_inset LatexCommand \label{relevancewiscpolicy}


Using Feedback to Find Better Results and to Improve Spidering Performance
\layout Standard

In the information retrieval literature, relevance is usually thought of
 as a boolean value: either the result is relevant, or it isn't.
 In the classic studies of relevance feedback, a set of results to a query
 is presented to the user, and the user marks some of these as relevant
 or non-relevant.
 Terms drawn from those documents tagged as relevant are used to refine
 the query.
 This has been shown to improve the precision of retrieval (that is, the
 percentage of documents returned that are relevant).
\layout Standard

Precision and recall are useful concepts when a fixed number of documents
 fits into any category.
 In that case, as is conventional in the literature, precision is defined
 as the percentage of the retrieved documents that fall into the desired
 category (that is, are relevant to a particular information need, as defined
 by the user or a predefined expert assignment), and recall is defined as
 the percentage of the relevant documents that have been retrieved.
 In the Web context, where the number of documents in any particular category
 tends to grow monotonically over time, achieving high precision (so as
 to have a high signal-to-noise ratio for the user), finding a large number
 of highly relevant documents, and ranking them well are more important
 than having high overall recall.
 There is usually a flood of information on any topic, more than any one
 person can process, so most users will not get past the most highly relevant
 documents.
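\layout Standard

As a minimal sketch of the two definitions above, precision and recall for
 a single query could be computed as follows.
 The function name and the document identifiers are hypothetical and are
 used only for illustration; they are not part of the experiment described
 in this chapter.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) as fractions for a single query."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # retrieved documents that are also relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: 4 documents returned, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d1", "d2", "d3", "d7", "d8", "d9"]

p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")
```

 In this made-up case precision is 3/4 and recall is 3/6, which illustrates
 how a result list can be mostly relevant while still missing half of the
 relevant documents.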
\layout Standard

In the interest of discovering what features contribute to high levels of
 precision and high mean relevance of returned documents, I conducted the
 following experiment.
 I used a set of 500 manually-selected documents, all on public policy topics
 related to Wisconsin, 100 each on the following topics: Wisconsin Economy,
 Wisconsin Education, Wisconsin Environment, Wisconsin Government and Politics,
 and Wisconsin Health Care.
\layout Standard

A group of 107 student subjects took part in the experiment, all students
 of Prof.
 Jo Ann Oravec at the University of Wisconsin at Whitewater.
 Each student was assigned to one of the categories, and was asked to rate
 each page in that category along two dimensions, relevance and quality,
 each scaled between 1 and 10.
 The students were instructed that relevance simply meant to what extent
 the page fit into the category in question, and quality meant the overall
 quality of the information on the page, independent of the relevance of
 the page to the category.
 (So, for instance, if one of the pages in the Wisconsin Education set actually
 contained very high quality information on, say, California wineries, it
 would get a relevance value of 1 and a quality value of 10.) 
\layout Standard

The subjects were each assigned to make 100 relevance and 100 quality judgments
 of the 100 pages in the set to which they were assigned.
 In aggregate, the subjects made 8982 relevance judgments and 9013 quality
 judgments, indicating that each subject actually completed (on average)
 about 84 of the 100 assigned judgments of each kind.
 Therefore, on average, since there were 500 pages total evaluated, there
 were about 18 judgments of each page on each of the two dimensions.
 (Actually, since only 492 documents were responding to web queries during
 the period of the experiment, although all of them were responding when
 I constructed the list, the number of judgments per document was slightly
 higher.
 However, for simplicity in the following discussion, I will continue to
 refer to the dataset as if it had 500 observations, with the understanding
 that the statistics I report may refer to a slightly smaller set or subset.)
 The quality and relevance variables were correlated with one another at
 0.747, meaning either one accounted for about 55.8% of the variation in the
 other.
 Table 
\begin_inset LatexCommand \ref{wisc policy rel qual stats}


 shows the mean and standard error of the user judgments of quality and
 relevance of the documents in each of the five categories.
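\layout Standard

The averages and the shared variance quoted above can be checked directly
 from the reported totals.
 The following is only an arithmetic sketch; the variable names are my own,
 and the only inputs are the counts and the correlation given in the text.

```python
# Totals reported in the text.
n_subjects = 107
n_pages_nominal = 500      # pages on the original list
n_pages_responding = 492   # pages that were reachable during the experiment
relevance_judgments = 8982
quality_judgments = 9013
correlation = 0.747        # correlation between quality and relevance

# Average number of judgments completed per subject (out of 100 assigned).
per_subject_relevance = relevance_judgments / n_subjects   # about 83.9
per_subject_quality = quality_judgments / n_subjects       # about 84.2

# Average number of judgments per page, on the nominal and the actual page count.
per_page_relevance = relevance_judgments / n_pages_nominal            # about 18.0
per_page_relevance_492 = relevance_judgments / n_pages_responding     # about 18.3

# Share of variance in one variable accounted for by the other (r squared).
shared_variance = correlation ** 2                                     # about 0.558

print(round(per_subject_relevance, 1), round(per_subject_quality, 1))
print(round(per_page_relevance, 1), round(per_page_relevance_492, 1))
print(round(shared_variance, 3))
```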
\begin_float tab 
\layout Caption

\begin_inset LatexCommand \label{wisc policy rel qual stats}


Descriptive Statistics, Relevance and Quality Judgments, Wisconsin Policy
\layout Standard
\align left 

\begin_inset  Tabular
<lyxtabular version="2" rows="8" columns="8">
<features rotate="false" islongtable="false" endhead="0" endfirsthead="0" endfoot="0" 
<column alignment="center" valignment="top" leftline="true" rightline="false" width="" 
<column alignment="center" valignment="top" leftline="true" rightline="false" width="" 
<column alignment="center" valignment="top" leftline="true" rightline="false" width="" 
<column alignment="center" valignment="top" leftline="true" rightline="false" width="" 
<column alignment="center" valignment="top" leftline="true" rightline="false" width="" 
<column alignment="center" valignment="top" leftline="true" rightline="false" width="" 
<column alignment="center" valignment="top" leftline="true" rightline="true" width="" 
<column alignment="center" valignment="top" leftline="false" rightline="true" width="" 
<row topline="true" bottomline="false" newpage="false">
<cell multicolumn="1" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="2" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="1" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="2" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="2" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="1" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="2" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="2" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="false" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<row topline="true" bottomline="false" newpage="false">
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

Policy Area
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

n subjects
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

n ratings
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

n ratings
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="false" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<row topline="true" bottomline="false" newpage="false">
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="false" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<row topline="true" bottomline="false" newpage="false">
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="false" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<row topline="true" bottomline="false" newpage="false">
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="false" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<row topline="true" bottomline="false" newpage="false">
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="false" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<row topline="true" bottomline="true" newpage="false">
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

Health Care
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="false" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<row topline="false" bottomline="true" newpage="false">
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="false" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard



\layout Standard

As one can see from Table 
\begin_inset LatexCommand \ref{wisc policy rel qual stats}


, all of the categories have quite similar distributions, at least as far
 as the means and standard deviations are concerned.
 The ratings for relevance are systematically, but only slightly, above
 those for quality.
\layout Standard

Manual consideration of the pages after they were ranked by mean subject
 rating led me to the following conclusions about the behavior of the subjects.
 The subjects tend to rate highly for relevance those pages which are associated
 with high-status or highly socially visible institutions.
 For instance, the top six pages in terms of mean subject relevance rating
 in the 
\begin_inset Quotes eld

Government and Politics
\begin_inset Quotes erd

 category are all home pages of state agencies (the Wisconsin Department
 of Natural Resources, Department of Justice, Office of the Governor, Department
 of Commerce, State Legislature, and the Wisconsin Department of Financial
 Institutions).
\layout Standard

In the case of the health care pages, the top six pages were pages associated
 with the University of Wisconsin Hospital (one of the most, if not the
 most, prestigious hospitals in the state), Blue Cross/Blue Shield of Wisconsin
 (perhaps the best-known health insurer), the Medical College of Wisconsin
 (Wisconsin's other Medical School, besides that at the University of Wisconsin,
 and also very well-known in the state), Community Health Care of Wausau
 (not that well-known outside Wausau, but a beautifully-designed site),
 the Wisconsin branch of the March of Dimes (a very well-known organization),
 and another page associated with the Medical College of Wisconsin.
 What many of these top pages have in common is that they are associated
 with very well-known organizations and are very nicely-designed, probably
 because these institutions have the resources to hire professional graphic
 artists and web page designers.
 On the other hand, those with lower relevance scores tend to be associated
 with less prestigious or less well-known organizations and to be less well
 designed, based on my browsing of the results.
\layout Standard

I built statistical models to attempt to account for the subjects' ratings
 in terms of the presence or absence of over-represented wordstems on a particular
 page, and the total number of over-represented wordstems.
 Tables 
\begin_inset LatexCommand \ref{top 100 wordstems wisc policy 1}


 and 
\begin_inset LatexCommand \ref{top 100 wordstems wisc policy 2}


 show the top 100 over-represented wordstems for each of the five categories,
 in the order of their degree of over-representation relative to a large
 corpus of random background text.
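\layout Standard

The text does not specify the exact measure of over-representation used
 to order these lists.
 As one plausible sketch, assuming the measure is the ratio of a wordstem's
 relative frequency in the category pages to its (add-one smoothed) relative
 frequency in the background corpus, the ranking could be computed as follows.
 The function name, the smoothing, and the toy token lists are all assumptions
 made for illustration, and the tokens are assumed to be stemmed already.

```python
from collections import Counter


def over_represented(category_stems, background_stems, top_n=100):
    """Rank stems by category relative frequency over background relative frequency."""
    cat = Counter(category_stems)
    bg = Counter(background_stems)
    cat_total = sum(cat.values())
    bg_total = sum(bg.values())
    vocab_size = len(set(cat) | set(bg))

    scores = {}
    for stem, count in cat.items():
        cat_rate = count / cat_total
        # Add-one smoothing so stems absent from the background do not divide by zero.
        bg_rate = (bg[stem] + 1) / (bg_total + vocab_size)
        scores[stem] = cat_rate / bg_rate
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy data, not taken from the experiment.
category = ["wisconsin", "state", "tax", "state", "wisconsin", "the", "the"]
background = ["the"] * 6 + ["state", "of", "and", "tax"]

top_two = over_represented(category, background, top_n=2)
print(top_two)
```

 Under this assumed measure, a stem that is frequent in the category pages
 but rare in the background text rises to the top of the list, while common
 function words fall to the bottom.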
\begin_float tab 
\layout Caption

\begin_inset LatexCommand \label{top 100 wordstems wisc policy 1}


Top 100 Over-Represented Wordstems, Wisconsin Policy Areas (Economy, Education,
 and Environment)
\layout Standard

\begin_inset  Tabular
<lyxtabular version="2" rows="4" columns="2">
<features rotate="false" islongtable="true" endhead="0" endfirsthead="0" endfoot="0" 
<column alignment="left" valignment="top" leftline="true" rightline="false" 
width="1in" special="">
<column alignment="left" valignment="top" leftline="true" rightline="true" width="5in" 
<row topline="true" bottomline="false" newpage="false">
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

\size small 
<cell multicolumn="0" alignment="left" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

\size small 
Top 100 Over-Represented Wordstems
<row topline="true" bottomline="false" newpage="false">
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

\size small 
Economy
\size default 
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

\size small 
wisconsin, state, develop, busi, econom, work, research, job, industri,
 program, economi, studi, year, impact, service, thi, madison, counti, univers,
 report, product, worker, employ, nation, percent, system, educ, center,
 school, high, technolog, area, million, resourc, tax, public, commun, milwauke,
 fund, train, provid, manufactur, depart, compani, health, increas, peopl,
 wi, support, local, includ, time, student, feder, technic, workforc, cost,
 labor, inform, make, project, skill, base, incom, growth, region, welfar,
 govern, institut, benefit, visitor, manag, american, recreat, opportun,
 famili, creat, care, rate, agricultur, total, plan, relat, annual, gener,
 bioscienc, home, wage, number, colleg, chang, uw, polisci, futur, top,
 invest, requir, medic, life
<row topline="true" bottomline="false" newpage="false">
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

\size small 
Education
\size default 
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

\size small 
school, wisconsin, state, educ, program, student, educat, public, colleg,
 teacher, wi, univers, high, milwauke, district, technic, court, work, home,
 system, madison, develop, year, children, associat, mpcp, parent, area,
 amend, privat, learn, nation, site, commu, depart, standard, technolog,
 includ, skill, uw, resourc, center, provid, counti, make, fax, question, choice,
 particip, scienc, offic, curriculum, thi, governor, support, requir, institut,
 people, servic, project, art, teach, middle, board, council, religi, plan,
 inform, report, instruction, level, establish, train, elementari, librari,
 activ, claus, busi, law, extension, studi, test, fund, administr, opportun,
 career, grade, local, time, social, people, purpose, life, effect, web, workforc,
 benefit, base, thompson, education
<row topline="true" bottomline="true" newpage="true">
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

\size small 
Environment
\size default 
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

\size small 
wisconsin, water, environment, state, program, resourc, lake, river, manag,
 site, land, al, environ, develop, natur, sourc, transport, area, qualit,
 year, pollut, pcb, depart, includ, concentr, fish, protect, project, public,
 nation, system, feder, contamin, agricultur, work, plan, plant, wi, commun,
 requir, trade, report, issu, great, point, thi, regul, cryptosporidium, epa,
 energi, time, polici, madison, local, million, univers, nonpoint, research,
 control, inform, conver, gener, provid, impact, support, studi, industri,
 speci, wast, law, act, mine, group, anim, effect, fund, educ, increas,
 counti, altern, watersh, activ, base, reduc, peopl, wildlif, home, level,
 product, carp


\begin_float tab 
\layout Caption

\begin_inset LatexCommand \label{top 100 wordstems wisc policy 2}


Top 100 Over-Represented Wordstems, Wisconsin Policy Areas (Government/Politics
 and Health Care)
\layout Standard

\begin_inset  Tabular
<lyxtabular version="2" rows="3" columns="2">
<features rotate="false" islongtable="true" endhead="0" endfirsthead="0" endfoot="0" 
<column alignment="left" valignment="top" leftline="true" rightline="false" 
width="1in" special="">
<column alignment="left" valignment="top" leftline="true" rightline="true" width="5in" 
<row topline="true" bottomline="false" newpage="false">
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

\size small 
<cell multicolumn="0" alignment="left" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

\size small 
Top 100 Over-Represented Wordstems
<row topline="true" bottomline="false" newpage="false">
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

\size small 
Government and Politics
\size default 
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

\size small 
wisconsin, state, counti, govern, polit, public, citi, tax, parti, site,
 madison, republican, local, campaign, year, depart, servic, inform, democrat,
 work, web, time, governor, district, thompson, vote, elect, peopl, home,
 wi, senat, milwauke, feder, court, legisl, reform, nation, candid, informat,
 program, make, monei, committe, thi, mail, law, search, resourc, green,
 univers, congres, report, issu, school, member, lafollett, town, student,
 board, contact, gener, legislatur, system, commun, budget, meet, interest,
 tommi, includ, support, dai, narrat, relat, busi, group, office, dave,
 progress, hous, list, bill, repres, commiss, health, tim, citizen, plan,
 polisci, famili, offic, colleg, record, call, find, presid, financ, provid,
 industri, offici, educat
<row topline="true" bottomline="true" newpage="false">
<cell multicolumn="0" alignment="left" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="false" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

\size small 
Health Care
<cell multicolumn="0" alignment="center" valignment="top" topline="true" 
bottomline="false" leftline="true" rightline="true" rotate="false" usebox="none" 
width="" special="">
\begin_inset Text

\layout Standard

\size footnotesize 
health, wisconsin, care, state, tobacco, program, servic, product, provid,
 public, inform, hospit, medicaid, nicotin, research, medic, percent, smoke,
 compani, commun, data, children, insur, home, year, center, patient, includ,
 counti, famili, group, wi, system, madison, market, long, cigarett, nation,
 clinic, plan, physician, defend, report, depart, polisci, informat, thi,
 base, manag, term, rate, develop, gener, md, time, increas, studi, network,
 fund, effect, site, industri, school, associat, support, continu, cost,
 nurs, milwauke, diseas, institut, requir, level, length, area, result,
 feder, work, peopl, univers, resourc, relat, administr, number, grant, make,
 issu, project, cancer, person, popul, order, access, addict, contact, benefit,
 ag, part, agent, primari


I suspected that the presence or absence of these overrepresented wordstems
 on a particular web page might be related to the subjects' judgments of
 the page's quality or relevance.
 However, this turned out largely not to be the case.
 For each of the 5 categories, I created a model where the presence or absence
 of each of the top 25 wordstems in the corresponding list above was coded as an
 independent variable in a linear regression, and the quality or relevance
 was the dependent variable.
 Therefore, there were ten such models (five categories, each with two
 dependent variables).
 Very few of the wordstem proxies in these multiple regressions had betas
 that were significantly different from zero, leading to the suspicion that
 those that were significantly non-zero occurred by chance, given the large
 number of variables in the regressions.
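\layout Standard

The coding of the independent variables, and the reason a few significant
 betas are unsurprising, can be sketched as follows.
 The function name and the sample page are hypothetical, and the significance
 level of 0.05 is an assumption (the level actually used is not stated above);
 the calculation also assumes, for simplicity, that the tests are independent.

```python
def indicator_row(page_stems, top_stems):
    """Code the presence (1) or absence (0) of each top wordstem on one page."""
    present = set(page_stems)
    return [1 if stem in present else 0 for stem in top_stems]


# First five stems of the Health Care list; a full model would use the top 25.
top_stems = ["health", "wisconsin", "care", "state", "tobacco"]
page = ["health", "care", "clinic", "madison"]   # hypothetical stemmed page
row = indicator_row(page, top_stems)
print(row)

# Chance findings across the regressions, under the assumptions stated above.
alpha = 0.05          # assumed significance level, not given in the text
n_predictors = 25     # top 25 wordstems per model
n_models = 10         # 5 categories x 2 dependent variables

# Expected number of coefficients significant by chance if no wordstem mattered.
expected_false_positives = alpha * n_predictors * n_models      # about 12.5
# Chance that a single 25-predictor model shows at least one such coefficient.
p_any_in_one_model = 1 - (1 - alpha) ** n_predictors             # about 0.72

print(expected_false_positives, round(p_any_in_one_model, 2))
```

 Under these assumptions, roughly a dozen nominally significant coefficients
 would be expected across the ten models even if none of the wordstems had
 any real relationship to the ratings.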

