Hello,

In the category "fun on Friday", I was curious to see what would come up when feeding DSpace item titles into Wordle ( http://www.wordle.net ).
Wordle visualizes the occurrence of words in any amount of text you feed it. Basically, Wordle counts how many times each word occurs and draws a single picture in which frequent words appear large and rare words appear small.

As a data source, I used K.U. Leuven's LIRIAS repository ( http://lirias.kuleuven.be ), a large and rapidly growing repository. This DSpace's hierarchy is subject oriented, as the communities and collections are organized according to the institution's organizational structure. For this experiment, I took three top-level communities: the Biomedical Sciences group, the Humanities and Social Sciences group and, last but not least, the Science, Engineering and Technology group.

Using @mire's reporting suite ( http://atmire.com/USB/resources/reporting_suite.html ), it took me five minutes to generate, for each of these top-level communities, a clean list of the item titles of International Publications (a small subset of the content) that were submitted in 2009 (500+ for each group). These lists were used to create the following Wordles:

Humanities and Social Sciences - http://www.wordle.net/gallery/wrdl/1003572/K.U._Leuven_Humanities_and_Social_Sciences_publications_2009

Biomedical Sciences - http://www.wordle.net/gallery/wrdl/1003562/K.U._Leuven_Biomed_Publications_2009

Science, Engineering and Technology - http://www.wordle.net/gallery/wrdl/1003577/K.U._Leuven_Science%2C_Engineering_and_Technology_publications_2009

It was funny to see that almost all titles were in English for the Biomed and SE&T groups. For Humanities and Social Sciences, there was a mix of English and Dutch titles. Wordle allows you to filter out the most common words (the, an, a, ...) for one particular language only. So to clean both the English and the Dutch stop words from the Humanities & Social Sciences Wordle, I had to do some manual work on the list.
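For anyone who wants to reproduce the counting and stop-word cleaning outside Wordle, here is a minimal sketch in Python. It is not how Wordle works internally and not the manual procedure used for the lists above. The function name count_title_words, the tiny stop-word sets and the sample titles are all made up for illustration; real stop-word lists would be much longer.

```python
import re
from collections import Counter

# Illustrative (hypothetical, far from complete) stop-word lists.
ENGLISH_STOP_WORDS = {"the", "a", "an", "of", "and", "in", "for", "on", "to", "with"}
DUTCH_STOP_WORDS = {"de", "het", "een", "van", "en", "in", "voor", "op", "met", "bij"}


def count_title_words(titles, stop_words):
    """Count how often each word occurs across all titles, skipping stop words."""
    counts = Counter()
    for title in titles:
        # Lowercase, then keep runs of letters only (Unicode-aware, so
        # accented characters stay part of their word).
        words = re.findall(r"[^\W\d_]+", title.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts


# Made-up sample titles mixing English and Dutch, as in the Humanities list.
titles = [
    "The role of language in education",
    "De rol van taal in het onderwijs",
    "Language and identity in Europe",
]

# Filter both languages at once by taking the union of the two sets.
counts = count_title_words(titles, ENGLISH_STOP_WORDS | DUTCH_STOP_WORDS)

# The most frequent words are the ones a word cloud would draw largest.
for word, n in counts.most_common(5):
    print(word, n)
```

Taking the union of the two stop-word sets is what replaces the manual clean-up step here: one pass removes the common words of both languages before the frequencies are computed.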
Although the selection was already narrowed down to three groups, you still see a lot of "generic" scientific terms and not so many interesting subject keywords. That's quite logical: although the scientists belong to the same group, they still deal with a wide variety of subjects.

Zooming in on more specific subjects, here's the Wordle for the 2009 publications of the Computer Science department (one subcommunity level below the groups):

http://www.wordle.net/gallery/wrdl/1003647/K.U._Leuven_Computer_Science_publications_2009

And even more specific, here's the one for the research group of Experimental Radiotherapy, under the Department of Oncology in the Biomedical Sciences group. For this one, I took all of the publications from 2000-2009 to get a relevant selection:

http://www.wordle.net/gallery/wrdl/1003638/K.U._Leuven_Experimental_Radiotherapy_Publications_2000-2009

best regards,

Bram Luyten

@mire - http://www.atmire.com
Technologielaan 9 - 3001 Heverlee - Belgium
533 2nd Street - Encinitas, CA 92024 - USA
http://www.togather.eu - Before getting together, get t...@ther
_______________________________________________
Dspace-general mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/dspace-general
