Dear all,

I am having trouble using the stemming algorithm provided by the tm (text mining) and Snowball packages.
Here is my configuration:

Mac OS X 10.5
R 2.12.0 / R 2.13.1 / R 2.14.1 (I have tried several versions)

I have installed all the required packages (tm, rJava, RWeka, Snowball) and their dependencies. I have deactivated AWT (as described in http://r.789695.n4.nabble.com/Problem-with-Snowball-amp-RWeka-td3402126.html) with:

Sys.setenv(NOAWT=TRUE)

Running tm_map(reuters, stemDocument) gives the following errors:

- First time:
Error in .jnew(name) :
java.lang.InternalError: Can't start the AWT because Java was started on the first thread. Make sure StartOnFirstThread is not specified in your application's Info.plist or on the command line
Refreshing GOE props...

- Second time:
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
(etc.)

I have already searched the web for a solution, but found nothing useful.

Here is the full source code (all the libraries are already loaded):
------
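# tm, rJava, RWeka and Snowball are already loaded at this point
Sys.setenv(NOAWT=TRUE)   # disable AWT for rJava on Mac OS X
source <- ReutersSource("reuters-21578.xml", encoding="UTF-8")
reuters <- Corpus(source)
# Preprocessing: plain text, no punctuation, lower case, no stopwords/numbers
reuters <- tm_map(reuters, as.PlainTextDocument)
reuters <- tm_map(reuters, removePunctuation)
reuters <- tm_map(reuters, tolower)
reuters <- tm_map(reuters, removeWords, stopwords("english"))
reuters <- tm_map(reuters, removeNumbers)
reuters <- tm_map(reuters, stripWhitespace)
reuters <- tm_map(reuters, stemDocument)   # this step produces the errors above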
------

Thank you for your help,

Julien

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.