[R] How to read plain text documents into a vector?

2009-10-13 Thread Richard Liu
I'm new to R. I'm working with the text mining package tm. I have several plain text documents in a directory, and I would like to read all the files with extension .txt in that directory into a vector, one text document per vector element. That is, v[1] would be the first document, v[2] the
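A minimal sketch of one way to do what the question describes, using only base R. The helper name `read_txt_dir` is hypothetical and not part of tm; the demo directory is a temporary one created only for illustration.

```r
# Read every .txt file in a directory into a character vector,
# one whole document per element (lines joined with "\n").
# read_txt_dir is a hypothetical helper name, not a tm function.
read_txt_dir <- function(dir_path) {
  # full.names = TRUE returns usable paths rather than bare file names
  files <- list.files(dir_path, pattern = "\\.txt$", full.names = TRUE)
  vapply(
    files,
    function(f) paste(readLines(f, warn = FALSE), collapse = "\n"),
    character(1),
    USE.NAMES = FALSE
  )
}

# Demo on a throwaway directory (illustrative data only)
demo_dir <- tempfile("txt_demo_")
dir.create(demo_dir)
writeLines(c("First document,", "second line."), file.path(demo_dir, "doc1.txt"))
writeLines("Second document.", file.path(demo_dir, "doc2.txt"))

v <- read_txt_dir(demo_dir)
length(v)  # one element per .txt file
```

Files are read in the alphabetical order that `list.files` returns, so `v[1]` is the first file by name rather than by creation date.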

Re: [R] How to read plain text documents into a vector?

2009-10-13 Thread Richard Liu
kenhorvath wrote: Paul Hiemstra wrote: file_list = list.files("/where/are/the/files") obj_list = lapply(file_list, FUN = yourfunction) yourfunction is probably either read.table or some read function from the tm package. So obj_list will become a list of either data.frames or

[R] Increase the Java heap space for openNLP

2009-10-26 Thread Richard Liu
When I run sentDetect in the openNLP package I receive a Java heap space exception. How can I increase the heap space? I am running the 64-bit Leopard version of R 2.9.2 and R.app on a Mac with OS X 10.5.8. Thanks, Richard
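A hedged sketch of the usual approach for rJava-based packages such as openNLP: set the `java.parameters` option before the JVM starts in the R session. The `-Xmx2g` value is an arbitrary illustrative choice, not a recommendation from the thread.

```r
# Must run before rJava (and therefore openNLP) initializes the JVM;
# once the JVM is running, changing this option has no effect
# until R is restarted.
options(java.parameters = "-Xmx2g")  # 2 GB heap, illustrative value only

# Load the package only after the option is set (skipped if not installed).
if (requireNamespace("openNLP", quietly = TRUE)) {
  library(openNLP)
}
```

If the package was already loaded earlier in the session, restarting R and setting the option first is the safer path.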

[R] R 2.10.0: Error in gsub/calloc

2009-11-03 Thread richard . liu
I'm running R 2.10.0 under Mac OS X 10.5.8; however, I don't think this is a Mac-specific problem. I have a very large (158,908 possible sentences, ca. 58 MB) plain text document d which I am trying to tokenize: t <- strapply(d, "\\w+", perl = T). I am encountering the following error: Error in
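For comparison, a small base-R sketch of the same word-level tokenization that does not depend on gsubfn's strapply. This is a swapped-in technique offered as an assumption, not a confirmed fix for the error reported above; `tokenize_words` is a hypothetical helper name.

```r
# Extract runs of word characters from each element of a character vector.
# Returns a list with one character vector of tokens per input element.
tokenize_words <- function(x) {
  regmatches(x, gregexpr("\\w+", x, perl = TRUE))
}

# Illustrative input only
tokens <- tokenize_words(c("The quick brown fox.", "It jumped."))
tokens[[1]]
```

Applying the function to a document split into lines or sentences, rather than to one 58 MB string, keeps each regular-expression call small.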

Re: [R] Compact Patricia Trees (Tries)

2010-04-29 Thread Richard Liu
Gabor, Thanks for the suggestion, I'll try it out tonight or tomorrow. Regards, Richard On Apr 29, 2010, at 13:06, Gabor Grothendieck ggrothendi...@gmail.com wrote:

[R] Machine Learning and Sample Size

2009-12-04 Thread Richard Liu
In developing a machine learner to classify sentences in plain text sources of scientific documents I have been using the caret package and following the procedures described in the vignettes. What I miss in the package -- but quite possibly I am overlooking it! -- is functions