I'm using the text mining package ("tm") to process a large number of blog and message board postings (about 245,000). Does anyone have any advice on how to efficiently extract the metadata from a corpus of this size?
tm does a great job of using MPI for many functions (e.g. tmMap), which greatly speeds up the processing. However, the meta() function that I need does not take advantage of MPI. I have two ideas:

1) Find a way of running the meta() function in parallel. Specifically, the code that I'm running is:

    urllist <- lapply(workingcorpus, meta, tag = "FeedUrl")

Unfortunately, when I try to use parLapply I receive the following error message:

    Error in checkCluster(cl) : not a valid cluster
    Calls: parLapply ... is.vector -> clusterApply -> staticClusterApply -> checkCluster

2) Alternatively, is there a way of extracting all of the metadata into a data.frame that would be faster to process?

Thanks for any suggestions or ideas!

Shad

shad thomas | president | glass box research company
+1 (312) 451-3611 tel | shad.tho...@glassboxresearch.com
www.glassboxresearch.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
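[Editor's note: the "not a valid cluster" error above typically means parLapply() was called without a cluster object as its first argument. A minimal sketch of the fix, using the parallel package and a toy two-document corpus standing in for the poster's workingcorpus (the "FeedUrl" tag values here are made up for illustration):]

    library(tm)
    library(parallel)

    ## Toy stand-in for the poster's 245,000-document corpus.
    docs <- VCorpus(VectorSource(c("first post", "second post")))
    meta(docs[[1]], "FeedUrl") <- "http://example.com/feed1"
    meta(docs[[2]], "FeedUrl") <- "http://example.com/feed2"

    ## parLapply's first argument must be a cluster created with
    ## makeCluster(); calling it without one raises the error shown above.
    cl <- makeCluster(2)
    clusterEvalQ(cl, library(tm))   # each worker needs tm loaded for meta()
    urllist <- parLapply(cl, docs, meta, tag = "FeedUrl")
    stopCluster(cl)

[Whether spreading 245,000 small meta() calls across workers actually beats a plain lapply() depends on serialization overhead, so it is worth timing both on a subset first.]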
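[Editor's note: for idea 2, one hedged sketch is to pull each document's tag into a character vector in a single pass with vapply(), then store the result in a data.frame so later lookups are plain vector operations. The helper name get_tag and the toy corpus below are illustrative, not part of the original post:]

    library(tm)

    ## Toy stand-in corpus; the poster's real object is 'workingcorpus'.
    docs <- VCorpus(VectorSource(c("first post", "second post")))
    meta(docs[[1]], "FeedUrl") <- "http://example.com/feed1"
    meta(docs[[2]], "FeedUrl") <- "http://example.com/feed2"

    ## Collect one metadata tag per document, substituting NA when a
    ## document lacks the tag, so the result always aligns with the corpus.
    get_tag <- function(corpus, tag) {
      vapply(corpus, function(d) {
        val <- meta(d, tag = tag)
        if (length(val)) as.character(val)[1] else NA_character_
      }, character(1))
    }

    meta_df <- data.frame(FeedUrl = get_tag(docs, "FeedUrl"),
                          stringsAsFactors = FALSE)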