Hello,

I am working with the "tm" package. I have two PDF files saved in the directory D:/Files. I issued the following commands and got the warnings and the error shown below them.

> surgj <- Corpus(DirSource("D:/Files"), readerControl = list(language = "ansi"))
Warning messages:
1: In readLines(y, encoding = x$Encoding) :
  incomplete final line found on 'D:/Files/provmedsurgj00978-0005b.pdf'
2: In readLines(y, encoding = x$Encoding) :
  incomplete final line found on 'D:/Files/provmedsurgj00978-0007.pdf'
> inspect(surgj)
A corpus with 2 text documents

The metadata consists of 2 tag-value pairs and a data frame
Available tags are:
  create_date creator
Available variables in the data frame are:
  MetaID

[[1]]
%PDF-1.3
Error: invalid input '%à þëÃ' in 'utf8towcs'

Could anybody help me identify where I went wrong and what I need to do to proceed?

Thanks,
Shreyasee
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.