Hello,

I am working with the "tm" package.
I have two PDF files saved in the directory D:/Files.
I issued the following commands and got the warnings and errors shown
after each one.

surgj <- Corpus(DirSource("D:/Files"), readerControl = list(language = "ansi"))

Warning messages:
1: In readLines(y, encoding = x$Encoding) :
  incomplete final line found on 'D:/Files/provmedsurgj00978-0005b.pdf'
2: In readLines(y, encoding = x$Encoding) :
  incomplete final line found on 'D:/Files/provmedsurgj00978-0007.pdf'

> inspect(surgj)

A corpus with 2 text documents

The metadata consists of 2 tag-value pairs and a data frame
Available tags are:
  create_date creator
Available variables in the data frame are:
  MetaID

[[1]]
%PDF-1.3
Error: invalid input '%Åþë×' in 'utf8towcs'

Could anybody help me to identify where I went wrong and what I need to do
to proceed further?

Thanks,
Shreyasee

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
