I'm afraid I don't know enough about Java to say what the problem with your statistic.pl call might be. The command itself looks fine, so it must have something to do with how Java makes these kinds of calls (and how it creates output files). Perhaps you need to specify the complete path of either the input or output file? These are guesses; I am really not sure. I mostly run in a Perl-only environment (on Linux). If you resolve this, please do let us know, as it seems like it would be quite helpful for others.
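For what it is worth, here is a rough sketch of the full-path guess, not a confirmed fix for NSP. It assumes perl is on the PATH. The class and method names (RunStatistic, buildCommand, run) are made up for illustration, and the script path and arguments are simply the ones from your message. It passes each argument as its own string, turns the output and input file names into absolute paths, lets any Perl error messages show up on the Java console, and waits for the script to finish.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names here are illustrative only.
class RunStatistic {

    // Builds the argument list for: perl <script> <measure> <destination> <source>.
    // Each argument is a separate string, and both file names are made absolute
    // so they do not depend on the JVM's current working directory.
    static List<String> buildCommand(String script, String measure, File dest, File source) {
        List<String> cmd = new ArrayList<>();
        cmd.add("perl"); // assumption: perl is on the PATH
        cmd.add(script);
        cmd.add(measure);
        cmd.add(dest.getAbsolutePath());
        cmd.add(source.getAbsolutePath());
        return cmd;
    }

    // Starts the command, forwards the child's output and error streams to the
    // Java console, and blocks until the child exits. Returns its exit code.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.inheritIO(); // makes any error message from perl visible
        Process p = pb.start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Script path and arguments as given in the original message.
        List<String> cmd = buildCommand(
                "g:/Text-NSP-1.09/bin/statistic.pl",
                "mi.pm",
                new File("output_pmi.txt"),
                new File("output.txt"));
        System.out.println("Running: " + cmd);
        System.out.println("Exit code: " + run(cmd));
    }
}
```

If the exit code is non-zero, or an error message appears on the console, that should at least narrow down whether the problem is in locating the files or in statistic.pl itself.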
As to dealing with accented characters and so forth - there has been considerable discussion of this issue on this list in the past, the most recent thread starting here:

http://tech.groups.yahoo.com/group/ngram/message/206

At the moment I think the easiest solution (although not perfect by any means) is to add "use locale;" statements to your code. However, there are some pitfalls with that, which you'll find discussed in the mailing list. If you search through the mailing list for "locale" and also for "encode" or "encoding", you will find quite a lot of information that will hopefully be helpful.

Cordially,
Ted

On Sun, Sep 7, 2008 at 2:27 PM, arezki20002002 <[EMAIL PROTECTED]> wrote:
> Hi Ted,
>
> I met a problem when using NSP : the count.pl program generates the
> output.txt file , but the statistic.pl does not generate
> output_pmi.txt file
> I have this code with java :
> Process p = Runtime.getRuntime (). Exec (perl g:/Text-NSP-
> 1.09/bin/statistic.pl mi.pm output_pmi.txt output.txt);
> What is the problem?
>
> The second problem concerns accented letters (é,è,ç,...) how not
> conceder them as end of token. because I work on a french corpus
>
> thank you in advance.
> Arezki
>
>

--
Ted Pedersen
http://www.d.umn.edu/~tpederse