I'm afraid I don't know enough about Java to say what might be wrong
with your statistic.pl call. The statistic.pl command itself looks
fine, so the problem probably lies in how Java launches external
programs (and how output files get created from there). Perhaps you
need to specify the complete path of the input or output file (or
both)? These are guesses; I am really not sure, since I mostly run in
a Perl-only environment (on Linux). If you resolve this, please do let
us know, as it seems like it would be quite helpful for others.
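
A minimal sketch of how the full-path guess could be tested from the
Java side, offered as an assumption rather than a known fix. It passes
each argument separately to ProcessBuilder, turns the two file names
into absolute paths, and prints whatever statistic.pl writes to
stdout or stderr along with its exit code, so any error message from
Perl becomes visible. The class name RunStatistic, the helper names
buildCommand and run, and the choice of the current directory as the
working directory are hypothetical; the script path and arguments are
copied from the quoted message.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class RunStatistic {

    // Build the argument list: perl, the script, the measure, then the
    // destination and source files converted to absolute paths.
    static List<String> buildCommand(String script, String measure,
                                     String dest, String src) {
        String destPath = Paths.get(dest).toAbsolutePath().toString();
        String srcPath = Paths.get(src).toAbsolutePath().toString();
        return Arrays.asList("perl", script, measure, destPath, srcPath);
    }

    // Start the process, echo everything it prints, and return its exit code.
    static int run(List<String> command, File workDir)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.directory(workDir);        // relative paths resolve against this directory
        pb.redirectErrorStream(true); // merge stderr into stdout so errors are not lost
        Process p = pb.start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println("[statistic.pl] " + line);
            }
        }
        return p.waitFor();           // block until the script has finished
    }

    public static void main(String[] args) throws InterruptedException {
        // Script path and arguments as given in the quoted message.
        List<String> cmd = buildCommand("g:/Text-NSP-1.09/bin/statistic.pl",
                "mi.pm", "output_pmi.txt", "output.txt");
        try {
            int exitCode = run(cmd, new File("."));
            System.out.println("exit code: " + exitCode);
        } catch (IOException e) {
            // Thrown when the process cannot be started, e.g. perl is not on the PATH.
            System.out.println("could not start perl: " + e.getMessage());
        }
    }
}
```

If the exit code is nonzero, or a Perl error message shows up in the
echoed output, that should narrow down whether the paths are the
culprit.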

As for dealing with accented characters and so forth, there has been
considerable discussion of this issue on this list in the past; the
most recent thread starts here:

http://tech.groups.yahoo.com/group/ngram/message/206

At the moment I think the easiest solution (although not perfect by
any means) is to add "use locale;" statements to your code. However,
there are some pitfalls with that, which you'll find discussed on the
mailing list. If you search the list archive for "locale" and also for
"encode" or "encoding", you should find quite a lot of information
that will hopefully be helpful.

Cordially,
Ted

On Sun, Sep 7, 2008 at 2:27 PM, arezki20002002 <[EMAIL PROTECTED]> wrote:
> Hi Ted,
>
> I met a problem when using NSP : the count.pl program generates the
> output.txt file , but the statistic.pl does not generate
> output_pmi.txt file
> I have this code with java :
> Process p = Runtime.getRuntime (). Exec (perl g:/Text-NSP-
> 1.09/bin/statistic.pl mi.pm output_pmi.txt output.txt);
> What is the problem?
>
> The second problem concerns accented letters (é,è,ç,...): how can I
> keep them from being treated as the end of a token? I work on a
> French corpus.
>
> thank you in advance.
> Arezki
>
> 



-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse
