I have never used r2netcmd, but it seems very configurable. Perhaps you
need to consult the manual (http://www.logictran.com/CustomRef.html) about
translation files?
I think that you do not need to use $ED like this with r2netcmd:
$cmdl = "(cp $Input $rtf ; $cmd $rtf | $ED
's#^<TITLE>$Input</TITLE>#<TITLE>[$Name]</TITLE>#' ; $RM $rtf)";
because you should be able to get r2netcmd to generate a sensible title
itself. If you can't, then shouldn't the line be:
$cmdl = "(cp $Input $rtf ; $cmd $rtf | $ED
's#^<TITLE>$rtf</TITLE>#<TITLE>[$Name]</TITLE>#' ; $RM $rtf)";
?
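
To illustrate why the pattern has to name the file the converter was actually
given, here is a minimal shell sketch. It is only an illustration:
fake_convert is a hypothetical stand-in for the real converter (I am assuming,
as the comment in doc2html.pl says of rtf2html, that the converter uses the
file name it was given as the title), and the paths are made up.

```shell
#!/bin/sh
# Hypothetical stand-in for the RTF converter: it only prints an HTML title
# built from the file name it was given. This is an assumption about how
# the real converter behaves, not a description of r2netcmd.
fake_convert() {
    printf '<TITLE>%s</TITLE>\n' "$1"
}

input=/var/www/test/file.rtf   # what htdig passes in (made-up path)
rtf=/tmp/file.rtf              # temporary copy handed to the converter
name=file.rtf

# Pattern built from $input: the title line is left alone, because the
# converter never saw that name.
fake_convert "$rtf" | sed "s#^<TITLE>$input</TITLE>#<TITLE>[$name]</TITLE>#"

# Pattern built from $rtf: the title is rewritten to [file.rtf].
fake_convert "$rtf" | sed "s#^<TITLE>$rtf</TITLE>#<TITLE>[$name]</TITLE>#"
```

If the real converter writes something else into the title, the pattern would
need to match whatever it really produces.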
How are you getting r2netcmd to produce its output on STDOUT for input back
into htdig?
Sorry, I don't understand what you mean by "if I index my tampon directory
containing "é" but with a wrong directory and a bad extension.( Instead
http://linuxsrv/test/file.rtf i have http://linuxsrv/tampon/file.html )."
How are you using doc2html? Are we misunderstanding something about your
approach?
--
David Adams
Computing Services
Southampton University
----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, June 13, 2001 9:37 AM
Subject: RE: [htdig] problem with r2net
> Thanks for your response Gilles and David
>
> I have tried your method. I also think that it's not a locale problem (I
> have tried indexing a simple txt file with accents and it's OK).
> The problem now is that I have one index containing only e9... and a second (
> if I index my tampon directory containing "é" but with a wrong directory and
> a bad extension.( Instead http://linuxsrv/test/file.rtf i have
> http://linuxsrv/tampon/file.html ). Could you give me more explanation of
> doc2html.pl (of the command line cmdl) so that I can change the path when I
> index the file.rtf?
>
> Here is my new doc2html.pl:
>
> **********************************************************************
> # RTF documents
> if ((defined $RTF2HTML) and (length $RTF2HTML)) {
> $mime_type = "application/msword|application/rtf|text/rtf";
> my $rtf = quotemeta("$TMP/$Name");
>
> $cmd = $RTF2HTML;
> # Rtf2html uses filename as title, change this:
> $cmdl = "(cp $Input $rtf ; $cmd $rtf | $ED
> 's#^<TITLE>$Input</TITLE>#<TITLE>[$Name]</TITLE>#' ; $RM $rtf)";
> $magic = '^{\134rtf';
> &store_html_method('RTF',$cmd,$cmdl,$mime_type,$magic);
> }
>
>
> **********************************************************************
>
> -----Original Message-----
> From: Gilles Detillieux [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, June 12, 2001 19:20
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: [htdig] problem with r2net
>
>
> According to David Adams:
> > On Tue, 12 Jun 2001 16:04:03 +0200
> > [EMAIL PROTECTED] wrote:
> > > Problem between r2netcmd and htdig.
> > >
> > > For indexing RTF files, I have replaced rtf2html with the r2net
> > > software in doc2html.pl. I am working under Linux (SuSE) to convert
> > > RTF files in French.
> > > To use r2netcmd with htdig, I route the input into a .rtf file in
> > > /tmp for the external parser:
> [snip]
> > > The tampon.rtf file gives me, for example:
> > >
> > > ****************************************************
> > > bb et d\'e9ziper ce fichier dans ce nouveau r\'e9pertoire.
> > > ****************************************************
> > >
> > > and the tampon.html file gives me the accents, but my database
> > > db.wordlist gives me only e9pertoire.
> > >
> > > Could someone help me?
> >
> > I think this is a LOCALE problem, which I am not capable of
> > answering.
>
> I'm not so sure about this being a locale problem. The fact that the
> "e9" appears as part of the word in the db.wordlist suggests to me that
> r2net isn't converting the \'e9 sequence in the RTF file into an &eacute;
> or é in the resulting HTML output. If \'e9 is valid RTF syntax for an é,
> and r2net isn't converting it correctly, then it's a problem with r2net.
> If it were merely a locale problem in htdig, the wordlist would contain
> pertoire, rather than e9pertoire, because it would treat the é as a word
> separator.
>
> > As regards your modification of doc2html.pl, this does seem
> > unnecessarily involved, simply to provide a *.rtf file for
> > r2netcmd. How about simply changing the command line to:
> >
> > my $rtf = quotemeta("$TMP/$Name.rtf");
> > $cmdl = "(cp $Input $rtf; $RTF2HTML $rtf; $RM $rtf)";
> >
> > I haven't tried this, but it should work.
> > Even better, you might consider creating a symbolic link to
> > $Input rather than making a copy.
>
> Yes, you could replace the cp command above with ln or ln -s. In any
> case, you still need the rm (or $RM) command above to remove the copy
> or link.
>
> You'll also need the $ED command to change the title, as in the original
> example code you posted.
>
> --
> Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
> Spinal Cord Research Centre WWW:
> http://www.scrc.umanitoba.ca/~grdetil
> Dept. Physiology, U. of Manitoba Phone: (204)789-3766
> Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
>
> _______________________________________________
> htdig-general mailing list <[EMAIL PROTECTED]>
> To unsubscribe, send a message to
> <[EMAIL PROTECTED]> with a subject of unsubscribe
> FAQ: http://htdig.sourceforge.net/FAQ.html