I tried 'convmv' and it worked great. One point to be aware of: I had two
directory trees, and it turned out that one was in cp862 and the other in
iso-8859-8. At first I ran the same command on both directories:
convmv -r -f cp862 -t utf8 --nfc --notest directory1
That worked fine for the cp862-encoded directory but mangled the names in
the iso-8859-8 one. Good thing I backed them up first. I then had to
experiment to figure out what encoding the second directory was in, and ran:
convmv -r -f iso-8859-8 -t utf8 --nfc --notest directory2
That worked fine, and everything is now in UTF-8.

One question, though: is there a way to find out what encoding a file or
directory name is in? I had to keep trying different 'from' encodings
until one worked.
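
A file name on disk is just bytes and carries no encoding label, so the
encoding can only be guessed. The following is a rough Python sketch of one
way to automate the trial and error described above, not a definitive tool.
It assumes the names are Hebrew and that only cp862 and iso-8859-8 are in
play; the names guess_encoding, hebrew_score, scan and CANDIDATES are made up
for this illustration. It decodes the raw bytes with each candidate and keeps
the one that yields the most Hebrew letters.

```python
import os

# Candidate legacy encodings to try. This list is an assumption taken from
# the two encodings discussed in this thread; extend it as needed.
CANDIDATES = ("cp862", "iso-8859-8")


def hebrew_score(text):
    """Count characters in the Hebrew letter range U+05D0..U+05EA."""
    return sum(1 for ch in text if "\u05d0" <= ch <= "\u05ea")


def guess_encoding(raw_name):
    """Return (encoding, decoded_name) for the best guess, or None."""
    # Pure ASCII names look the same in every candidate encoding.
    if all(b < 0x80 for b in raw_name):
        return ("ascii", raw_name.decode("ascii"))
    # Names that already decode as UTF-8 need no conversion.
    try:
        return ("utf-8", raw_name.decode("utf-8"))
    except UnicodeDecodeError:
        pass
    best = None
    for enc in CANDIDATES:
        try:
            text = raw_name.decode(enc)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        score = hebrew_score(text)
        if best is None or score > best[0]:
            best = (score, enc, text)
    # No candidate produced any Hebrew letters: give up rather than guess.
    if best is None or best[0] == 0:
        return None
    return (best[1], best[2])


def scan(root):
    """Print a guess for every file and directory name under root."""
    # Walking a bytes path makes os.walk return names as raw bytes.
    for _dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            print(name, guess_encoding(name))
```

Calling scan("directory2") would print one guess per name. This is only a
heuristic: it works here because cp862 puts the Hebrew letters in bytes
0x80-0x9A while iso-8859-8 puts them in 0xE0-0xFA, so a name decoded with the
wrong one shows no Hebrew letters at all. Encodings that share a layout would
tie, so the result is worth checking by eye, for example against a convmv
run without --notest, before renaming anything.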

-- 
Chaim Keren Tzion       | [EMAIL PROTECTED]
System Administrator    " The Hebrew University of Jerusalem
Dept. of Neurobiology   " Tel: 972-2-658-5083
Inst. of Life Science   " Cel: 972-2-54-652983
Jerusalem 91904, Israel " Fax: 972-2-658-6296
...................... : ............................


Quoting Yedidyah Bar-David <[EMAIL PROTECTED]>:

> On Wed, Dec 31, 2003 at 02:31:21PM +0200, Chaim Keren Tzion wrote:
> > Didi,
> > 
> > Could you post the script you mention that you use for converting cp862 and
> > iso8859-8 files and directories to UTF-8?
> 
> I don't mind posting it, but I recently saw something on freshmeat called
> 'convmv', which is probably better. If you try it, please tell us
> what you think of it, especially if you have non-trivial filenames.
> -- 
> Didi
> 
> > 
> > Thanks
> > 
> > 
> > On Sun, Oct 26, 2003 at 10:15:58PM +0200, Gal Goldschmidt wrote: 
> > > Hi, 
> > > 
> > > The solution is very simple: you need to convert the Hebrew file names
> > > on the server to UTF-8.
> > > 
> > > Here is a script adapted from the SAMBA docs:
> > > find /path/to/share -type f -exec bash -c 'CP="{}"; ISO=`echo -n "$CP" | \
> > > iconv -f cp862 -t UTF-8`; if [ "$CP" != "$ISO" ]; then mv "$CP" \
> > > "$ISO"; fi' \;
> > 
> > 
> > I did not try it myself; I use a slightly different one. But you surely
> > need at least '-depth', or you will have problems with Hebrew dirs that
> > contain Hebrew files. I suggest, in any case, that you double-check it
> > before running, especially on a large, multi-user file server. Windows
> > users love to put all kinds of characters in their file names. At the
> > very least, include in your test cases all punctuation (including all
> > types of quotes), and also files whose names become identical after
> > conversion (e.g. one was written with cp862 and the other with
> > iso8859-8; this happened to us with a netapp that was accessed both
> > directly from Windows and through samba), and change the script to do
> > what's best for you in such a case.
> > 
> > 
> > -- 
> > Didi
> > > 
> > > Bye
> > > Gal
> > > 
> > > 
> > > On Sunday 26 October 2003 20:07, Dotan Mazor wrote:
> > > > > Well, you could try writing "utf-8" instead of "utf". I didn't have
> > > > > to change anything, but then all my Hebrew file names turned into
> > > > > underscores (like this: ________.___), which made me break a few
> > > > > chairs.
> > > > >
> > > > > Oh well, I guess you'd better take advice from someone who knows at
> > > > > least a bit of what he's talking about...
> > > >
> > > > Dotan
> > > > ---
> > > > > On Tue, 30 Sep 2003 13:25:03 +0200, Ben-Nes Michael
> > > > > <miki_at_canaan.co.il> wrote:
> > > > > Hi All & Shana Tova
> > > > >
> > > > > > I'm trying to move my files from Samba 2.x to 3.x.
> > > > > >
> > > > > > I mounted the old Samba share on /mnt/oldsmb, but I couldn't find
> > > > > > how to tell it to load the names as UTF-8 (on the Linux side), and
> > > > > > I just get gibberish on the console, win$ & putty.
> > > > > >
> > > > > > I think it's something to do with the charset, but I couldn't find
> > > > > > the right combination:
> > > > > >
> > > > > > mount -t smbfs -o iocharset=he_IL.utf,codepage=win1255 //Share2/documents /mnt/oldsmb/
> > > > >
> > > > > --------------------------
> > > > > Canaan Surfing Ltd.
> > > > > Internet Service Providers
> > > > > Ben-Nes Michael - Manager
> > > > > Tel: 972-4-6991122
> > > > > Fax: 972-4-6990098
> > > > > http://www.canaan.net.il
> > > > > --------------------------


================================================================
To unsubscribe, send mail to [EMAIL PROTECTED] with
the word "unsubscribe" in the message body, e.g., run the command
echo unsubscribe | mail [EMAIL PROTECTED]
