Would it be helpful to have the option to change the encoding on the fly like jEdit's "reload with encoding"?
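To make the question concrete, here is a minimal Python sketch of how an ordered encodings list could behave. It is not Meld's code: decode_first_match is a hypothetical helper, and it assumes the list is tried front to back with the first clean decode winning.

```python
def decode_first_match(data, encodings):
    """Return (text, encoding_name) for the first codec that decodes data.

    Hypothetical helper for illustration; not taken from Meld.
    """
    for name in encodings:
        try:
            return data.decode(name), name
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: the bytes are not valid in this codec.
            # LookupError: Python does not recognise this codec name.
            continue
    raise ValueError("no listed encoding could decode the data")


utf8_bytes = "na\u00efve".encode("utf-8")              # 6 bytes of UTF-8
bom_utf16 = b"\xff\xfe" + "<a/>".encode("utf-16-le")   # UTF-16LE with a BOM

print(decode_first_match(utf8_bytes, ["utf8", "utf-16le"])[1])
print(decode_first_match(bom_utf16, ["utf8", "utf-16le"])[1])
print(decode_first_match(utf8_bytes, ["utf-16le", "utf8"])[1])
```

Under those assumptions the order of the list matters: with utf-16le listed first, even-length UTF-8 input still decodes without error, just into the wrong characters. A "reload with encoding" action would amount to decoding the same bytes again with whichever single codec the user picks.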
-Keegan

On Wed, Apr 3, 2013 at 4:38 PM, Nick <[email protected]> wrote:
> Inline below.
>
> On Thu, 2013-04-04 at 05:50 +1000, Kai Willadsen wrote:
> > (Answering lots of things at once, and not in order.)
> >
> > On 4 April 2013 00:06, Nick <[email protected]> wrote:
> > > I think I found a solution. It required 3 pieces:
> > >
> > > 1. Set Meld's Encodings to:
> > >    utf8, utf-16le, iso8859
> > >
> > > 2. Set SVN's mime-type property on the UTF-16 files to
> > >    text/plain;encoding=UTF-16LE.
> > >
> > > 3. Placed a BOM in the UTF-16 files.
> > >
> > > With this configuration I am able to view UTF-8 and UTF-16 files in Meld
> > > without changing the configuration. The files can be directly from the
> > > filesystem (ie. meld file1 file2) or via the SVN hook within Meld.
> > >
> > >
> > > In the process of experimenting on this (and I think contributing to the
> > > problem), I think I found a bug in Meld. It seems that once I attempt
> > > to view/diff a file that's in SVN which fails, other files which
> > > normally work also fail. Here's a breakdown of the steps I observe this
> > > happening (using Meld 1.7.0):
> > >
> > > (1) Open Meld for a directory inside a SVN working copy, which contains
> > > 3 files: a.xml (a UTF-16LE file without a BOM), b.xml (a UTF-16LE file
> > > with a BOM), c.txt (a UTF-8 file).
>
> The issue seems to be tied to opening a binary file. In this case,
> a.xml only needs to be considered binary (SVN's svn:mime-type property
> set to application/octet-stream).
>
> > > (2) Set Meld's Encodings configuration to "utf8, utf-16le"
> > > (3) Open/View b.xml. This should work.
> > > (4) Open/View c.txt. This should work.
> > > (5) Attempt to open a.xml. This should yield an error that the file is
> > > binary (as expected).
> > > (6) Now attempt to open/view b.xml again. It fails with the same
> > > error.
> > >
> > > The only way I've found to get it out of this stuck state is to refresh
> > > the listing.
> > >
> > > I can try creating a screen recording of this behavior if it helps.
> >
> > A screen recording wouldn't really help - those are pretty clear
> > instructions - but if there's any way you could provide a SVN working
> > copy to reproduce the problem, then that would be great.
>
> I've attached a patch which exhibits the problem that you should be able
> to apply to any WC since it doesn't require modifications to existing
> files (ie. new files marked for addition exhibit the problem fine). Let
> me know if you can't apply this patch to a WC and I'll provide a
> procedure or a script to create the same (it's very easy).
>
> Note the svn:mime-type property on each file:
> - File1.txt is UTF-16LE that's marked as binary by SVN
>   (application/octet-stream).
> - File2.txt is UTF-16LE that's marked as UTF-16LE by SVN.
> - File3.txt is UTF-8.
>
> To see the problem:
> 1. Open meld for the directory in the WC where you applied the patch.
> 2. Set the Encodings to "utf8, utf-16le".
> 3. Open/view the files in reverse order: (File3.txt, File2.txt,
>    File1.txt). Notice that you can view File3 & File2, but not File1.
>    That's sort-of expected since File1.txt is marked as binary. (I say
>    sort-of because the file is really UTF-16LE and includes a suitable
>    BOM).
> 4. Now open/view File2.txt again, and notice that it does not open this
>    time. Refresh the directory listing in Meld, and you can once again
>    open it.
>
> Let me know if I can help w/ more info.
>
> > >
> > > On Wed, 2013-04-03 at 09:41 -0400, Nick wrote:
> > >> Looks like if I change the order of the codecs such that utf16 is listed
> > >> first, then Meld displays the file fine. But then I lose the ability to
> > >> view UTF-8 files. So it seems like it's one or the other, but not both.
> > >>
> > >> If this is true, I don't understand the purpose of being able to specify
> > >> more than one encoding in the Preferences dialog.
> > >>
> > >> Can Meld support going through each specified encoding while the file is
> > >> not displayable (including the finding that it's a 'binary' file)? This
> > >> will allow me to specify "utf8, utf16" for the encodings which will
> > >> support UTF-8 and UTF-16 files to be used in Meld w/out changing the
> > >> configuration.
> >
> > That's exactly what we do... except that the binary file check is
> > unrelated to the rest. Having said that, reordering those really
> > shouldn't avoid the binary file check.
> >
> > >> On Wed, 2013-04-03 at 08:48 -0400, Nick wrote:
> > >> > Hi,
> > >> >
> > >> > First and foremost, thanks for a great diff & merge tool!
> > >> >
> > >> > My project involves XML files which need to be encoded in UTF-16 Little
> > >> > Endian. I cannot seem to view or diff UTF-16 files with Meld.
> > >> >
> > >> > In the Encoding tab of the Preferences dialog I have this for the
> > >> > codecs:
> > >> >
> > >> > utf8, iso8859, utf16, utf-16, utf16le, utf-16le
> > >> >
> > >> > When I try to open a UTF-16LE file that's in SVN, Meld displays a yellow
> > >> > error bar on top which reads, "Error fetching original comparison file".
> > >> > I've confirmed UTF-8 files in the repo open fine--it's only an issue w/
> > >> > UTF-16 files.
> > >> >
> > >> > It behaves the same even for files which are marked for addition in the
> > >> > repo but not yet added (so in this case, there's nothing to diff
> > >> > against, but normally Meld will display the contents of the file
> > >> > alongside a blank pane).
> > >> >
> > >> > I've tried UTF-16 files that contain a BOM and files which do not; no
> > >> > difference.
> > >> >
> > >> > I notice that SVN sets the mime-type on these files as binary
> > >> > (application/octet-stream). If I manually change it to UTF-16LE
> > >> > (text/plain;encoding=UTF-16LE), Meld displays a yellow error bar on top
> > >> > which reads, "Could not read file" "test.xml appears to be a binary
> > >> > file."--but it still doesn't display the contents of the file.
> >
> > I had no idea the mime-type behaviour would be different... we
> > certainly don't do anything on the SVN end with regards to that. I
> > guess that's a possibly-interesting issue with the new SVN support.
>
> Yeah, I got the idea from
> http://rhubbarb.wordpress.com/2012/04/28/svn-unicode/ which speaks only
> about subversion support. I confirmed that the svn command line tool
> functions fine according to the mime-type.
>
> > What version of Subversion are you using here? We fetch files in very
> > different ways for <1.6 and 1.7.
>
> Client and server (same machine) are 1.7:
>
> nick@nimble ~/test_repo $ svn --version
> svn, version 1.7.7 (r1393599)
>    compiled Jan 5 2013, 15:01:56
>
> > >> > If I call meld and pass it 2 UTF-16 files on the file system (ie. not
> > >> > trying to open a file from the SVN listing), I still get a yellow error
> > >> > bar on top which reports "Could not read file" "test.xml appears to be a
> > >> > binary file."
> > >> >
> > >> > Is there something else I need to do?
> > >> >
> > >> > Has anyone used Meld to diff UTF-16 files?
> >
> > No, and it's known not to work. In fact, it shouldn't be possible to
> > view UTF-16 files in Meld. Or at least, this is what I would have said
> > if I'd seen this email before I saw your follow-ups.
> >
> > The problem is that in FileDiff._load_files, we check for null bytes
> > in the file we're reading in, and throw up our hands and declare a
> > file to be binary if there are any. This works shockingly well,
> > considering how wrong it is. Obviously it falls over pretty badly for
> > UTF16. What I'm actually more puzzled by is that you've somehow
> > managed to find a way around this!
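For readers following along, the null-byte heuristic Kai describes can be sketched in a few lines of Python. This is an illustration only, not the code in FileDiff._load_files: both function names are hypothetical, and the BOM-aware variant is just one possible refinement, not something the thread says Meld does.

```python
import codecs


def looks_binary_naive(data):
    """Heuristic as described in the thread: any NUL byte means 'binary'."""
    return b"\x00" in data


def looks_binary_bom_aware(data):
    """Hypothetical refinement: trust a UTF-16/UTF-32 BOM before checking NULs."""
    boms = (
        codecs.BOM_UTF32_LE,
        codecs.BOM_UTF32_BE,
        codecs.BOM_UTF16_LE,
        codecs.BOM_UTF16_BE,
    )
    if data.startswith(boms):
        return False
    return b"\x00" in data


ascii_xml = b"<a/>"
utf16_xml = "<a/>".encode("utf-16-le")          # no BOM
utf16_xml_bom = codecs.BOM_UTF16_LE + utf16_xml

print(looks_binary_naive(ascii_xml))
print(looks_binary_naive(utf16_xml))
print(looks_binary_bom_aware(utf16_xml_bom))
```

Any mostly-ASCII UTF-16 text trips the naive check, because each ASCII character is encoded as a pair of bytes, one of which is zero.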
> >
> > Also, this is bug 632540:
> > https://bugzilla.gnome.org/show_bug.cgi?id=632540
> >
>
> It's nice when things work when they ought not to, huh? Certainly nicer
> than the opposite. :)
>
> I'm too lazy to look at the code now, so do you scan the entire file for
> null bytes, or just the beginning? As I mentioned, the presence of the
> BOM makes a difference. Is it possible it's only looking at the
> beginning couple bytes?
>
> One thing I noticed was that specifying invalid encodings makes a
> difference. I got the clue from some error messages printed to the
> console about one of my encodings being invalid. I got the initial list
> from iconv's list, where they are all valid, but I guess Meld's
> underlying library has a different list. Anyway, that's how my list
> shrank
> from: utf8, iso8859, utf16, utf-16, utf16le, utf-16le
> to:   utf8, utf-16le, iso8859
>
> Using process of elimination I found the only encoding that actually
> worked for UTF-16LE files is utf-16le. The others I had (like utf16)
> did not work. I mention this because in the bug you referenced, Martin
> Weis reports Meld does not work and cites the encodings "utf16 utf-16".
> It's possible that is the cause of the problem for him.
> But I can say for sure the prescriptive steps I provided above work.
> Please don't break it. :)
>
> > cheers,
> > Kai
>
> _______________________________________________
> meld-list mailing list
> [email protected]
> https://mail.gnome.org/mailman/listinfo/meld-list
