On May 23, 2009, at 7:57 PM, Nicolas Cellier wrote:

> I confirm the scenario:
> 1) update10298 runs condenseChanges, which leaves
> (SourceFiles at: 2) class = StandardFileStream.
> This is the seed of further problems, because further changes will
> be encoded in latin1 (or MacRoman, I don't really want to know)
> 2) update10302 changes the methods with non-ASCII characters
> 3) Stef saves the image after update10304, which does reopen
> (SourceFiles at: 2) in UTF-8, but that's too late, the worm is in the
> apple.
>
> If you save the image just after the condenseChanges, no problem,
> because (SourceFiles at: 2) is opened in Latin1 AFTER all the changes
> have gotten into it, and reopened UTF-8 before any changes got into
> it.
> We must track undue usage of StandardFileStream such as
> #condenseChanges.
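The three-step scenario above can be reproduced outside the image. What follows is a minimal sketch in Python rather than Smalltalk, and it is only an illustration of the byte-level mechanism, not the Pharo code path. The function names `build_changes_log` and `read_as_utf8` and the sample strings are hypothetical. It assumes the stream opened after condenseChanges writes one byte per character (Latin-1); the thread itself leaves open whether it was Latin-1 or MacRoman.

```python
def build_changes_log(condensed_text, later_text):
    """Return the bytes of a simulated .changes file.

    Hypothetical helper: it only mimics the order of events in the thread.
    """
    # Step 1: condenseChanges copies everything through a UTF-8 converting stream.
    log = condensed_text.encode("utf-8")
    # Step 2: later updates go through a stream with no UTF-8 conversion,
    # assumed here to write one byte per character (Latin-1).
    log += later_text.encode("latin-1")
    return log


def read_as_utf8(log):
    """Step 3: after the image is saved, the file is read back as UTF-8."""
    try:
        return log.decode("utf-8")
    except UnicodeDecodeError as err:
        # Comparable to the "invalid utf8 input detected" error in the thread.
        return "invalid UTF-8 at byte %d" % err.start


if __name__ == "__main__":
    mixed = build_changes_log("St\u00e9phane\n", "St\u00e9phane\n")
    print(read_as_utf8(mixed))
    # If nothing is written between steps 1 and 3, the file stays readable.
    clean = build_changes_log("St\u00e9phane\n", "")
    print(read_as_utf8(clean))
```

The second call corresponds to saving the image right after the condenseChanges: no Latin-1 bytes reach the file, so reading it back as UTF-8 succeeds.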
Ok, now we cannot really roll back the changes, and I fixed the methods
that were leading to invalid UTF-8.
But it means that we should check the StandardFileStream usage.
I'm doing some experiments with umejava code

Stef

>
> 2009/5/23 Nicolas Cellier <nicolas.cellier.aka.n...@gmail.com>:
>> What happened exactly is very hard to trace because these FileStreams
>> are a can of worms...
>> Here are some of my peregrinations:
>>
>> FIRST POSSIBLE TRACK:
>>
>> All methods were changed in 10305.
>> Monticello snapshot/source.st is not UTF-8.
>> If the file is opened UTF-8, then we get decompiledCode, I don't
>> know why yet...
>> But the changes still go into the change log in correct UTF-8 form,
>> so that's just another bug, but not the real source of the problem.
>> To get some worms out of the can, just browse inst var defs of
>> converter in MultiByteFileStream:
>> The accessor #converter initializes converter with TextConverter
>> defaultSystemConverter, which depends on LanguageEnvironment.
>> That is a Latin1TextConverter in my latin image.
>> Unless #reset is called first, in which case it will initialize
>> with a UTF8TextConverter.
>> Yes, but open: fileName forWrite: writeMode does the job too, with a
>> UTF8TextConverter.
>> You still follow? Me neither.
>> A better behaved one is #setConverterForCode, which should let non
>> UTF-8 .mcz work in a UTF-8 environment, but I am not sure it is
>> called where required...
>> I think Yoshiki's changes are necessary only for writing source code
>> with character code > 255.
>> This was not the case for the incriminated methods.
>>
>> SECOND POSSIBLE TRACK:
>>
>> Everything going to the change log passes through the
>> MultiByteFileStream, so how did non UTF-8 characters get in?
>> I tried to follow two other clues:
>> 1) There are senders of #primWrite:from:startingAt:count: not
>> redefined in MultiByteFileStream...
>> For example, using #next:putAll:startingAt: will bypass the
>> converter.
>> 2) Using nextPutAll: with a ByteArray argument also bypasses the
>> converter (see MultiByteFileStream>>#nextPutAll:)
>> I did not find the senders (you really believe senders of nextPutAll:
>> can be analyzed?).
>> I tried to instrument the code with Notification, but I'm unable to
>> reproduce the problem, so that was in vain...
>>
>> THIRD POSSIBLE TRACK:
>>
>> http://gforge.inria.fr/frs/download.php/22283/Pharo0.1Core-10304cl.zip
>> has the invalid UTF-8 problem, just before the 10305 changes that
>> introduced decompiled code...
>> So we might attack the problem with another code snippet:
>>
>>     (SystemNavigation default browseAllCallsOn: (Smalltalk associationAt:
>>         #SourceFiles))...
>>
>> Hmm, I might have a better clue now.
>> The problem might possibly come from the condenseChanges in
>> update10298.
>> What happens in a condenseChanges?
>> Changes are copied to this file:
>>
>>     f := FileStream fileNamed: 'ST80.temp'.
>>
>> So far, so good, because the concreteStream is a MultiByteFileStream.
>>
>> But the method finishes with:
>>
>>     SourceFiles
>>         at: 2
>>         put: (StandardFileStream oldFileNamed: oldChanges name)
>>
>> Wow, no MultiByteFileStream here, so no more UTF-8.
>> But hey, that would be the inverse problem: reading UTF-8 text with a
>> latin1 reader. I can't get an error doing this, only some strange
>> sequence of characters... (the UTF-8 encoding)...
>> Unless the incriminated methods are further changed in #script376 or
>> any other method... In which case they are written in latin1 in the
>> changeLog...
>> Hmm... That could be the case eventually. We must restart the update
>> process from
>> http://gforge.inria.fr/frs/download.php/22167/Pharo0.1Core-10296cl-2.zip
>>
>> One thing is sure: at the next returnFromSnapshot, FileDirectory
>> class>>startup will reopen the changes in UTF-8.
>> So saving the image will reopen UTF-8...
>>
>> But wait...
>> Maybe we have enough pieces of the puzzle:
>> Analyzing Pharo0.1Core-10304cl.changes tells us that Stephane
>> applied several updates before snapshotting the image. So if Kernel
>> and System-Support are changed between 10298 and 10304, then we get
>> the explanation:
>> - condenseChanges puts everything in the .changes in UTF-8 but
>> reopens the changes in latin1
>> - further updates up to 10304 write changes in latin1
>> - the image snapshot reopens the changes in UTF-8 and thus we get
>> further invalid UTF-8...
>>
>> That's easy to reproduce. Stef, can you confirm?
>>
>> That also explains why I did not get the problem at home: I update
>> early and always save my image after.
>> After that we still have to detect and clean up where Monticello
>> sources are interpreted as UTF-8 when they should not be (FIRST
>> TRACK), and eventually make source code go UTF-8 in Monticello, so
>> that non-latin programmers can use their favourite language
>> eventually...
>>
>> Nicolas
>>
>> 2009/5/23 Stéphane Ducasse <stephane.duca...@inria.fr>:
>>> No problem, I never interpreted it like that.
>>> Me too, I want a system that is working.
>>>
>>> Adrian, I will publish a fix for DNU now
>>> and I will try later to check the fixes proposed by Yoshiki.
>>>
>>> stef
>>>
>>> On May 23, 2009, at 1:29 PM, Tudor Girba wrote:
>>>
>>>> Actually, the fix is even simpler: if you find a method that raises
>>>> "invalid utf8 input detected", just browse to it with a class
>>>> browser, and re-accept it :).
>>>>
>>>> With my previous mail, I was not implying that someone should fix
>>>> it for me, I was merely asking what a quick solution could be,
>>>> because I was a bit lost (scared) :). Now, I am happy. Thanks for
>>>> discussing it.
>>>>
>>>> Cheers,
>>>> Doru
>>>>
>>>> On 23 May 2009, at 13:07, Tudor Girba wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I attached here a DNU implementation I took from an older image.
>>>>> After filing this one in, I can debug DNU problems.
>>>>>
>>>>> Cheers,
>>>>> Doru
>>>>>
>>>>> <Object-doesNotUnderstand.st>
>>>>>
>>>>> On 23 May 2009, at 13:04, Stéphane Ducasse wrote:
>>>>>
>>>>>> I did the following
>>>>>>
>>>>>> (Object>>#doesNotUnderstand:) getSourceFromFile and I get an
>>>>>> invalid....
>>>>>>
>>>>>> Now when I take another method
>>>>>>
>>>>>> (BalloonFontTest>>#testDefaultFont) I do not get a problem.
>>>>>>
>>>>>> I will carefully reread the mails of Nicolas to try to
>>>>>> understand. I do not know if the fix of yoh
>>>>>>
>>>>>> http://bugs.squeak.org/view.php?id=5996
>>>>>> is related.
>>>>>>
>>>>>> Nicolas
>>>>>>
>>>>>>>> {Object>>#doesNotUnderstand:.
>>>>>>>> SystemNavigation>>#browseMethodsWhoseNamesContain:.
>>>>>>>> Utilities class>>#changeStampPerSe.
>>>>>>>> Utilities class>>#methodsWithInitials:} collect: [:e |
>>>>>>>>     (e getSourceFromFile select: [:s | s charCode > 127]) asArray
>>>>>>>>         collect: [:c | c charCode]]
>>>>>>
>>>>>> I cannot get that code running, it breaks before that for me.
>>>>>>
>>>>>> Stef
>>>>>>
>>>>>> _______________________________________________
>>>>>> Pharo-project mailing list
>>>>>> Pharo-project@lists.gforge.inria.fr
>>>>>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>>>>>
>>>>> --
>>>>> www.tudorgirba.com
>>>>>
>>>>> "Not knowing how to do something is not an argument for how it
>>>>> cannot be done."
>>>>
>>>> --
>>>> www.tudorgirba.com
>>>>
>>>> "Problem solving efficiency grows with the abstractness level of
>>>> problem understanding."