Re: [Pharo-project] Invalid utf8 input detected: now what?

2010-07-25 Thread Philippe Marschall
On 23.07.2010 15:15, Schwab,Wilhelm K wrote:
> No dialogs, please :)  Actually, it would be fine if there were a different
> stream class or simply a different method/state (encoding =#userInteraction
> or something??) that is understood to negotiate such details with the user.
> In general, exception is the correct way to handle this: the stream knows
> what is wrong; the application will know what to make of it.  If the encoding
> can be detected automatically, that would be great.

There is no way to do this. The only thing that can be determined is
that something is not UTF-8. The stream did that reliably. But you said
you already know the encoding, so just set it on the stream and it
should work.

> Firefox tells me that the encoding is ISO-8859-1;

ISO-8859-1 and ISO-8859-15 are not the same.
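
To make the difference concrete, here is a small sketch. It is written in Python rather than Pharo, purely as an assumption-free way to show the byte-level behaviour; none of the names come from the thread. The same byte is a valid character in both encodings, just a different one, which is also why a successful decode cannot tell the two apart.

```python
# Byte 0xA4 is a valid character in both encodings, but a different one.
raw = bytes([0xA4])

as_latin1 = raw.decode("iso-8859-1")   # CURRENCY SIGN, U+00A4
as_latin9 = raw.decode("iso-8859-15")  # EURO SIGN, U+20AC

print(repr(as_latin1), repr(as_latin9))

# Every possible byte value decodes without error under both codecs,
# so "it decoded fine" proves nothing about which encoding was meant.
all_bytes = bytes(range(256))
decoded_latin1 = all_bytes.decode("iso-8859-1")
decoded_latin9 = all_bytes.decode("iso-8859-15")
```

Only a strict encoding such as UTF-8 can reject input; the single-byte ISO-8859 family accepts everything, so the choice between its members has to come from the application or the user.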

Cheers
Philippe


___
Pharo-project mailing list
Pharo-project@lists.gforge.inria.fr
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project


Re: [Pharo-project] Invalid utf8 input detected: now what?

2010-07-25 Thread Schwab,Wilhelm K
I'm OK with calling this "works as intended" if the encoding experts are.
Since I am *not* an expert on encoding, I ran it up the flagpole.







Re: [Pharo-project] Invalid utf8 input detected: now what?

2010-07-24 Thread Henrik Johansen

On Jul 24, 2010, at 1:57:11 PM, Schwab,Wilhelm K wrote:

> I agree that there is apparently not much of a problem.  However, I also
> stand by no more dialogs unless they are in a clearly-identified
> class/method/state that is known to interact with the user.  Squeak has *far*
> too much forced and unexpected interaction, and we must not go back down that
> road.

Which is why I asked initially whether you encountered this when using a tool
(i.e. file browser etc.) or custom code.
For tools, where you have no way to set one encoding that will be correct for
all cases, it might be better behaviour to open a dialogue where one can be
selected instead of raising a DNU, if a GUI is present.

> Those things said, there might be room to grow, as someone suggested the
> possibility of automatically detecting the coding.

Then they were wrong.

Cheers,
Henry


Re: [Pharo-project] Invalid utf8 input detected: now what?

2010-07-23 Thread Stéphane Ducasse

On Jul 23, 2010, at 4:38 AM, Yanni Chiu wrote:

> Schwab,Wilhelm K wrote:
>> I got an error (on Ubuntu 9.10) trying to open an old text file that I
>> created on Windows some time ago.  The encoding (if gedit's save-as
>> dialog can be trusted??) is Western ISO-8859-15; resaving as utf8
>> lets me read it.
>
> You could try viewing the original file in a web browser. Try different
> encodings until the stuff looks right. Then you might have a better idea of
> whether you really have a file in ISO-8859-15.
>
> You could also view your converted UTF-8 file in a web browser too, and
> compare the two renderings.
>
> If this checks out, then maybe it's a Pharo issue.

Please report it, if possible with a test, so that we can fix it.




Re: [Pharo-project] Invalid utf8 input detected: now what?

2010-07-23 Thread Philippe Marschall
On 07/23/2010 04:09 AM, Schwab,Wilhelm K wrote:
> Hello all,
>
> I got an error (on Ubuntu 9.10) trying to open an old text file that I created
> on Windows some time ago.  The encoding (if gedit's save-as dialog can be
> trusted??) is Western ISO-8859-15; resaving as utf8 lets me read it.
>
> So, is Pharo working by design?  Did I do the correct/only thing needed to
> read the file?

You need to pass the encoding to the file stream.
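
The thread does not spell out the Pharo selector for this, so the following is only a language-neutral sketch of the idea, written in Python. The file name and sample text are hypothetical. It writes a small ISO-8859-15 file, shows that a UTF-8 reader rejects it, and then reads it correctly once the known encoding is passed to the stream.

```python
import os
import tempfile

# Hypothetical stand-in for the old Windows file: text saved as ISO-8859-15.
original = "caf\u00e9 \u20ac"
path = os.path.join(tempfile.mkdtemp(), "old_windows_file.txt")
with open(path, "wb") as f:
    f.write(original.encode("iso-8859-15"))

# Reading with a UTF-8 decoder is rejected, as in the reported error...
try:
    with open(path, encoding="utf-8") as f:
        f.read()
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ...while passing the known encoding to the stream works.
with open(path, encoding="iso-8859-15") as f:
    text = f.read()

print(utf8_ok, repr(text))
```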

Cheers
Philippe




Re: [Pharo-project] Invalid utf8 input detected: now what?

2010-07-23 Thread Schwab,Wilhelm K
No dialogs, please :)  Actually, it would be fine if there were a different
stream class or simply a different method/state (encoding =#userInteraction or
something??) that is understood to negotiate such details with the user.  In
general, an exception is the correct way to handle this: the stream knows what
is wrong; the application will know what to make of it.  If the encoding can be
detected automatically, that would be great.

Firefox tells me that the encoding is ISO-8859-1. I am not leaving off the 5;
Firefox and gedit report it differently.  In fairness to gedit, I am reporting
the encodings listed in its save-as dialog.  Unfortunately, the offending file
contains specifications that are not mine.  I have seen pieces of it published
elsewhere (quite recently, in fact) but will need to do some checking on the
licensing.  I might be able to excerpt the file and end up with the same
behavior.

Bill



From: pharo-project-boun...@lists.gforge.inria.fr 
[pharo-project-boun...@lists.gforge.inria.fr] On Behalf Of Henrik Johansen 
[henrik.s.johan...@veloxit.no]
Sent: Friday, July 23, 2010 7:39 AM
To: Pharo-project@lists.gforge.inria.fr
Subject: Re: [Pharo-project] Invalid utf8 input detected: now what?

On Jul 23, 2010, at 4:09:30 AM, Schwab,Wilhelm K wrote:

> Hello all,
>
> I got an error (on Ubuntu 9.10) trying to open an old text file that I created
> on Windows some time ago.  The encoding (if gedit's save-as dialog can be
> trusted??) is Western ISO-8859-15; resaving as utf8 lets me read it.
>
> So, is Pharo working by design?  Did I do the correct/only thing needed to
> read the file?  What should I be asking?  Is there anything I can do to turn
> this into a useful test/debugging example?
>
> Bill


This is not an error per se, seeing as the encoding is not utf8 :)

If the import was done from some tool instead of in your code (in which case 
you'd set the encoding of the file stream), a nicer *behavior* might be for the 
UI Manager to catch encoding errors when trying to read a file, and offer up a 
dialogue with a list of encodings which the file *can* be read as, along with a 
preview window of what the text would look like with the selected encoding, 
like some word processors do.
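
The core of such a dialogue would be a list of encodings under which the bytes decode at all, each with a preview. A rough sketch of that step, in Python and with a hypothetical helper name and candidate list (nothing here is Pharo API):

```python
def readable_encodings(data: bytes, candidates):
    """Return (encoding, preview) pairs for candidates that decode without error."""
    results = []
    for name in candidates:
        try:
            preview = data.decode(name)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes; leave it out
        results.append((name, preview))
    return results


# Sample bytes: "naïve €" as it would be stored in ISO-8859-15.
sample = "na\u00efve \u20ac".encode("iso-8859-15")
options = readable_encodings(
    sample, ["utf-8", "iso-8859-1", "iso-8859-15", "cp1252"]
)
for name, preview in options:
    print(f"{name}: {preview!r}")
```

Note that several candidates survive and only the previews differ, so the final choice still has to be made by a person looking at the text.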

Cheers,
Henry


Re: [Pharo-project] Invalid utf8 input detected: now what?

2010-07-23 Thread Henrik Johansen


On 23 July 2010 at 15:15, Schwab,Wilhelm K bsch...@anest.ufl.edu wrote:

> No dialogs, please :)  Actually, it would be fine if there were a different
> stream class or simply a different method/state (encoding =#userInteraction
> or something??) that is understood to negotiate such details with the user.

I have no idea what you are suggesting...

> In general, exception is the correct way to handle this: the stream knows
> what is wrong; the application will know what to make of it.  If the encoding
> can be detected automatically, that would be great.

Then I fail to see what the problem is.
You got an error stating that the input was not UTF8, which implies that the
choice of the correct encoding needs to be made by the application (by setting
the stream's encoding to something else).
As you noticed with gedit/firefox, any automatic detection is at best an
educated guess, and cannot be relied upon to make the correct choice.
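
That division of labour (the decoder reports what is wrong, the application decides what to do) can be sketched as follows. This is Python, not Pharo, and the function name and fallback parameter are hypothetical.

```python
def read_text(data: bytes, fallback_encoding: str) -> str:
    """Try UTF-8 first; on failure use the encoding chosen by the caller."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        # The decoder knows what is wrong and where...
        print(f"invalid UTF-8 at byte {err.start}: {err.reason}")
        # ...the application decides what to make of it.
        return data.decode(fallback_encoding)


data = "r\u00e9sum\u00e9".encode("iso-8859-1")
text = read_text(data, fallback_encoding="iso-8859-1")
print(repr(text))
```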

Cheers,
Henry


Re: [Pharo-project] Invalid utf8 input detected: now what?

2010-07-22 Thread Yanni Chiu

Schwab,Wilhelm K wrote:

> I got an error (on Ubuntu 9.10) trying to open an old text file that I
> created on Windows some time ago.  The encoding (if gedit's save-as
> dialog can be trusted??) is Western ISO-8859-15; resaving as utf8
> lets me read it.


You could try viewing the original file in a web browser. Try different 
encodings until the stuff looks right. Then you might have a better idea 
of whether you really have a file in ISO-8859-15.


You could also view your converted UTF-8 file in a web browser too, and 
compare the two renderings.


If this checks out, then maybe it's a Pharo issue.




Re: [Pharo-project] invalid utf8 input detected

2009-05-23 Thread Stéphane Ducasse
I did the following

(Object>>#doesNotUnderstand:) getSourceFromFile and I get an invalid

Now when I take another method

(BalloonFontTest>>#testDefaultFont) I do not get a problem.

I will carefully reread Nicolas's mails to try to understand.
I do not know whether the fixes by yoh

http://bugs.squeak.org/view.php?id=5996
are related.

Nicolas

> {Object>>#doesNotUnderstand:.
> SystemNavigation>>#browseMethodsWhoseNamesContain:.
> Utilities class>>#changeStampPerSe.
> Utilities class>>#methodsWithInitials:} collect: [:e | (e
> getSourceFromFile select: [:s | s charCode > 127]) asArray collect:
> [:c | c charCode]]

I cannot get that code running; it breaks before that for me.

Stef



Re: [Pharo-project] invalid utf8 input detected

2009-05-23 Thread Tudor Girba

Hi,

I attached here a DNU implementation I took from an older image. After  
filing this one in, I can debug DNU problems.


Cheers,
Doru



Object-doesNotUnderstand.st
Description: Binary data







--
www.tudorgirba.com

Not knowing how to do something is not an argument for how it cannot  
be done.



Re: [Pharo-project] invalid utf8 input detected

2009-05-23 Thread Tudor Girba
Actually, the fix is even simpler: if you find a method that raises
"invalid utf8 input detected", just browse to it with a class browser
and re-accept it :).

With my previous mail, I was not implying that someone should fix it
for me; I was merely asking what a quick solution could be,
because I was a bit lost (scared) :). Now I am happy. Thanks for
discussing it.

Cheers,
Doru


--
www.tudorgirba.com

Problem solving efficiency grows with the abstractness level of  
problem understanding.






Re: [Pharo-project] invalid utf8 input detected

2009-05-23 Thread Stéphane Ducasse
No problem, I never interpreted it like that.
I too want a system that works.

Adrian, I will publish a fix for DNU now,
and later I will try to check the fixes proposed by Yoshiki.

stef



Re: [Pharo-project] invalid utf8 input detected

2009-05-23 Thread Nicolas Cellier
What happened exactly is very hard to trace because these FileStreams
are a can of worms...
Here are some of my peregrinations:

FIRST POSSIBLE TRACK:

All methods were changed in 10305.
Monticello snapshot/source.st is not UTF-8.
If the file is opened as UTF-8, then we get decompiled code; I don't know why yet...
But the changes still go into the change log in correct UTF-8 form, so
that's just another bug, not the real source of the problem.
To get some worms out of the can, just browse the inst var defs of
converter in MultiByteFileStream:
The accessor #converter initializes converter with TextConverter
defaultSystemConverter, which depends on LanguageEnvironment.
That is a Latin1TextConverter in my latin image.
Unless #reset is called first, in which case it will be initialized with a
UTF8TextConverter.
Yes, but open: fileName forWrite: writeMode does the job too, with a
UTF8TextConverter.
You still follow? Me neither.
A better-behaved one is #setConverterForCode, which should let non-UTF-8
.mcz work in a UTF-8 environment, but I am not sure it is called where
required...
I think Yoshiki's changes are necessary only for writing source code
with character code > 255.
This was not the case for the incriminated methods.

SECOND POSSIBLE TRACK:

Everything going to the change log passes through the MultiByteFileStream,
so how did non-UTF-8 characters get in?
I tried to follow two other clues:
1) There are senders of #primWrite:from:startingAt:count: not
redefined in MultiByteFileStream...
  for example, using #next:putAll:startingAt: will bypass the converter.
2) Using nextPutAll: with a ByteArray argument also bypasses the
converter (see MultiByteFileStream>>#nextPutAll:)
I did not find the senders (do you really believe senders of nextPutAll:
can be analyzed?).
I tried to instrument the code with Notification, but I was unable to
reproduce the problem, so that was in vain...
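
The "bypass the converter" hazard is not specific to MultiByteFileStream. As a loose analogy only, here is the same effect in Python, where a text layer encodes strings but the underlying byte stream remains directly writable. The layering below is an assumption chosen for illustration and is not how the Pharo classes are implemented.

```python
import io

raw = io.BytesIO()
# The text layer plays the role of the converter: it encodes strings as UTF-8.
text_layer = io.TextIOWrapper(raw, encoding="utf-8", newline="")

text_layer.write("\u00e9")  # goes through the encoder: bytes 0xC3 0xA9
text_layer.flush()

# Writing bytes straight to the underlying stream skips the encoder entirely.
raw.write("\u00e9".encode("iso-8859-1"))  # single byte 0xE9

contents = raw.getvalue()
print(contents)

# The result is no longer valid UTF-8 as a whole.
try:
    contents.decode("utf-8")
    valid = True
except UnicodeDecodeError:
    valid = False
```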

THIRD POSSIBLE TRACK:

http://gforge.inria.fr/frs/download.php/22283/Pharo0.1Core-10304cl.zip
has the invalid UTF-8 problem, just before 10305 changes that
introduced decompiled code...
So we might attack the problem with another code snippet:

(SystemNavigation default browseAllCallsOn: (Smalltalk associationAt:
#SourceFiles))...

Hmm, I might have a better clue now.
The problem might come from the condenseChanges in update10298.
What happens in a condenseChanges?
Changes are copied to this file:

f := FileStream fileNamed: 'ST80.temp'.

So far, so good, because the concreteStream is a MultiByteFileStream.

But the end finishes with:

SourceFiles
at: 2
put: (StandardFileStream oldFileNamed: oldChanges name)

Waouh, no MultiByteFileStream here, so no more UTF-8.
But hey, that would be the inverse problem: reading UTF-8 text with a
latin1 reader. I can't get an error doing this, only some strange
sequences of characters (the UTF-8 encoding)...
Unless the incriminated methods are further changed in #script376 or any
other method, in which case they are written in latin1 in the
change log...
Hmm... That could possibly be the case. We must restart the update
process from
http://gforge.inria.fr/frs/download.php/22167/Pharo0.1Core-10296cl-2.zip

One thing is sure: at the next returnFromSnapshot, FileDirectory
class>>startup will reopen the changes as UTF-8.
So saving the image will reopen them as UTF-8...

But wait... Maybe we have enough pieces of the puzzle:
Analyzing Pharo0.1Core-10304cl.changes shows that Stéphane applied
several updates before snapshotting the image. So if Kernel and
System-Support were changed between 10298 and 10304, then we have the
explanation:
- condense changes puts everything in the .changes in UTF-8 but reopens the
changes in latin1
- further updates up to 10304 write changes in latin1
- the image snapshot reopens the changes in UTF-8 and thus we get
invalid UTF-8...
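
The three steps above can be simulated at the byte level. This is a Python sketch with made-up sample strings standing in for method sources; it assumes nothing about the real .changes format beyond "text appended over time".

```python
# Step 1: the condensed changes are written as UTF-8.
changes = "\"caf\u00e9\"\n".encode("utf-8")

# Step 2: later updates are appended through a Latin-1 writer.
changes += "\"na\u00efve\"\n".encode("iso-8859-1")

# For comparison, the inverse problem: UTF-8 bytes read as Latin-1 never
# raise an error, they only produce strange character sequences.
mojibake = "\"caf\u00e9\"".encode("utf-8").decode("iso-8859-1")

# Step 3: the file is reopened as UTF-8 and the Latin-1 part is rejected.
try:
    changes.decode("utf-8")
    error_offset = None
except UnicodeDecodeError as err:
    error_offset = err.start

print(repr(mojibake), error_offset)
```

The failure offset falls inside the second chunk, which matches the observation that only methods changed after the condense step were affected.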

That's easy to reproduce. Stef, can you confirm?

That also explains why I did not get the problem at home: I update
early and always save my image afterwards.
After that we still have to detect and clean up the places where Monticello
sources are interpreted as UTF-8 when they should not be (FIRST TRACK), and
eventually make source code go UTF-8 in Monticello, so that non-latin
programmers can use their favourite language...

Nicolas


Re: [Pharo-project] invalid utf8 input detected

2009-05-23 Thread Stéphane Ducasse
Hi Nicolas,

I was reading Yoshiki's changes, which I will integrate, but indeed they
do not address our case.
My reply is below... I tried to follow :)


> That's easy to reproduce. Stef, can you confirm?

How do you want me to confirm?
By redoing the image? What we can do is change the update method to
block the update at a certain number.


Re: [Pharo-project] invalid utf8 input detected

2009-05-23 Thread Nicolas Cellier
I confirm the scenario:
1) update10298 does a condenseChanges that leaves (SourceFiles at: 2) class =
StandardFileStream.
   This is the seed of further problems, because further changes will
be encoded in latin1 (or MacRoman, I don't really want to know).
2) update10302 changes the methods with non-ASCII characters.
3) Stef saves the image after update10304, which does reopen
(SourceFiles at: 2) in UTF-8, but that's too late; the worm is in the
apple.

If you save the image just after the condenseChanges, there is no problem,
because (SourceFiles at: 2) is opened in Latin1 AFTER all the changes
have gotten into it, and reopened in UTF-8 before any further changes get
into it.
We must track undue usage of StandardFileStream, such as in #condenseChanges.


Re: [Pharo-project] invalid utf8 input detected

2009-05-23 Thread Stéphane Ducasse

On May 23, 2009, at 7:57 PM, Nicolas Cellier wrote:

 I confirm the scenario:
 1) update10298 runs condenseChanges, which leaves (SourceFiles at: 2) class =
 StandardFileStream.
   This is the seed of further problems, because further changes will
 be encoded in latin1 (or MacRoman, I don't really want to know).
 2) update10302 changes the methods with non-ASCII characters.
 3) Stef saves the image after update10304, which does reopen
 (SourceFiles at: 2) in UTF-8, but that's too late, the worm is in the
 apple.

 If you save the image just after the condenseChanges, no problem,
 because (SourceFiles at: 2) is opened in Latin1 AFTER all the changes
 have gotten into it, and reopened in UTF-8 before any further changes get
 into it.
 We must track down undue usage of StandardFileStream such as
 #condenseChanges.
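The three-step scenario can be replayed on an in-memory file. This is a rough sketch in Python, not Pharo code: the `changes` buffer, the `append` helper and the sample strings are all hypothetical stand-ins for the .changes file and the streams stored in SourceFiles.

```python
import io

changes = io.BytesIO()   # hypothetical stand-in for the .changes file


def append(source, encoding):
    """Append source text to the file using the given encoding."""
    changes.seek(0, io.SEEK_END)
    changes.write(source.encode(encoding))


# 1) condenseChanges copies everything through a UTF-8 stream...
append("comment with caf\u00e9\n", "utf-8")
# ...but leaves the file reopened without a converter (Latin-1 here).
# 2) later updates are appended through that converter-less stream.
append("another na\u00efve comment\n", "latin-1")

# 3) after the snapshot the whole file is read as UTF-8 again.
try:
    changes.getvalue().decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc   # error.start is the offset of the first bad byte
```

Skipping step 2 (saving right after the condense) leaves only UTF-8 bytes in the file, which matches the observation that saving the image immediately avoids the problem.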


OK, now we cannot really roll back the changes, and I fixed the methods
that were leading to invalid UTF-8. But it means that we should check
the StandardFileStream usage.
I'm doing some experiments with the umejava code.

Stef


 2009/5/23 Nicolas Cellier nicolas.cellier.aka.n...@gmail.com:
 What happened exactly is very hard to trace because these FileStreams
 are a can of worms...
 Here are some of my peregrinations:

 FIRST POSSIBLE TRACK:

 All methods were changed in 10305.
 Monticello snapshot/source.st is not UTF-8.
 If the file is opened as UTF-8, then we get decompiledCode, I don't
 know why yet...
 But the changes still go into the change log in correct UTF-8 form, so
 that's just another bug, but not the real source of the problem.
 To get some worms out of the can, just browse the inst var defs of
 converter in MultiByteFileStream:
 The accessor #converter initializes converter with TextConverter
 defaultSystemConverter, which depends on LanguageEnvironment.
 That is a Latin1TextConverter in my latin image.
 Unless #reset is called first, in which case it will be initialized
 with a UTF8TextConverter.
 Yes, but open: fileName forWrite: writeMode does the job too, with a
 UTF8TextConverter.
 You still follow? Me neither.
 A better-behaved one is #setConverterForCode, which should let non-UTF-8
 .mcz work in a UTF-8 environment, but I am not sure it is called where
 required...
 I think Yoshiki's changes are necessary only for writing source code
 with character codes > 255.
 This was not the case for the incriminated methods.

 SECOND POSSIBLE TRACK:

 Everything going to the change log passes through the MultiByteFileStream,
 so how did non-UTF-8 characters get in?
 I tried to follow two other clues:
 1) There are senders of #primWrite:from:startingAt:count: not
 redefined in MultiByteFileStream...
 for example, using #next:putAll:startingAt: will bypass the
 converter.
 2) using nextPutAll: with a ByteArray argument also bypasses the
 converter (see MultiByteFileStream#nextPutAll:)
 I did not find the senders (do you really believe senders of nextPutAll:
 can be analyzed?).
 I tried to instrument the code with Notification, but I'm unable to
 reproduce the problem, so that was in vain...
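The two clues above share one mechanism: a write that reaches the file below the converting layer. A loose analogy in Python, where `io.TextIOWrapper` plays the role of the converting stream and its underlying buffer plays the role of the primitive write (this is an illustration of the idea only, not a model of the actual MultiByteFileStream code):

```python
import io

raw = io.BytesIO()                                   # the bytes on disk
stream = io.TextIOWrapper(raw, encoding="utf-8", newline="")

stream.write("caf\u00e9")        # through the encoder: 233 becomes 0xC3 0xA9
stream.flush()                   # push the encoded text down to the raw layer
stream.buffer.write(bytes([0xE9]))   # raw write: no conversion at all

data = raw.getvalue()            # b'caf\xc3\xa9\xe9'

try:
    data.decode("utf-8")
    readable = True
except UnicodeDecodeError:
    readable = False             # the single raw byte poisons the file
```

One unconverted byte is enough to make a later UTF-8 read of that region fail, which is why every sender that skips the converter matters.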

 THIRD POSSIBLE TRACK:

 http://gforge.inria.fr/frs/download.php/22283/Pharo0.1Core-10304cl.zip
 has the invalid UTF-8 problem, just before 10305 changes that
 introduced decompiled code...
 So we might attack the problem with another code snippet:

 (SystemNavigation default browseAllCallsOn: (Smalltalk associationAt:
 #SourceFiles))...

 Hmm, I might have a better clue now.
 The problem might possibly come from the condenseChanges in
 update10298.
 What happens in a condenseChanges?
 Changes are copied to this file:

 f := FileStream fileNamed: 'ST80.temp'.

 So far, so good, because the concreteStream is a MultiByteFileStream.

 But the end finishes with:

   SourceFiles
   at: 2
   put: (StandardFileStream oldFileNamed: oldChanges name)

 Waouh, no MultiByteFileStream here, so no more UTF-8.
 But hey, that would be the inverse problem: reading UTF-8 text with
 latin1 reader: I can't get an error doing this, only some strange
 sequence of characters... (The UTF-8 encoding)...
 Unless incriminated methods are further changed in #script376 or any
 other method... In which case they are written in latin1 in the
 changeLog...
 Hmm... That could be the case eventually. We must restart update
 process from 
 http://gforge.inria.fr/frs/download.php/22167/Pharo0.1Core-10296cl-2.zip

 One thing is sure, at next returnFromSnapshot, FileDirectory
 class>>startup will reopen changes UTF-8.
 So saving the image will reopen UTF-8...

 But wait... Maybe we get enough pieces of the puzzle:
 Analyzing the Pharo0.1Core-10304cl.changes tells that Stephane applied
 several updates before snapshotting the image. So if Kernel and
 System-Support are changed between 10298 and 10304, then we get the
 explanation:
 - condense changes put all in the .changes in UTF-8 but reopen the
 changes in latin1
 - further updates up to 10304 write changes in latin1
 - image snapshot reopen changes in UTF-8 and thus we get further
 invalid UTF-8...

 That's easy to reproduce. 

Re: [Pharo-project] invalid utf8 input detected

2009-05-23 Thread Adrian Lienhard
Wow, great analysis, Nicolas!

I have been trying to find the cause for several hours now. Your third track
exactly matches my findings.

For example, in Object#doesNotUnderstand:, prior to the condensing,
the source contained a non-ASCII character (UTF8 encoded as the two
bytes: 194 160). This gets correctly transferred during the condensing
into the new changes file. When you don't save the image (and hence
have the standard stream without UTF8 encoder), what you see in the
source is the character Â (this is 194). That is, we suddenly have two
characters, 194 and 160, where before there was just one. If you load a
package, MC will compare methods and think this is a change. When
loading the method from the MC file, the source is UTF8 encoded,
producing a unicode character 160. When storing this source to the
file (still without the encoder), it will just directly put 160 there.
At this point we have lost the leading byte 194. Next time we start
or save the image and have the right encoder again, it will choke
because 160 is an invalid first byte in UTF8.
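The byte arithmetic of this walk-through can be verified directly: the UTF-8 encoding of character 160 (the non-breaking space) is the byte pair 194 160. The sketch below uses Python codecs as a stand-in for the image's converters; the variable names are made up for illustration.

```python
nbsp = "\u00a0"                          # character 160
on_disk = nbsp.encode("utf-8")           # condensed .changes: bytes 194 160

# Read without the UTF-8 converter: one byte becomes one character.
seen_in_image = on_disk.decode("latin-1")    # two characters, 194 and 160

# Monticello supplies the correctly decoded source (a single character
# 160), which is then stored through the converter-less stream as-is.
rewritten = nbsp.encode("latin-1")       # byte 160 alone

# With the UTF-8 converter back in place, 160 is a continuation byte
# with no lead byte in front of it.
try:
    rewritten.decode("utf-8")
    reason = None
except UnicodeDecodeError as exc:
    reason = exc.reason
```

This is consistent with the error report in this thread, where the converter stops on value1: 160 followed by two tabs.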

I think it's safe to fix the invalid methods by overriding their
source. So we don't have to backtrack to version 10297.

Thanks,
Adrian


Re: [Pharo-project] invalid utf8 input detected

2009-05-23 Thread Stéphane Ducasse
Excellent!
Thanks guys.
I'm preparing lectures for Torino and I will experiment with umejava
mcz fixes.

Stef

Re: [Pharo-project] invalid utf8 input detected

2009-05-17 Thread Stéphane Ducasse
yes same here.

On May 17, 2009, at 2:10 AM, Tudor Girba wrote:

 Hi,

 Recently I have been encountering a strange error:
 - I sometimes get a debugger due to some problems in my code
 - when I try to investigate the trace, I get another debugger saying
 'Invalid utf8 input detected'

 This second debugger I can investigate, the previous not. It looks
 like something got messed up with the text conversion of the sources.

 I am working on 10306 using the 4.1.1b2 VM on Mac. The code I am
 working on is loaded from squeaksource (Moose, Glamour, Mondrian).

 Can anyone confirm this problem?

 Cheers,
 Doru


 ERROR REPORT

 '17 May 2009 2:05:50 am

 VM: Mac OS - intel - 1056 - Squeak3.8.1 of ''28 Aug 2006'' [latest
 update: #6747] Squeak VM 4.1.1b2
 Image: Pharo0.1 [Latest update: #10306]

 SecurityManager state:
 Restricted: false
 FileAccess: true
 SocketAccess: true
 Working Dir /Users/girba/Work/Code/squeakingmoose
 Trusted Dir /foobar/tooBar/forSqueak/bogus
 Untrusted Dir /Users/girba/Library/Preferences/Squeak/Internet/My Squeak

 UTF8TextConverter(Object)>>error:
    Receiver: an UTF8TextConverter
    Arguments and temporary variables:
       aString: ''Invalid utf8 input detected''
    Receiver''s instance variables:
       an UTF8TextConverter

 UTF8TextConverter>>errorMalformedInput
    Receiver: an UTF8TextConverter
    Arguments and temporary variables:

    Receiver''s instance variables:
       an UTF8TextConverter

 UTF8TextConverter>>nextFromStream:
    Receiver: an UTF8TextConverter
    Arguments and temporary variables:
       aStream: MultiByteFileStream: ''/Users/girba/Work/Code/squeakingmoose/moose.chan...etc...
       character1: $
       value1: 160
       character2: Character tab
       value2: 9
       unicode: nil
       character3: Character tab
       value3: 9
       character4: nil
       value4: nil
    Receiver''s instance variables:
       an UTF8TextConverter

 MultiByteFileStream>>next
    Receiver: MultiByteFileStream: ''/Users/girba/Work/Code/squeakingmoose/moose.changes''
    Arguments and temporary variables:
       char: nil
       secondChar: nil
       state: nil
    Receiver''s instance variables:


 MultiByteFileStream(PositionableStream)>>nextChunk
    Receiver: MultiByteFileStream: ''/Users/girba/Work/Code/squeakingmoose/moose.changes''
    Arguments and temporary variables:
       terminator: $!
       out: a WriteStream ''doesNotUnderstand: aMessage
 Handle the fact that there ...etc...
       ch: Character cr
    Receiver''s instance variables:


 MultiByteFileStream(PositionableStream)>>nextChunkText
    Receiver: MultiByteFileStream: ''/Users/girba/Work/Code/squeakingmoose/moose.changes''
    Arguments and temporary variables:
       string: nil
       runsRaw: nil
       strm: nil
       runs: nil
       peek: nil
       pos: nil
    Receiver''s instance variables:


 [] in RemoteString>>text
    Receiver: a RemoteString
    Arguments and temporary variables:
       theFile: MultiByteFileStream: ''/Users/girba/Work/Code/squeakingmoose/moose.chan...etc...
    Receiver''s instance variables:
       sourceFileNumber: 2
       filePositionHi: 10007336

 BlockClosure>>ensure:
    Receiver: [closure] in RemoteString>>text
    Arguments and temporary variables:
       aBlock: [closure] in RemoteString>>text
       returnValue: nil
       b: nil
    Receiver''s instance variables:
       outerContext: RemoteString>>text
       startpc: 72
       numArgs: 0

 RemoteString>>text
    Receiver: a RemoteString
    Arguments and temporary variables:
       theFile: MultiByteFileStream: ''/Users/girba/Work/Code/squeakingmoose/moose.chan...etc...
    Receiver''s instance variables:
       sourceFileNumber: 2
       filePositionHi: 10007336

 CompiledMethod>>getSourceFromFile
    Receiver: a CompiledMethod (838)
    Arguments and temporary variables:
       position: 10007336
    Receiver''s instance variables:
       a CompiledMethod (838)

 CompiledMethod>>methodNode
    Receiver: a CompiledMethod (838)
    Arguments and temporary variables:
       aClass: Object
       source: nil
    Receiver''s instance variables:
       a CompiledMethod (838)

 [] in DebuggerMethodMap class>>forMethod:
    Receiver: DebuggerMethodMap
    Arguments and temporary variables:
       aMethod: a CompiledMethod (838)
    Receiver''s instance variables:
       superclass: Object
   

Re: [Pharo-project] invalid utf8 input detected

2009-05-17 Thread Nicolas Cellier
One solution would be to use getSource rather than getSourceFromFile.
However, with the following code I detected no problem in my pharo-core
copy (10281 updated to 10306):

| problems total |
problems := OrderedCollection new.
total := 0.
SystemNavigation default allBehaviorsDo: [:cl | total := total + 1].
'Searching UTF-8 Problems...'
    displayProgressAt: Sensor cursorPoint
    from: 0 to: total
    during: [:bar | | count |
        count := 0.
        SystemNavigation default allBehaviorsDo: [:cl |
            bar value: (count := count + 1).
            cl selectors do: [:sel |
                [(cl compiledMethodAt: sel) getSourceFromFile]
                    ifError: [
                        var value: 'last problem found ' , cl name , '#' , sel.
                        problems add: cl->sel]]]].
^problems
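The script above finds broken methods from inside the image. The same check can be done on the raw bytes of a changes file from outside. Here is a small sketch in Python; the function name `invalid_utf8_offsets` is hypothetical, and slicing on every error keeps the code short at the cost of speed on large files.

```python
def invalid_utf8_offsets(data):
    """Return the byte offsets at which UTF-8 decoding of data fails."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break                      # the remainder is valid UTF-8
        except UnicodeDecodeError as exc:
            bad = pos + exc.start      # exc.start is relative to the slice
            offsets.append(bad)
            pos = bad + 1              # skip the bad byte and keep scanning
    return offsets


# A lone byte 160 followed by two tabs, as in the error report, plus a
# stray Latin-1 byte 233 later on.
sample = b"ok \xa0\t\tmore \xe9!"
found = invalid_utf8_offsets(sample)
```

For a real file one would read it in binary mode and pass the bytes to the function; the reported offsets can then be compared with the file positions shown in the debugger.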


Re: [Pharo-project] invalid utf8 input detected

2009-05-17 Thread Stéphane Ducasse
Nicolas

When I run your script on the image I use for the license review
(based on 10306cl), I get the following error:



VM: Mac OS - intel - 1056 - Squeak3.8.1 of '28 Aug 2006' [latest update: #6747] Squeak VM 4.1.1b2
Image: Pharo0.1 [Latest update: #10306]

SecurityManager state:
Restricted: false
FileAccess: true
SocketAccess: true
Working Dir /Data/squeak4.0-relicenseTools/history
Trusted Dir /foobar/tooBar/forSqueak/bogus
Untrusted Dir /Users/ducasse/Library/Preferences/Squeak/Internet/My Squeak

UndefinedObject(Object)>>doesNotUnderstand: #value:
    Receiver: nil
    Arguments and temporary variables:
        <<error during printing>>
    Receiver's instance variables:
        nil

[] in [] in [] in [] in UndefinedObject>>DoIt
    Receiver: nil
    Arguments and temporary variables:
        <<error during printing>>
    Receiver's instance variables:
        nil

BlockClosure>>valueWithPossibleArgs:
    Receiver: [closure] in [] in [] in [] in UndefinedObject>>DoIt
    Arguments and temporary variables:
        anArray: an Array('Error: Invalid utf8 input detected' an UTF8TextConverter)
    Receiver's instance variables:
        outerContext: [] in [] in [] in UndefinedObject>>DoIt
        startpc: 183
        numArgs: 0

[] in BlockClosure>>ifError:
    Receiver: [closure] in [] in [] in [] in UndefinedObject>>DoIt
    Arguments and temporary variables:
        errorHandlerBlock: Error: Invalid utf8 input detected
        ex: [closure] in [] in [] in [] in UndefinedObject>>DoIt
    Receiver's instance variables:
        outerContext: [] in [] in [] in UndefinedObject>>DoIt
        startpc: 171
        numArgs: 0

BlockClosure>>valueWithPossibleArgs:
    Receiver: [closure] in BlockClosure>>ifError:
    Arguments and temporary variables:
        anArray: an Array(Error: Invalid utf8 input detected)
    Receiver's instance variables:
        outerContext: BlockClosure>>ifError:
        startpc: 40
        numArgs: 1

[] in MethodContext(ContextPart)>>handleSignal:
    Receiver: BlockClosure>>on:do:
    Arguments and temporary variables:
        <<error during printing>>
    Receiver's instance variables:
        sender: BlockClosure>>ifError:
        pc: 17
        stackp: 3
        method: a CompiledMethod (2306)
        closureOrNil: nil
        receiver: [closure] in [] in [] in [] in UndefinedObject>>DoIt

BlockClosure>>ensure:
    Receiver: [closure] in MethodContext(ContextPart)>>handleSignal:
    Arguments and temporary variables:
        aBlock: [closure] in MethodContext(ContextPart)>>handleSignal:
        returnValue: nil
        b: nil
    Receiver's instance variables:
        outerContext: MethodContext(ContextPart)>>handleSignal:
        startpc: 90
        numArgs: 0

MethodContext(ContextPart)>>handleSignal:
    Receiver: BlockClosure>>on:do:
    Arguments and temporary variables:
        exception: Error: Invalid utf8 input detected
        val: nil
    Receiver's instance variables:
        sender: BlockClosure>>ifError:
        pc: 17
        stackp: 3
        method: a CompiledMethod (2306)
        closureOrNil: nil
        receiver: [closure] in [] in [] in [] in UndefinedObject>>DoIt

Error(Exception)>>signal
    Receiver: Error: Invalid utf8 input detected
    Arguments and temporary variables:

    Receiver's instance variables:
        messageText: 'Invalid utf8 input detected'
        tag: nil
        signalContext: Error(Exception)>>signal
        handlerContext: BlockClosure>>on:do:
        outerContext: nil

Error(Exception)>>signal:
    Receiver: Error: Invalid utf8 input detected
    Arguments and temporary variables:
        signalerText: 'Invalid utf8 input detected'
    Receiver's instance variables:
        messageText: 'Invalid utf8 input detected'
        tag: nil
        signalContext: Error(Exception)>>signal
        handlerContext: BlockClosure>>on:do:
        outerContext: nil

UTF8TextConverter(Object)>>error:
    Receiver: an UTF8TextConverter
    Arguments and temporary variables:
        aString: 'Invalid utf8 input detected'
    Receiver's instance variables:
        an UTF8TextConverter

UTF8TextConverter>>errorMalformedInput
    Receiver: an UTF8TextConverter
    Arguments and temporary variables:

    Receiver's instance variables:
        an UTF8TextConverter

UTF8TextConverter>>nextFromStream:
    Receiver: an UTF8TextConverter
Arguments and 

Re: [Pharo-project] invalid utf8 input detected

2009-05-17 Thread Stéphane Ducasse
Doru,

did you manage to reproduce that?

Stef

Re: [Pharo-project] invalid utf8 input detected

2009-05-17 Thread Nicolas Cellier
Sure, a keystroke error: it's bar value:, not var value:.
This @!* workspace takes it as a global without a warning.

2009/5/17 Stéphane Ducasse stephane.duca...@inria.fr:
 Nicolas

 When I run your script on the image I use for the license search,
 using 10306cl,

 I get the following error:



 VM: Mac OS - intel - 1056 - Squeak3.8.1 of '28 Aug 2006' [latest update: #6747] Squeak VM 4.1.1b2
 Image: Pharo0.1 [Latest update: #10306]

 SecurityManager state:
 Restricted: false
 FileAccess: true
 SocketAccess: true
 Working Dir /Data/squeak4.0-relicenseTools/history
 Trusted Dir /foobar/tooBar/forSqueak/bogus
 Untrusted Dir /Users/ducasse/Library/Preferences/Squeak/Internet/My Squeak

 UndefinedObject(Object)>>doesNotUnderstand: #value:
   Receiver: nil
   Arguments and temporary variables:
     error during printing
   Receiver's instance variables:
     nil

 [] in [] in [] in [] in UndefinedObject>>DoIt
   Receiver: nil
   Arguments and temporary variables:
     error during printing
   Receiver's instance variables:
     nil

 BlockClosure>>valueWithPossibleArgs:
   Receiver: [closure] in [] in [] in [] in UndefinedObject>>DoIt
   Arguments and temporary variables:
     anArray:  an Array('Error: Invalid utf8 input detected' an UTF8TextConverter)
   Receiver's instance variables:
     outerContext:  [] in [] in [] in UndefinedObject>>DoIt
     startpc:       183
     numArgs:       0

 [] in BlockClosure>>ifError:
   Receiver: [closure] in [] in [] in [] in UndefinedObject>>DoIt
   Arguments and temporary variables:
     errorHandlerBlock:  Error: Invalid utf8 input detected
     ex:                 [closure] in [] in [] in [] in UndefinedObject>>DoIt
   Receiver's instance variables:
     outerContext:  [] in [] in [] in UndefinedObject>>DoIt
     startpc:       171
     numArgs:       0

 BlockClosure>>valueWithPossibleArgs:
   Receiver: [closure] in BlockClosure>>ifError:
   Arguments and temporary variables:
     anArray:  an Array(Error: Invalid utf8 input detected)
   Receiver's instance variables:
     outerContext:  BlockClosure>>ifError:
     startpc:       40
     numArgs:       1

 [] in MethodContext(ContextPart)>>handleSignal:
   Receiver: BlockClosure>>on:do:
   Arguments and temporary variables:
     error during printing
   Receiver's instance variables:
     sender:        BlockClosure>>ifError:
     pc:            17
     stackp:        3
     method:        a CompiledMethod (2306)
     closureOrNil:  nil
     receiver:      [closure] in [] in [] in [] in UndefinedObject>>DoIt

 BlockClosure>>ensure:
   Receiver: [closure] in MethodContext(ContextPart)>>handleSignal:
   Arguments and temporary variables:
     aBlock:       [closure] in MethodContext(ContextPart)>>handleSignal:
     returnValue:  nil
     b:            nil
   Receiver's instance variables:
     outerContext:  MethodContext(ContextPart)>>handleSignal:
     startpc:       90
     numArgs:       0

 MethodContext(ContextPart)>>handleSignal:
   Receiver: BlockClosure>>on:do:
   Arguments and temporary variables:
     exception:  Error: Invalid utf8 input detected
     val:        nil
   Receiver's instance variables:
     sender:        BlockClosure>>ifError:
     pc:            17
     stackp:        3
     method:        a CompiledMethod (2306)
     closureOrNil:  nil
     receiver:      [closure] in [] in [] in [] in UndefinedObject>>DoIt

 Error(Exception)>>signal
   Receiver: Error: Invalid utf8 input detected
   Arguments and temporary variables:

   Receiver's instance variables:
     messageText:     'Invalid utf8 input detected'
     tag:             nil
     signalContext:   Error(Exception)>>signal
     handlerContext:  BlockClosure>>on:do:
     outerContext:    nil

 Error(Exception)>>signal:
   Receiver: Error: Invalid utf8 input detected
   Arguments and temporary variables:
     signalerText:  'Invalid utf8 input detected'
   Receiver's instance variables:
     messageText:     'Invalid utf8 input detected'
     tag:             nil
     signalContext:   Error(Exception)>>signal
     handlerContext:  BlockClosure>>on:do:
     outerContext:    nil

 UTF8TextConverter(Object)>>error:
   Receiver: an UTF8TextConverter
   Arguments and temporary variables:
     aString:  'Invalid utf8 input detected'
   Receiver's instance variables:
     an UTF8TextConverter

 UTF8TextConverter>>errorMalformedInput
   Receiver: an UTF8TextConverter

Re: [Pharo-project] invalid utf8 input detected

2009-05-17 Thread Nicolas Cellier
There's something weird... If you hit var (UndefinedObject)
doesNotUnderstand: #value:, that means there was a problem the first
time.

Unfortunately, due to a bug in MethodContext tempNames, we don't know
which class and selector are guilty.
From the set of selectors I can see this is Object.
From the source file position, I cannot say anything because I do not
have the same change log history (sorry, own image).

You could try
(SourceFiles at: 2) readOnlyCopy position: 10007336; nextChunk
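
A rough equivalent of this check outside the image, as a minimal Python
sketch. It assumes the changes file follows the usual chunk format (a chunk
ends at a single '!', and a literal bang is written as '!!'). read_chunk and
check_chunk are hypothetical helper names, not part of Squeak or Pharo.

```python
def read_chunk(path, position):
    """Return the raw bytes of the chunk starting at byte offset `position`.

    Assumption: a single '!' terminates the chunk and '!!' is an escaped bang.
    """
    with open(path, "rb") as f:
        f.seek(position)
        data = f.read()
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == 0x21:  # '!'
            if i + 1 < len(data) and data[i + 1] == 0x21:
                out.append(0x21)  # '!!' stands for one literal bang
                i += 2
                continue
            break  # a single '!' ends the chunk
        out.append(byte)
        i += 1
    return bytes(out)


def check_chunk(path, position):
    """Try to decode the chunk as UTF-8.

    Returns (text, None) on success, or (None, (offset, byte)) where offset
    is the index of the first offending byte within the unescaped chunk.
    """
    raw = read_chunk(path, position)
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError as err:
        return None, (err.start, raw[err.start])
```

For example, check_chunk('moose.changes', 10007336) would report the first
byte in that chunk that is not valid UTF-8, if there is one. The file name
and position are the ones from the error report and are used here only for
illustration.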

2009/5/17 Stéphane Ducasse stephane.duca...@inria.fr:
 sorry for not checking either.
 When I run this code I indeed do not have a problem on 10306cl

 stef

 On May 17, 2009, at 11:36 AM, Nicolas Cellier wrote:

 Sure, a key stroke error, it's bar value:, not var value:,
 This @!* workspace takes it as global without a warning


Re: [Pharo-project] invalid utf8 input detected

2009-05-17 Thread Tudor Girba
Hi,

I ran the snippet you sent on both 304cl and 306cl and I get the
following list:

Object->#doesNotUnderstand:
SystemNavigation->#browseMethodsWhoseNamesContain:
Utilities class->#changeStampPerSe
Utilities class->#methodsWithInitials:

Indeed, most of the annoyances are due to
Object>>doesNotUnderstand:, because when I get a DNU I am stuck (and I
feel like I am in Java :)).

I am not sure I understand whether there is a fix for the problem.

Cheers,
Doru



On 17 May 2009, at 12:06, Nicolas Cellier wrote:

 There's something weird... If you hit var (UndefinedObject)
 doesNotUnderstand: #value: that means there were a problem the first
 time.

 Unfortunately, due to bug in MethodContext tempNames, we don't know
 the class and selector guilty.
 From the set of selectors I can see this is Object.
 From the source file position, I cannot say anything because I do not
 have same change log history (sorry, own image).

 You could try
 (SourceFiles at: 2) readOnlyCopy position: 10007336; nextChunk


Re: [Pharo-project] invalid utf8 input detected

2009-05-17 Thread Nicolas Cellier
OK,

{Object>>#doesNotUnderstand:.
SystemNavigation>>#browseMethodsWhoseNamesContain:.
Utilities class>>#changeStampPerSe.
Utilities class>>#methodsWithInitials:} collect: [:e | e getSourceFromFile].

does not fail for me, BUT all these sources look like the output of
decompileString.
I guess this dates from the condenseChanges that occurred in #update10298.
Change logs prior to this update should have the problem.

Nicolas

2009/5/17 Tudor Girba gi...@iam.unibe.ch:
 Hi,

 I ran the snippet you sent on both 304cl and 306cl and I get the
 following list:

 Object-#doesNotUnderstand:
 SystemNavigation-#browseMethodsWhoseNamesContain:
 Utilities class-#changeStampPerSe
 Utilities class-#methodsWithInitials:

 Indeed, most of the annoyances are due to the
 ObjectdoesNotUnderstand: because when I get a DNU I am stuck (and I
 feel like in Java :)).

 I am not sure I understand if there is a fix to the problem.

 Cheers,
 Doru



Re: [Pharo-project] invalid utf8 input detected

2009-05-17 Thread Stéphane Ducasse

On May 17, 2009, at 9:42 PM, Nicolas Cellier wrote:

 Just a reminder: my change was not a fix, just a workaround.
 We have to discover why these non-UTF-8 sources got into the change
 file and cure the problem. Otherwise we might suffer a plague of
 decompiled code spreading in our browsers :(

Yes!

Stef


Re: [Pharo-project] invalid utf8 input detected

2009-05-16 Thread Dale Henrichs
I've seen this as well in 10306 on Linux. 

You can switch to the old debugger to avoid hitting the problem (though that
only sidesteps it). The problem is with RemoteString and MultiByteFileStream,
and I thought that it had been solved ... perhaps not for all cases?
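
For context, the error report quoted further down shows value1: 160 in the
nextFromStream: frame of UTF8TextConverter. Byte 160 (16rA0) has the bit
pattern 10xxxxxx, which UTF-8 reserves for continuation bytes, so it can
never start a character. A minimal Python sketch of that classification
follows. The function names are hypothetical, and reading the byte as
ISO-8859-1 is only an assumption about how it got into the file; the thread
does not establish the actual source encoding.

```python
def utf8_byte_kind(value):
    """Classify a byte by the structural role it can play in UTF-8.

    This only looks at the high bits; it does not reject lead bytes that
    are never valid in practice (16rC0, 16rC1, 16rF5 and above).
    """
    if value <= 0x7F:
        return "ascii"         # 0xxxxxxx: a complete one-byte character
    if value & 0xC0 == 0x80:
        return "continuation"  # 10xxxxxx: only valid after a lead byte
    if value & 0xE0 == 0xC0:
        return "lead-2"        # 110xxxxx: starts a two-byte sequence
    if value & 0xF0 == 0xE0:
        return "lead-3"        # 1110xxxx: starts a three-byte sequence
    if value & 0xF8 == 0xF0:
        return "lead-4"        # 11110xxx: starts a four-byte sequence
    return "invalid"


def try_decode_utf8(raw):
    """Return the decoded text, or None if `raw` is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# The three byte values reported in the trace: value1, value2, value3.
reported = bytes([160, 9, 9])
kinds = [utf8_byte_kind(b) for b in reported]
as_utf8 = try_decode_utf8(reported)
as_latin1 = reported.decode("latin-1")  # assumption: single-byte source text
```

Under that assumption the bytes read as a no-break space followed by two
tabs, which would be plausible inside indented source code.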

If this is a problem for a number of folks, I could see if I can work out a 
workaround in the OTDebugger until the underlying problem is fixed.

Dale
- Tudor Girba gi...@iam.unibe.ch wrote:

| Hi,
| 
| Recently I have been encountering a strange error:
| - I sometimes get a debugger due to some problem in my code
| - when I try to investigate the trace, I get another debugger saying
|   'Invalid utf8 input detected'
| 
| I can investigate this second debugger, but not the first one. It looks
| like something got messed up in the text conversion of the sources.
| 
| I am working on 10306 using the 4.1.1b2 VM on Mac. The code I am
| working on is loaded from squeaksource (Moose, Glamour, Mondrian).
| 
| Can anyone confirm this problem?
| 
| Cheers,
| Doru
| 
| 
| ERROR REPORT
| 
| '17 May 2009 2:05:50 am
| 
| VM: Mac OS - intel - 1056 - Squeak3.8.1 of ''28 Aug 2006'' [latest update: #6747] Squeak VM 4.1.1b2
| Image: Pharo0.1 [Latest update: #10306]
| 
| SecurityManager state:
| Restricted: false
| FileAccess: true
| SocketAccess: true
| Working Dir /Users/girba/Work/Code/squeakingmoose
| Trusted Dir /foobar/tooBar/forSqueak/bogus
| Untrusted Dir /Users/girba/Library/Preferences/Squeak/Internet/My Squeak
| 
| UTF8TextConverter(Object)>>error:
|   Receiver: an UTF8TextConverter
|   Arguments and temporary variables:
|     aString:  ''Invalid utf8 input detected''
|   Receiver''s instance variables:
|     an UTF8TextConverter
| 
| UTF8TextConverter>>errorMalformedInput
|   Receiver: an UTF8TextConverter
|   Arguments and temporary variables:
| 
|   Receiver''s instance variables:
|     an UTF8TextConverter
| 
| UTF8TextConverter>>nextFromStream:
|   Receiver: an UTF8TextConverter
|   Arguments and temporary variables:
|     aStream:     MultiByteFileStream: ''/Users/girba/Work/Code/squeakingmoose/moose.chan...etc...
|     character1:  $
|     value1:      160
|     character2:  Character tab
|     value2:      9
|     unicode:     nil
|     character3:  Character tab
|     value3:      9
|     character4:  nil
|     value4:      nil
|   Receiver''s instance variables:
|     an UTF8TextConverter
| 
| MultiByteFileStream>>next
|   Receiver: MultiByteFileStream: ''/Users/girba/Work/Code/squeakingmoose/moose.changes''
|   Arguments and temporary variables:
|     char:        nil
|     secondChar:  nil
|     state:       nil
|   Receiver''s instance variables:
| 
| 
| MultiByteFileStream(PositionableStream)>>nextChunk
|   Receiver: MultiByteFileStream: ''/Users/girba/Work/Code/squeakingmoose/moose.changes''
|   Arguments and temporary variables:
|     terminator:  $!
|     out:         a WriteStream ''doesNotUnderstand: aMessage
|     Handle the fact that there ...etc...
|     ch:          Character cr
|   Receiver''s instance variables:
| 
| 
| MultiByteFileStream(PositionableStream)>>nextChunkText
|   Receiver: MultiByteFileStream: ''/Users/girba/Work/Code/squeakingmoose/moose.changes''
|   Arguments and temporary variables:
|     string:   nil
|     runsRaw:  nil
|     strm:     nil
|     runs:     nil
|     peek:     nil
|     pos:      nil
|   Receiver''s instance variables:
| 
| 
| [] in RemoteString>>text
|   Receiver: a RemoteString
|   Arguments and temporary variables:
|     theFile:  MultiByteFileStream: ''/Users/girba/Work/Code/squeakingmoose/moose.chan...etc...
|   Receiver''s instance variables:
|     sourceFileNumber:  2
|     filePositionHi:    10007336
| 
| BlockClosure>>ensure:
|   Receiver: [closure] in RemoteString>>text
|   Arguments and temporary variables:
|     aBlock:       [closure] in RemoteString>>text
|     returnValue:  nil
|     b:            nil
|   Receiver''s instance variables:
|     outerContext:  RemoteString>>text
|     startpc:       72
|     numArgs:       0
| 
| RemoteString>>text
|   Receiver: a RemoteString
|   Arguments and temporary variables:
|     theFile:  MultiByteFileStream: ''/Users/girba/Work/Code/squeakingmoose/moose.chan...etc...
|   Receiver''s instance variables:
|     sourceFileNumber:  2
|     filePositionHi:    10007336
| 
| CompiledMethod>>getSourceFromFile
|   Receiver: a CompiledMethod (838)
|   Arguments and temporary variables:
|   position: