Re: Please shoot down this GC idea...

2001-02-14 Thread Hong Zhang
I want to share my experience with garbage collection in the Java virtual machine. There are two common types of garbage collection: aggressive reference-count-based collection, and everything else. A reference-counting system can guarantee a quick response to memory release. In such a system, we can
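
A minimal sketch of the reference-counting scheme described above, in C. The names `rc_obj`, `rc_new`, `rc_incref` and `rc_decref` are hypothetical, not Perl or Parrot API. The point it shows is that the destructor runs the moment the last reference is dropped, which is what makes release prompt and deterministic.

```c
#include <stdlib.h>

/* Hypothetical refcounted object: the destructor runs as soon as the
   count drops to zero, so release is immediate and deterministic. */
typedef struct rc_obj {
    size_t refcount;
    void (*destroy)(struct rc_obj *self);  /* optional finalizer */
} rc_obj;

static rc_obj *rc_new(void (*destroy)(rc_obj *)) {
    rc_obj *o = malloc(sizeof *o);
    if (o) {
        o->refcount = 1;          /* the creator holds the first reference */
        o->destroy = destroy;
    }
    return o;
}

static void rc_incref(rc_obj *o) { o->refcount++; }

static void rc_decref(rc_obj *o) {
    if (--o->refcount == 0) {
        if (o->destroy) o->destroy(o);  /* finalize right now */
        free(o);
    }
}

static int destroyed = 0;   /* counts finalizer runs, for demonstration */
static void count_destroy(rc_obj *self) { (void)self; destroyed++; }
```

The well-known cost of this design is that objects in a reference cycle never reach a count of zero, so a backup collector is still needed for cycles.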

Re: Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-14 Thread Hong Zhang
A deterministic finalization means we shouldn't need to force programmers to have good ideas. Make it easy, remember? :) I don't believe such an algorithm exists, unless you stick with reference counting. Hong

Re: Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-15 Thread Hong Zhang
{ my $fh = IO::File->new("file"); print $fh "foo\n"; } { my $fh = IO::File->new("file"); print $fh "bar\n"; } At present "file" will contain "foo\nbar\n". Without DF it could just as well be "bar\nfoo\n". Make no mistake, this is a major change to the

Re: Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-15 Thread Hong Zhang
Hong Zhang wrote: This code should NEVER work, period. People will just ask for trouble with this kind of code. Actually I meant to have specified ">>" as the mode, i.e. append, then what I originally said holds true. This behaviour is predictable and dependable in the cu

string encoding

2001-02-15 Thread Hong Zhang
Hi all, I want to share some of my thoughts about string encoding. Personally I like the UTF-8 encoding. The variable-length problem can be handled by a special (virtual) function like class String { virtual UV iterate(/*inout*/ int* index); }; So in typical string iteration, the
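
The proposed `iterate(int* index)` method can be sketched in plain C as a function that decodes the code point at the current byte offset and advances the offset past it. This is a minimal sketch that assumes well-formed UTF-8 and does no validation; `utf8_iterate` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the code point starting at s[*index] and advance *index past it.
   Assumes well-formed UTF-8 (no validation); mirrors the in/out index
   parameter of the proposed iterate() method. */
static uint32_t utf8_iterate(const unsigned char *s, size_t *index) {
    const unsigned char *p = s + *index;
    uint32_t cp;
    size_t len;

    if (p[0] < 0x80)      { cp = p[0];        len = 1; }  /* ASCII        */
    else if (p[0] < 0xE0) { cp = p[0] & 0x1F; len = 2; }  /* 110xxxxx     */
    else if (p[0] < 0xF0) { cp = p[0] & 0x0F; len = 3; }  /* 1110xxxx     */
    else                  { cp = p[0] & 0x07; len = 4; }  /* 11110xxx     */

    for (size_t i = 1; i < len; i++)
        cp = (cp << 6) | (p[i] & 0x3F);   /* continuation bytes carry 6 bits */

    *index += len;
    return cp;
}
```

A linear loop such as `while (i < len) cp = utf8_iterate(s, &i);` costs O(1) per character, which is the typical access pattern the post has in mind.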

Re: string encoding

2001-02-15 Thread Hong Zhang
On Thu, Feb 15, 2001 at 02:31:03PM -0800, Hong Zhang wrote: Personally I like the UTF-8 encoding. The solution to the variable length can be handled by a special (virtual) function like I'm expecting that the virtual, internal representation will not be in a UTF but will simply

Re: string encoding

2001-02-15 Thread Hong Zhang
On Thu, Feb 15, 2001 at 03:59:54PM -0800, Hong Zhang wrote: The concept of characters have nothing to do with codepoints. Many characters are composed by more than one codepoints. This isn't true. What do you mean? Have you seen people using multi-byte encoding in Japan/China/Korea

Re: string encoding

2001-02-15 Thread Hong Zhang
...and because of this you can't randomly access the string, you are reduced to sequential access (*). And here I thought we could have left tape drives to the last millennium. (*) Yes, of course you could cache your sequential access so you only need to do it once, and build balanced

Re: string encoding

2001-02-16 Thread Hong Zhang
People in Japan/China/Korea have been using multi-byte encoding for a long time. I have personally used it for more than 10 years, and I have never felt much of the "pain". Do you think I am using my computer with O(n) while you are using it with O(1)? There are 100 million people using variable-length

Re: string encoding

2001-02-16 Thread Hong Zhang
What do you mean? Have you seen people using multi-byte encoding in Japan/China/Korea? You're talking to the wrong person. Japanese data handling is my graduate dissertation. :) The Unified Hangul/Kanji/Ha'nzi' Characters in Unicode (so-called "Unihan") occupy one and only one codepoint

Re: string encoding

2001-02-16 Thread Hong Zhang
And address arithmetic and mem(cmp|cpy) is faster than array iteration. Ha Ha Ha. You must be kidding. memcmp() and memcpy() work just fine for UTF-8 string comparison and copying. But memcmp() cannot be used for UTF-32 string comparison, because of the endianness issue. Hong
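
The endianness point can be checked directly: UTF-8 is designed so that bytewise `memcmp()` order matches code-point order, while little-endian UTF-32 puts the least significant byte first and breaks that. A minimal sketch comparing U+00FF and U+0100 (helper names are hypothetical; the UTF-8 encoder only handles code points below U+0800, which is enough here):

```c
#include <stdint.h>
#include <string.h>

/* Encode a code point below U+0800 as UTF-8 (1 or 2 bytes). */
static size_t utf8_encode_2(uint32_t cp, unsigned char *out) {
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
}

/* Store a code point as little-endian UTF-32 (4 bytes). */
static void utf32le_encode(uint32_t cp, unsigned char *out) {
    out[0] = (unsigned char)(cp & 0xFF);
    out[1] = (unsigned char)((cp >> 8) & 0xFF);
    out[2] = (unsigned char)((cp >> 16) & 0xFF);
    out[3] = (unsigned char)(cp >> 24);
}

/* Sign (-1, 0, 1) of a memcmp over the UTF-8 encodings of a and b. */
static int cmp_utf8(uint32_t a, uint32_t b) {
    unsigned char ba[2], bb[2];
    size_t la = utf8_encode_2(a, ba), lb = utf8_encode_2(b, bb);
    int r = memcmp(ba, bb, la < lb ? la : lb);
    if (r == 0) r = (la > lb) - (la < lb);   /* shorter prefix sorts first */
    return (r > 0) - (r < 0);
}

/* Sign of a memcmp over the little-endian UTF-32 encodings. */
static int cmp_utf32le(uint32_t a, uint32_t b) {
    unsigned char ba[4], bb[4];
    utf32le_encode(a, ba);
    utf32le_encode(b, bb);
    int r = memcmp(ba, bb, 4);
    return (r > 0) - (r < 0);
}
```

`cmp_utf8(0xFF, 0x100)` is negative (C3 BF sorts before C4 80), while `cmp_utf32le(0xFF, 0x100)` is positive (FF 00 00 00 sorts after 00 01 00 00). Big-endian UTF-32 would sort correctly, which is why the problem is specifically one of byte order.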

Re: string encoding

2001-02-16 Thread Hong Zhang
Did it buy you much? I don't believe so. Can you give some examples why random character access is so important? Most people are processing text linearly. Most, but not all. And as this is the internals list, we have to deal with all. We can't choose a convenient subset and ignore the rest.

Re: string encoding

2001-02-16 Thread Hong Zhang
I'd like to wrap up my argument. I recommend using UTF-8 as the sole string encoding. If we end up with multiple encodings, there is absolutely no point to this argument. The benefits of UTF-8 are that it is more compact, needs less encoding conversion, and is more friendly to the C API. UTF-16 is a variable-length encoding too,
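
The claim that UTF-16 is also variable-length comes from surrogate pairs: any code point above U+FFFF takes two 16-bit units. A minimal encoder sketch (hypothetical `utf16_encode`, assuming the input is a valid Unicode scalar value, i.e. not itself a surrogate):

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-16; returns the number of 16-bit units.
   Code points above U+FFFF need a surrogate pair, so UTF-16 is a
   variable-length encoding just like UTF-8. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2]) {
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                                 /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));      /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));    /* low surrogate  */
    return 2;
}
```

For example, U+4E2D is one unit, while U+1F600 becomes the pair D83D DE00.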

Re: C Garbage collector

2001-02-23 Thread Hong Zhang
I don't quite understand what the intention is here. Most C garbage collectors are mark-sweep based. They have all the common problems of GC, for example non-deterministic finalization (destruction) and conservativeness. If we decide to use GC for Perl, it will be trivial to implement a simple mark
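
For contrast with reference counting, here is a minimal mark-sweep sketch over a fixed, hypothetical heap of objects with at most two references each. It is precise rather than conservative (it knows exactly where the pointers are), and it shows both sides of the trade-off: an unreachable cycle is reclaimed, but only when `gc_collect()` happens to run, so finalization timing is not deterministic.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEAP_SIZE 8

/* Hypothetical object: at most two outgoing references. */
typedef struct gc_obj {
    bool in_use;
    bool marked;
    struct gc_obj *ref[2];
} gc_obj;

static gc_obj heap[HEAP_SIZE];

static gc_obj *gc_alloc(void) {
    for (size_t i = 0; i < HEAP_SIZE; i++) {
        if (!heap[i].in_use) {
            heap[i] = (gc_obj){ .in_use = true };
            return &heap[i];
        }
    }
    return NULL;   /* a real collector would trigger a collection here */
}

/* Mark phase: everything reachable from a root survives. */
static void gc_mark(gc_obj *o) {
    if (o == NULL || o->marked) return;   /* the marked check also stops cycles */
    o->marked = true;
    gc_mark(o->ref[0]);
    gc_mark(o->ref[1]);
}

/* Sweep phase: reclaim unmarked objects; returns how many were freed.
   Any finalizers would run here, at a time the program cannot predict. */
static size_t gc_collect(gc_obj **roots, size_t nroots) {
    for (size_t i = 0; i < nroots; i++) gc_mark(roots[i]);
    size_t freed = 0;
    for (size_t i = 0; i < HEAP_SIZE; i++) {
        if (heap[i].in_use && !heap[i].marked) {
            heap[i].in_use = false;
            freed++;
        }
        heap[i].marked = false;   /* reset for the next cycle */
    }
    return freed;
}
```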

Questions about PDD 4: Internal data types

2001-03-02 Thread Hong Zhang
Integer data types are generically referred to as CINTs. There is a CINT typedef that is guaranteed to hold any integer type. Does such a thing exist? Not unless it is a BIGINT. Should we scrap the buffer pointer and just tack the buffer on the end of the structure? Saves a level of indirection,

Re: PDD 4: Internal data types

2001-03-02 Thread Hong Zhang
I was hoping to get us something that was guaranteed to hold an integer, no matter what it was, so you could do something like: struct thingie { UV type; INT my_int; } What is the purpose of doing this? The SV is guaranteed to hold anything. Why do we need a type that can

Re: PDD 4: Internal data types

2001-03-05 Thread Hong Zhang
struct perl_string { void *string_buffer; UV length; UV allocated; UV flags; } The low three bits of the flags field is reserved for the type of the string. The various types are: =over 4 =item BINARY (0) =item ASCII (1) =item EBCDIC (2) =item
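
Keeping the string type in the low three bits of `flags` means reading or changing it is a mask operation that leaves the other flag bits alone. A minimal sketch, assuming `UV` is an unsigned integer typedef (here `uintptr_t`, as a stand-in) and using only the three type values the post lists, since the rest of its list is cut off:

```c
#include <stdint.h>

typedef uintptr_t UV;   /* assumption: stand-in for Perl's UV */

struct perl_string {
    void *string_buffer;
    UV length;
    UV allocated;
    UV flags;      /* low three bits hold the string type */
};

/* Only the values quoted in the post; the remainder of its list is elided. */
enum { STR_BINARY = 0, STR_ASCII = 1, STR_EBCDIC = 2 };

#define STR_TYPE_MASK ((UV)0x7)

static UV str_type(const struct perl_string *s) {
    return s->flags & STR_TYPE_MASK;
}

/* Replace the type bits without disturbing the other flag bits. */
static void str_set_type(struct perl_string *s, UV type) {
    s->flags = (s->flags & ~STR_TYPE_MASK) | (type & STR_TYPE_MASK);
}
```

Three bits allow at most eight encodings; anything beyond that would need a wider field.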

Re: PDD 4: Internal data types

2001-03-05 Thread Hong Zhang
Here is an example, "re`sume`" takes 6 characters in Latin-1, but could take 8 characters in Unicode. All Perl functions that directly deal with character position and length will be sensitive to encoding. I wonder how we should handle this case. My first inclination is to force
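
The 6-versus-8 difference comes from precomposed versus decomposed forms: "résumé" with U+00E9 has 6 code points, while "e" followed by U+0301 COMBINING ACUTE ACCENT gives 8. A minimal sketch that counts UTF-8 code points by skipping continuation bytes (hypothetical helper, well-formed input assumed):

```c
#include <stddef.h>

/* Count code points in a UTF-8 string by skipping continuation bytes
   (those of the form 10xxxxxx). Assumes well-formed input. */
static size_t utf8_codepoints(const char *s) {
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if ((*p & 0xC0) != 0x80) n++;
    return n;
}
```

`utf8_codepoints("r\xC3\xA9sum\xC3\xA9")` counts 6, while `utf8_codepoints("re\xCC\x81sume\xCC\x81")` counts 8, so length() and substr() give different answers for strings a user would call identical unless they are normalized first.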

Re: PDD 4: Internal data types

2001-03-06 Thread Hong Zhang
Unless I really, *really* misread the unicode standard (which is distinctly possible) normalization has nothing to do with encoding, I understand what you are trying to say. But it is not very easy in practice. The normalization has something to do with encoding. If you compare two strings

Re: PDD 4: Internal data types

2001-03-08 Thread Hong Zhang
I was thinking maybe (length/4)*31-bit 2s complement to make portable overflow detection easier, but that would be only if there wasn't a good C library for this available to snag. I believe Python uses (length/2)*15-bit 2's complement representation. Because bigint and bignum are
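
Why 15-bit digits make portable overflow handling easy: each digit sits in a 16-bit slot with one bit of headroom, so a digit sum plus carry always fits in an `unsigned int`, and the carry is just the bits above bit 15. A minimal magnitude-addition sketch (hypothetical `big_add`; sign handling and CPython's actual layout are out of scope):

```c
#include <stddef.h>
#include <stdint.h>

#define DIGIT_BITS 15
#define DIGIT_MASK ((1u << DIGIT_BITS) - 1)

/* Add two magnitudes stored least-significant digit first, 15 bits per
   uint16_t digit. `out` must have room for max(na, nb) + 1 digits;
   returns the number of digits written. */
static size_t big_add(const uint16_t *a, size_t na,
                      const uint16_t *b, size_t nb, uint16_t *out) {
    size_t n = na > nb ? na : nb;
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned sum = carry;
        if (i < na) sum += a[i];
        if (i < nb) sum += b[i];
        out[i] = (uint16_t)(sum & DIGIT_MASK);
        carry = sum >> DIGIT_BITS;          /* overflow detection is free */
    }
    if (carry) out[n++] = (uint16_t)carry;
    return n;
}
```

Adding the single digit 0x7FFF to 1 yields the two digits {0, 1}, i.e. 2^15.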

Re: PDD 4: Internal data types

2001-03-08 Thread Hong Zhang
For bigint, we definitely need a highly portable implementation. People can do platform-specific optimization on their own later. We should settle the generic implementation first, with proper encapsulation. Hong Do we need to settle on anything - can it vary by platform so that 64 bit

Re: PDD 4: Internal data types

2001-03-22 Thread Hong Zhang
The normalization has something to do with encoding. If you compare two strings with the same encoding, of course you don't have to care about it. Of course you do. Think about it. I said "you don't have to". You can use "==" for codepoint comparison, and something like

Re: Idea for safe signal handling by a byte code interpreter

2001-03-22 Thread Hong Zhang
Here is some of my experience with the HotSpot Linux port. I've read, in the glibc info manuals, that a similar situation exists in C programming -- you don't want to do a lot inside the signal handler; just set a flag and return, then check that flag from your main loop, and run a
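
The "set a flag and return, then check it from the main loop" pattern can be sketched as follows. The handler only writes a `volatile sig_atomic_t`, which the C standard guarantees a handler may safely store to; the interpreter would call the hypothetical `poll_signal()` at safe points such as between ops. This simplified version keeps a single pending slot, so a signal arriving between the read and the reset can be lost; per-signal counters or briefly blocking signals would address that.

```c
#include <signal.h>

/* Written asynchronously by the handler. */
static volatile sig_atomic_t pending_signal = 0;

static void on_signal(int signo) {
    pending_signal = signo;   /* record and return; no other work here */
}

/* Called by the interpreter loop at a safe point (e.g. between ops);
   returns the signal number to dispatch, or 0 if none is pending. */
static int poll_signal(void) {
    int signo = pending_signal;
    pending_signal = 0;
    return signo;
}
```

After `signal(SIGINT, on_signal)`, a `raise(SIGINT)` makes the next `poll_signal()` return `SIGINT`, and the interpreter can then run the Perl-level handler with no restrictions.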

Re: Idea for safe signal handling by a byte code interpreter

2001-03-22 Thread Hong Zhang
What if, at the C level, you had a signal handler that sets or increments a flag or counter, stuffs a struct with information about the signal's context, then pushes (by "push", I mean "(cons v ls)", not "(append! ls v)" 'whatever ;-) that struct on a stack... Hong I

Re: Unicode handling

2001-03-23 Thread Hong Zhang
I recommend using a 'u' flag, which indicates all operations are performed against unicode grapheme/glyph. By default re is performed on codepoint. U doesn't really signal "glyph" to me, but we are sort of limited in what we have left. We still need a zero-width assertion for glyph boundary

Re: Unicode handling

2001-03-23 Thread Hong Zhang
We need the character equivalence construct, such as [[=a=]], which matches "a", "A ACUTE". Yeah, we really need a big list of these. PDD anyone? But surely this is a locale issue, and not an encoding one? Not every language recognizes the same character equivalences. Let me

Re: Perl_foo() vs foo() etc

2001-04-12 Thread Hong Zhang
IIRC, ISO C says you cannot have /^_[A-Z_][A-Za-z_0-9]*$/. That's reserved for the standard. If our prefix is "_Perl_" rather than just "_", we will be pretty safe. There just aren't many people who follow the standard anyway :-) Hong

RE: Stacks registers

2001-05-23 Thread Hong Zhang
Register based. Untyped registers; I'm hoping that the vtable stuff can be sufficiently optimized that there'll be no major win in storing multiple copies of a PMC's data in different types knocking around. For those yet to be convinced by the benefits of registers over stacks, try

RE: Stacks, registers, and bytecode. (Oh, my!)

2001-05-29 Thread Hong Zhang
here is an idea. if we use a pure stack design but you can access the stack values with an index, then the index number can get large. so a fixed register set would allow us to limit the index to 8 bits. so the byte code could look something like this: 16 bit op (plenty of
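
With a fixed register file of at most 256 registers, each operand index fits in one byte. Since the post's layout is cut off after "16 bit op", the packing below is an assumption for illustration only: a 16-bit opcode in the high half of a 32-bit word and two 8-bit register indices in the low half.

```c
#include <stdint.h>

/* Hypothetical layout: 16-bit opcode in the high half, two 8-bit
   register indices in the low half of a 32-bit instruction word. */
static uint32_t encode_op(uint16_t op, uint8_t r1, uint8_t r2) {
    return ((uint32_t)op << 16) | ((uint32_t)r1 << 8) | r2;
}

static uint16_t op_code(uint32_t w) { return (uint16_t)(w >> 16); }
static uint8_t  op_reg1(uint32_t w) { return (uint8_t)(w >> 8); }
static uint8_t  op_reg2(uint32_t w) { return (uint8_t)w; }
```

For example, `encode_op(0x0102, 3, 31)` packs to `0x0102031F`, and the three accessors recover the fields without any table lookup.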

RE: Stacks, registers, and bytecode. (Oh, my!)

2001-05-30 Thread Hong Zhang
There's no reason why you can't have a hybrid scheme. In fact I think it's a big win over a pure register-addressing scheme. Consider... The hybrid scheme may be a win in some cases, but I am not sure if it is worth the complexity. I personally prefer strict RISC-style opcodes, mainly load,

RE: Stacks, registers, and bytecode. (Oh, my!)

2001-06-05 Thread Hong Zhang
On Tue, Jun 05, 2001 at 11:25:09AM +0100, Dave Mitchell wrote: This is the bit that scares me about unifying perl ops and regex ops: can we really unify them without taking a performance hit? Coupl'a things: firstly, we can make Perl 6 ops as lightweight as we like. Second, Ruby uses a

RE: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Hong Zhang
Courtesy of Slashdot, http://www.hastingsresearch.com/net/04-unicode-limitations.shtml I'm not sure if this is an issue for us or not, as we're generally language-neutral, and I don't see any technical issues with any of the UTF-* encodings having headroom problems. I think the author

RE: Should we care much about this Unicode-ish criticism?

2001-06-05 Thread Hong Zhang
This is a very common practice, nothing surprising. As you can tell, my name is "hong zhang", which has already lost its "chinese tone" and "glyph". "hong" has 4 tones, each tone can be any of several characters, each character can be one of several glyphs (simpli

RE: Unicode sorting...

2001-06-08 Thread Hong Zhang
I can't really believe that this would be a problem, but if they're integrated alphabets from different locales, will there be issues with sorting (if we're not planning to use the locale)? Are there instances where like characters were combined that will affect the sort orders?

RE: Should we care much about this Unicode-ish criticism?

2001-06-08 Thread Hong Zhang
What happens if unicode supported uppercase and lowercase numbers? [I had a dig about, and it doesn't seem to mention lowercase or uppercase digits. Are they just a typography distinction, and hence not enough to be worthy of codepoints?] Damned if I know; I didn't know there even

RE: Should we care much about this Unicode-ish criticism?

2001-06-11 Thread Hong Zhang
However, I don't think this actually affects your comments, except that I'd guess that the half digits mentioned by Hong don't have the same term case used with them that the letters of various alphabets do. I am not sure if we mean the same thing. The regular ascii 0123456789 are called

RE: More character matching bits

2001-06-12 Thread Hong Zhang
We should let an external collator handle all these fancy features. People can always normalize/canonicalize/do-whatever-you-want and send the resulting text/binary to the regex. All the features we argue about here can be easily done by a customized collator. Do NOT expect the Perl regex be a

RE: The internal string API

2001-06-19 Thread Hong Zhang
* Convert from and to UTF-32 * lengths in bytes, characters, and possibly glyphs * character size (with the variable length ones reporting in negative numbers) What do you mean by character size if it does not support variable length? * get and set the locale (This might not be the spot

RE: The internal string API

2001-06-19 Thread Hong Zhang
This is the common approach to complicated text representation; the implementations I have seen include IBM IText and the SGI rope. For rope, each rope is represented by either of a simple immutable string, a simple mutable string, a simple immutable substring of another rope, or a binary node of
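
A minimal rope sketch in C, covering only two of the node kinds described above (immutable leaf strings and binary concatenation nodes; mutable leaves and substring nodes are omitted). The type and function names are hypothetical, not the SGI API. Concatenation copies no characters, and indexing walks down the tree using each left child's length.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal rope: a node is either a leaf holding an
   immutable string or a binary concatenation of two ropes. */
typedef struct rope {
    size_t len;
    const char *leaf;                 /* non-NULL for leaves */
    const struct rope *left, *right;  /* set for concatenation nodes */
} rope;

static rope *rope_leaf(const char *s) {
    rope *r = malloc(sizeof *r);
    if (r) *r = (rope){ strlen(s), s, NULL, NULL };
    return r;
}

/* Concatenation is O(1): no characters are copied. */
static rope *rope_concat(const rope *a, const rope *b) {
    rope *r = malloc(sizeof *r);
    if (r) *r = (rope){ a->len + b->len, NULL, a, b };
    return r;
}

/* Indexing walks down the tree, O(depth). i must be < r->len. */
static char rope_char_at(const rope *r, size_t i) {
    while (r->leaf == NULL) {
        if (i < r->left->len) {
            r = r->left;
        } else {
            i -= r->left->len;
            r = r->right;
        }
    }
    return r->leaf[i];
}
```

Concatenating "hello" and " world" gives a node of length 11 whose character 6 is 'w', found in the right child at offset 1.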

RE: The internal string API

2001-06-20 Thread Hong Zhang
The one problem with copy-on-write is that, if we implement it in software, we end up paying the price to check it on every string write. (No free depending on the hardware, alas) Not that this should shoot down the idea of COW strings, but it is a cost that needs considering. (I
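
A minimal software copy-on-write sketch, to make the per-write cost concrete: copies share one reference-counted buffer, and every write has to test whether the buffer is shared before touching it. The names (`cow_str`, `cow_copy`, `cow_set`) are hypothetical, and freeing is omitted for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared buffer: reference count plus the bytes. */
typedef struct {
    size_t refs;
    size_t len;
    char data[];          /* flexible array member (C99) */
} cow_buf;

typedef struct { cow_buf *buf; } cow_str;

static cow_str cow_new(const char *s) {
    size_t len = strlen(s);
    cow_buf *b = malloc(sizeof *b + len + 1);
    if (b) {
        b->refs = 1;
        b->len = len;
        memcpy(b->data, s, len + 1);
    }
    return (cow_str){ b };
}

/* "Copying" only shares the buffer. */
static cow_str cow_copy(cow_str s) {
    s.buf->refs++;
    return s;
}

/* Every write pays for this sharing check; a shared buffer is
   duplicated before the write so other copies are unaffected. */
static void cow_set(cow_str *s, size_t i, char c) {
    if (s->buf->refs > 1) {
        cow_buf *b = malloc(sizeof *b + s->buf->len + 1);
        if (!b) return;   /* sketch: give up on allocation failure */
        b->refs = 1;
        b->len = s->buf->len;
        memcpy(b->data, s->buf->data, b->len + 1);
        s->buf->refs--;
        s->buf = b;
    }
    s->buf->data[i] = c;
}
```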

RE: Draft assembly PDD

2001-08-06 Thread Hong Zhang
The branch instruction is wrong. It should be branch #num. The offset should be part of instruction, not from register. Nope, because that kills the potential for computed relative branches. (It's in there on purpose) Branches should work from both constants and registers. Even so, the

RE: Final draft: Conventions and Guidelines for Perl Source Code

2001-08-13 Thread Hong Zhang
I believe the advantage of if (...) { ... } else { ... } is to write very dense code, especially when the block itself is single line. This style may not be readable to some people. This style is not very consistent, if (...) { ... } else { ... } I believe it would be better

RE: An overview of the Parrot interpreter

2001-09-05 Thread Hong Zhang
True, but it is easier to generate FAST code for a register machine. A stack machine forces a lot of book-keeping either run-time inc/dec of sp, or alternatively compile-time what-is-offset-now stuff. The latter is a real pain if you are trying to issue multiple instructions at once. I

RE: An overview of the Parrot interpreter

2001-09-05 Thread Hong Zhang
If you really want a comparison, here's one. Take this loop: i = 0; while (i < 1000) { i = i + 7; } with the ops executed in the loop marked with pipes. The corresponding parrot code would be: getaddr P0, i store P0, 0 store I0,
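
To make the register-machine side of the comparison concrete, here is a toy register interpreter running the same loop with `i` kept in register I0. The opcode set is hypothetical and much smaller than Parrot's; the point is that each op names its operand registers directly, so there is no stack-pointer bookkeeping per instruction.

```c
#include <stdint.h>

/* Hypothetical opcodes for a tiny register machine (not Parrot's). */
enum { OP_SET, OP_ADD, OP_GE_JMP, OP_JMP, OP_END };

typedef struct { int op, reg; int64_t arg; int target; } insn;

/* Runs the program and returns the value of register I0. */
static int64_t run(const insn *code) {
    int64_t I[32] = { 0 };
    for (int pc = 0; ; ) {
        const insn *in = &code[pc];
        switch (in->op) {
        case OP_SET:    I[in->reg] = in->arg;  pc++; break;
        case OP_ADD:    I[in->reg] += in->arg; pc++; break;
        case OP_GE_JMP: pc = I[in->reg] >= in->arg ? in->target : pc + 1; break;
        case OP_JMP:    pc = in->target; break;
        case OP_END:    return I[0];
        default:        return -1;   /* unknown opcode */
        }
    }
}

/* i = 0; while (i < 1000) { i = i + 7; } with i kept in register I0. */
static const insn loop_prog[] = {
    { OP_SET,    0, 0,    0 },  /* 0: I0 = 0                 */
    { OP_GE_JMP, 0, 1000, 4 },  /* 1: if (I0 >= 1000) goto 4 */
    { OP_ADD,    0, 7,    0 },  /* 2: I0 = I0 + 7            */
    { OP_JMP,    0, 0,    1 },  /* 3: goto 1                 */
    { OP_END,    0, 0,    0 },  /* 4: done                   */
};
```

The loop body is three dispatched ops per iteration, and `run(loop_prog)` returns 1001, the first multiple of 7 that is not below 1000.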

RE: Math functions? (Particularly transcendental ones)

2001-09-10 Thread Hong Zhang
Uri Guttman we are planning automatic over/underflow to bigfloat. so there is no need for traps. they could be provided at the time of the conversion to big*. OK. But will Perl support signaling and non-signaling NANs? I don't think we should go for automatic overflow/underflow

RE: Parrot coredumps on Solaris 8

2001-09-12 Thread Hong Zhang
Now works on Solaris and i386, but segfaults at the GRAB_IV call in read_constants_table on my Alpha. Problems with the integer-pointer conversions in memory.c? (line 29 is giving me a warning). Line 29 is extremely wrong. It assigns IV to void* without casting. The alignment calculation

Using int32_t instead of IV for code

2001-09-12 Thread Hong Zhang
I think we should use int32_t instead of IV for all code-related data. The IV is 64-bit on 64-bit machines, which is a significant waste. The IV is also platform specific, and has caused some nasty problems so far. Hong

RE: Using int32_t instead of IV for code

2001-09-13 Thread Hong Zhang
If we are going to keep on doing fancy stuff with pointer arithmetic (eg the Alloc_Aligned/CHUNK_BASE stuff), I think we're also going to need an integer type which is guaranteed to be the same width as a pointer, so we can freely typecast between the two. You are not supposed to do fancy
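
If such pointer arithmetic is kept, C99's `uintptr_t` (from `<stdint.h>`; formally optional, but provided on mainstream platforms) is the integer type the post asks for: one that can hold a converted pointer. A minimal align-up sketch, assuming the alignment is a power of two; `align_up` is a hypothetical helper, not the existing Alloc_Aligned code.

```c
#include <stdint.h>

/* Round an address up to a power-of-two boundary. The mask arithmetic
   is done on uintptr_t, not IV, so it has the width of a pointer on
   both 32-bit and 64-bit platforms. */
static void *align_up(void *p, uintptr_t alignment) {
    uintptr_t addr = (uintptr_t)p;
    return (void *)((addr + alignment - 1) & ~(alignment - 1));
}
```

The pointer-to-integer-to-pointer round trip is implementation-defined in ISO C, but it behaves as expected on the flat address spaces Parrot targets.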

RE: Using int32_t instead of IV for code

2001-09-13 Thread Hong Zhang
I'd have thought it made sense to define it as a bytecode_t type, or some such which could be platform specific. It is better called opcode_t, since we are not using bytecode anyway. Hong

RE: Bytecode file format

2001-09-14 Thread Hong Zhang
Offset  Length  Description
0       1       Magic Cookie (0x013155a1)
1       n       Data
n+1     m       Directory Table
m+n+1   1       Offset of beginning of directory table (i.e. n+1)
I think we need a version number right after the cookie for long-term compatibility. The directory is after the

RE: RFC: Bytecode file format

2001-09-14 Thread Hong Zhang
8-byte word: endianness (magic value 0x123456789abcdef0)
byte:        word size
byte[7]:     empty
word:        major version
word:        minor version
Where all word values are as big as the word size says they are. The magic value can be something else, but it should
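
One way a loader could use the 8-byte magic word: assemble the first eight header bytes both as a big-endian and as a little-endian integer and see which reading equals the magic value. A minimal sketch (hypothetical `detect_byte_order`; word-size handling from the rest of the header is not covered):

```c
#include <stdint.h>

#define PF_MAGIC UINT64_C(0x123456789abcdef0)

/* Read the first 8 header bytes both ways; whichever reading matches
   the magic value gives the file's byte order.
   Returns 1 for big-endian, 0 for little-endian, -1 for neither. */
static int detect_byte_order(const unsigned char hdr[8]) {
    uint64_t be = 0, le = 0;
    for (int i = 0; i < 8; i++) {
        be = (be << 8) | hdr[i];
        le |= (uint64_t)hdr[i] << (8 * i);
    }
    if (be == PF_MAGIC) return 1;
    if (le == PF_MAGIC) return 0;
    return -1;
}
```

This only works because the chosen magic value is not a byte palindrome: its byte-reversed form is a different number, so the two readings can never both match.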

RE: RFC: Bytecode file format

2001-09-14 Thread Hong Zhang
We can't do that. There are platforms on both ends that have _no_ native 32-bit data formats (Crays, some 16-bit CPUs?). They still need to be able to load and generate bytecode without ridiculous CPU penalties (your Palm III is not running on a 700MHz Pentium III, after all!) If the

RE: RFC: Bytecode file format

2001-09-14 Thread Hong Zhang
There's a one-off conversion penalty at bytecode load time, and I don't consider that excessive. I want the bytecode to potentially be in platform native format (4/8 byte ints, big or little endian) with a simple and well-defined set of conversion semantics. That way the bytecode loader

RE: Bytecode safety

2001-09-18 Thread Hong Zhang
Proposed: Parrot should never crash due to malformed bytecode. When choosing between execution speed and bytecode safety, safety should always win. Careful op design and possibly a validation pass before execution will hopefully keep the speed penalty to a minimum. We can use a similar model

RE: [PATCH] changing IV to opcode_t!!

2001-09-18 Thread Hong Zhang
Do we want the opcode to be so complicated? I thought we were going to use this kind of thing for generic pointers. The p member of opcode does not make any sense to me. Hong Earlier there was some discussion about changing typedef long IV to typedef union { IV i; void* p; } opcode_t;

RE: Check NV alignment for Solaris

2001-09-19 Thread Hong Zhang
One of the things that might be coring solaris is the potential for embedded floats in the bytecode stream. (The more I think about that the more I regret it...) The ops do a quick and ugly cast to treat some of the opcode stream as an NV which may trip across alignment rules and size

RE: Parrot multithreading?

2001-09-20 Thread Hong Zhang
DS I'm also seriously considering throwing *all* PerlIO code into separate DS threads (one per file) as an aid to asynchrony. but that will be hard to support on systems without threads. i still have that internals async i/o idea floating in my numb skull. it is an api that would

RE: Parrot multithreading?

2001-09-20 Thread Hong Zhang
Nope. Internal I/O, at least as the interpreter will see it is async. You can build sync from async, it's a big pain to build async from sync. Doesn't mean we actually get asynchrony, just that we can. It is trivial to build async from sync, just by using threads. Most Unix async are built

RE: variable number of arguments

2001-09-24 Thread Hong Zhang
is it possible the ops to handle variable number of arguments, what I have in mind : print I1,,,N2,\n This should be done by a create-array opcode plus a print-array opcode. [1, 2, 3, 4, 5] The create-array opcode takes the top n stack entries (or n registers) and creates an array out of them. Both

RE: [PATCH] assemble.pl registers go from 0-31

2001-09-24 Thread Hong Zhang
Attached patch makes sure you don't try and use register numbers over 31. That is, this patch allows registers I0-I31 and anything else gets a: Error (foo.pasm:0): Register 32 out of range (should be 0-31) in 'set_i_ic' Oh, there's also a comment at end of line patch that has snuck in

RE: [PATCH] assemble.pl registers go from 0-31

2001-09-24 Thread Hong Zhang
Just curious, do we need a dedicated zero register and sink register? I've been pondering that one and waffling back and forth. At the moment I don't think so, since there's no benefit to going with a zero register over a zero constant, but that could change tomorrow. For example, once

RE: Tru64 core dumps

2001-09-26 Thread Hong Zhang
# 0xf000 for 64 bit systems. With that changed Don't bother. Make the constant be ~0xfff. :) Umm, are you sure? It's used in an integer context and masked against an IV, so you might need an 'int', a 'long', or a 'long long'. I'm unsure what type to portably assume for

RE: Tru64 core dumps

2001-09-26 Thread Hong Zhang
You are using the wrong format. The expression in the second is long long, so you should use %llx. Since printf uses varargs, a type mismatch between the format and the argument is undefined behavior. Hong Hehehe. Ok. Guess what the following will print: #include <stdio.h> int main(void) { int
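
A minimal sketch of the matching-format rule: the mask is a `long long`, so it is printed with `%llx`, and since `%llx` expects an `unsigned long long` the value is cast explicitly. `format_mask` is a hypothetical helper; `snprintf` is used so the result can be checked.

```c
#include <stdio.h>

/* The conversion specifier must match the argument type exactly:
   %llx expects unsigned long long, so cast the long long mask. */
static int format_mask(char *out, size_t n) {
    long long mask = ~0xfffLL;    /* all ones except the low 12 bits */
    return snprintf(out, n, "%llx", (unsigned long long)mask);
}
```

With a 64-bit `long long` this writes `fffffffffffff000` (16 characters). Passing the same value with `%x` or `%lx` would be undefined behavior, as the post says.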

RE: thread vs signal

2001-09-30 Thread Hong Zhang
How does python handle MT? Honestly? Really, really badly, at least from a performance point of view. There's a single global lock and anything that might affect shared state anywhere grabs it. Python uses a global lock for multi-threading. It is reasonable for I/O threads, which block most of

RE: NV Constants

2001-09-30 Thread Hong Zhang
This was failing here until I made the following change: PackFile_Constant_unpack_number(struct PackFile_Constant * self, char * packed, IV packed_size) { char * cursor; NV value; NV * aligned = mem_sys_allocate(sizeof(IV)); Are you sure this is correct? Or this is

RE: NV Constants

2001-09-30 Thread Hong Zhang
The memcpy() can handle alignment nicely. Not always. I tried. :( How could that be possible? memcpy() just does a byte-by-byte copy. It does not care about the alignment of the source or destination. How can it fail? Hong
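
The usual alignment-safe way to read an NV embedded at an arbitrary offset in a byte stream is to `memcpy()` it into a properly aligned local instead of dereferencing a cast pointer. A minimal sketch, assuming NV is `double`; `read_nv` is a hypothetical helper and does not explain the failure reported in the thread.

```c
#include <string.h>

/* Read an NV (assumed to be double) from a possibly misaligned offset.
   *(double *)p can fault on strict-alignment CPUs such as SPARC;
   memcpy into an aligned local has no alignment requirement. */
static double read_nv(const unsigned char *p) {
    double v;
    memcpy(&v, p, sizeof v);
    return v;
}
```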

RE: thread vs signal

2001-10-01 Thread Hong Zhang
Now how do you go about performing an atomic operation in MT? I understand the desire for reentrance via the exclusive use of local variables, but I'm not quite sure how you can enforce this when many operations are on shared data (manipulating elements of the interpreter / global

RE: thread vs signal

2001-10-01 Thread Hong Zhang
On Sun, Sep 30, 2001 at 10:45:46AM -0700, Hong Zhang wrote: Python uses a global lock for multi-threading. It is reasonable for I/O threads, which block most of the time. It will be completely useless for CPU-intensive programs or large SMP machines. It might be useless in theory. In practice

RE: moving integer constants to the constant table

2001-10-04 Thread Hong Zhang
This patch moves integer constants to the constant table if the size chosen for integers is not the same as the size chosen for opcodes. It still leaves room for trouble. I suggest we move everything that cannot be held in an int32_t out of the opcode stream. The need for 64-bit constant are

RE: Building on Win32

2001-11-01 Thread Hong Zhang
void gettimeofday(struct timeval* pTv, void *pDummy) { SYSTEMTIME sysTime; FILETIME fileTime; /* 100ns == 1 */ LARGE_INTEGER i; GetSystemTime(&sysTime); SystemTimeToFileTime(&sysTime, &fileTime); /* Documented as the way to get a 64 bit from a FILETIME. */

RE: Beginning of dynamic loading -- platform assistance needed

2001-11-02 Thread Hong Zhang
Okay, here's the updated scheme. *) There is a platform/generic.c and platform/generic.h. (OK, it'll probably really be unixy, but these days it's close enough) If there is no platform-specific file, this is the one that gets copied to platform.c and platform.h *) If there

RE: Building on Win32

2001-11-02 Thread Hong Zhang
Also, note that Hong Zhang ([EMAIL PROTECTED]) has pointed out a simplification (1 API call rather than 2)... FYI. The GetSystemTimeAsFileTime() takes less than 10 assembly instructions. It just reads the kernel time variable that maps into every address space. and given I think I've found

RE: sizeof(INTVAL), sizeof(void*), sizeof(opcode_t)

2001-11-20 Thread Hong Zhang
On Tue, 20 Nov 2001, Ken Fox wrote: It sounds like you want portable byte code. Is that a goal? I do indeed want portable packfiles, and I thought that was more than a goal, I thought that was a requirement. In an ideal world, I want a PVM to be integrated into a web browser the same way a

thread vs signal

2001-09-28 Thread Hong Zhang
In a word? Badly. :) Especially when threads were involved, though in some ways it was actually better since you were less likely to core perl. Threads and signals generally don't mix well, especially in any sort of cross-platform way. Linux, for example, deals with signals in threaded

RE: thread vs signal

2001-09-28 Thread Hong Zhang
The fun part about async vs sync is there's no common decision on what's an async signal and what's a sync signal. :( SIGPIPE, for example, is one of those. (Tru64, at least, treats it differently than Solaris) I generally divide signals into two groups: *) Messages from outside

RE: SV: Parrot multithreading?

2001-09-28 Thread Hong Zhang
This is fine at the target language level (e.g. perl6, python, jako, whatever), but how do we throw catchable exceptions up through six or eight levels of C code? AFAICS, this is more of why perl5 uses the JMP_BUF stuff - so that XS and functions like sv_setsv() can Perl_croak()

RE: SV: Parrot multithreading?

2001-09-28 Thread Hong Zhang
This is the wrong assumption. If you don't care about the call stack, how can you expect the [sig]longjmp can successfully unwind stack? The caller may have a malloc memory block, Irrelevant with a GC. Are you serious? Do you mean I can not use malloc in my C code? or have entered

RE: [PATCH] Don't count on snprintf

2001-11-30 Thread Hong Zhang
What we really need is our own s(n?)printf: Parrot_sprintf(target, "%I + %F - %I", foo, bar, baz); /* or some such nonsense */ or even: target=Parrot_sprintf("%I + %F - %I"); /* like Perl's built-in */ That way, it could even handle Parrot strings natively, perhaps

RE: 64-bit Solaris status

2002-01-03 Thread Hong Zhang
I am not sure why we need the U postfix in the first place. For a literal like ~0xFFF, the compiler should automatically sign-extend it to our expected size. Personally, I prefer using ([u]intptr_t) ~0xFFF, which is more portable. So we don't have to deal with U, UL, i64. It is possible to use

RE: 64-bit Solaris status

2002-01-03 Thread Hong Zhang
Also, the UL[L] should probably be on the inside of the (): stacklow = '(~0xfffULL)', I still don't see this one is safer than my proposal. ~((uintptr_t) 0xfff); Anyway, we should use some kind of macro for this purpose. #ifndef foo #define foo(a) ((uintptr_t) (a)) #endif or

RE: [PATCH] Re: Question about INTVAL vs. opcode_t sizes

2002-01-06 Thread Hong Zhang
That's what I thought I remembered; in that case, here's a patch: Index: core.ops === RCS file: /home/perlcvs/parrot/core.ops,v retrieving revision 1.68 diff -u -r1.68 core.ops --- core.ops 4 Jan 2002 02:36:25 -

RE: [PATCH] Keep comments in sync with the code...

2002-01-08 Thread Hong Zhang
By the way, we should not have global variable names like index in the first place. All globals should look something like GIndex. Hong -Original Message- From: Simon Glover [mailto:[EMAIL PROTECTED]] Sent: Tuesday, January 08, 2002 9:56 AM To: [EMAIL PROTECTED] Subject: [PATCH]

RE: on parrot strings

2002-01-18 Thread Hong Zhang
(1) There are 5.125 bytes in Unicode, not four. (2) I think the above would suffer from the same problem as one common suggestion, two-level bitmaps (though I think the above would suffer less, being of finer granularity): the problem is that a lot of space is wasted, since the

RE: on parrot strings

2002-01-18 Thread Hong Zhang
preprocessing. Another example, if I want to search for /resume/e, (equivalent matching), the regex engine can normalize the case, fully decompose input string, strip off any combining character, and do 8-bit Hmmm. The above sounds complicated not quite what I had in mind for

RE: on parrot strings

2002-01-18 Thread Hong Zhang
My proposal is that we use a mixed method. The Unicode standard classes, such as \p{IsLu}, can be handled by a standard splitbin table. Please see Java java.lang.Character or Python unicodedata_db.h. I measured it: to handle all unicode categories, simple casing, and decimal digit
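
A minimal sketch of a two-level ("split bin") property table: the code space is cut into fixed-size blocks, identical blocks are stored once, and a lookup is two array indexes. It only covers the BMP, and the property is a toy one (ASCII uppercase as a stand-in for \p{IsLu}); the names, the 128-entry block size and the builder are assumptions for illustration, not the layout Java or Python actually use.

```c
#include <stdint.h>
#include <string.h>

#define SHIFT   7
#define BLOCK   (1u << SHIFT)          /* 128 code points per block */
#define NCP     0x10000u               /* BMP only, for this sketch */
#define NBLOCKS (NCP / BLOCK)

static uint16_t index1[NBLOCKS];        /* block number -> leaf block */
static uint8_t  leaves[NBLOCKS][BLOCK]; /* deduplicated leaf blocks   */
static size_t   nleaves;

/* Build the table from a per-code-point property function.
   Identical blocks (e.g. the many all-zero ones) are stored once. */
static void table_build(uint8_t (*prop)(uint32_t)) {
    nleaves = 0;
    for (uint32_t b = 0; b < NBLOCKS; b++) {
        uint8_t blk[BLOCK];
        for (uint32_t i = 0; i < BLOCK; i++)
            blk[i] = prop(b * BLOCK + i);
        size_t j;
        for (j = 0; j < nleaves; j++)
            if (memcmp(leaves[j], blk, BLOCK) == 0) break;
        if (j == nleaves)
            memcpy(leaves[nleaves++], blk, BLOCK);
        index1[b] = (uint16_t)j;
    }
}

/* Constant-time lookup: two array indexes, no search. */
static uint8_t table_lookup(uint32_t cp) {
    return leaves[index1[cp >> SHIFT]][cp & (BLOCK - 1)];
}

/* Toy property: ASCII uppercase only (stand-in for \p{IsLu}). */
static uint8_t is_ascii_upper(uint32_t cp) { return cp >= 'A' && cp <= 'Z'; }
```

With this toy property all 512 BMP blocks collapse into just two distinct leaves, which is where the space saving comes from; real tables usually store small indexes into a record table rather than the property value itself.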

RE: on parrot strings

2002-01-21 Thread Hong Zhang
But e` and e are different letters man. And re`sume` and resume are different words come to that. If the user wants something that'll match 'em both then the pattern should surely be: /r[ee`]sum[ee`]/ I disagree. The difference between 'e' and 'e`' is similar to 'c' and 'C'. The Unicode

RE: on parrot strings

2002-01-21 Thread Hong Zhang
Yes, that's somewhat problematic. Making up a byte CEF would be Wrong, though, because there is, by definition, no CCS to map, and we would be dangerously close to conflating in CES, too... ACR-CCS-CEF-CES. Read the character model. Understand the character model. Embrace the character

RE: on parrot strings

2002-01-21 Thread Hong Zhang
But e` and e are different letters man. And re`sume` and resume are different words come to that. If the user wants something that'll match 'em both then the pattern should surely be: /r[ee`]sum[ee`]/ I disagree. The difference between 'e' and 'e`' is similar to 'c' and

RE: How Powerful Is Parrot? (A Few More Questions)

2002-01-25 Thread Hong Zhang
I believe the main difficulty comes from heading into uncharted waters. For example, once you've decided to make garbage collection optional, what does the following line of code mean? delete x; If the above code is compiled to Parrot, it is probably equivalent to x->~Destructor();

RE: How Powerful Is Parrot? (A Few More Questions)

2002-01-25 Thread Hong Zhang
This changes the way a programmer writes code. A C++ class and function that uses the class looks like this: class A { public: A(){...grab some resources...} ~A(){...release the resources...} } void f() { A a; ... use a's resources ... } ...looks like this

RE: parrot rx engine

2002-01-31 Thread Hong Zhang
But as you say, case folding is expensive. And with this approach you are going to case-fold every string that is matched against an rx that has some part of it that is case-insensitive. That is correct in general. But regex compiler can be smarter than that. For example, rx should optimize

RE: parrot rx engine

2002-02-04 Thread Hong Zhang
Agh, if you go and do that, you must then be sure that rx is capable of optimizing /a/i and /[aA]/ in the same way. What I mean is that Perl's current regex engine is able to use /abc/i as a constant in a string, while it cannot do the same for /[Aa][Bb][Cc]/. Why? Because in the first
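
The optimization discussed here, treating /abc/i as one folded literal instead of a sequence of character classes, can be sketched for ASCII: fold the pattern once at "compile" time, then compare each subject character through `tolower()` during the scan, without building a folded copy of the subject. Full Unicode case folding is harder because it is not always one-to-one (German ß folds to "ss"). The helper names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* "Compile" step: fold the pattern once (ASCII only). */
static void fold_ascii(char *s) {
    for (; *s; s++) *s = (char)tolower((unsigned char)*s);
}

/* Case-insensitive literal search; folded_pat must already be folded.
   The subject is folded one character at a time during the scan.
   Returns the match offset, or -1 if there is no match. */
static long find_ci(const char *subject, const char *folded_pat) {
    size_t m = strlen(folded_pat);
    for (size_t i = 0; subject[i] != '\0'; i++) {
        size_t j = 0;
        while (j < m && subject[i + j] != '\0' &&
               tolower((unsigned char)subject[i + j]) == folded_pat[j])
            j++;
        if (j == m) return (long)i;
    }
    return m == 0 ? 0 : -1;
}
```

Folding "ABC" to "abc" and searching "xxaBcyy" finds the match at offset 2.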

RE: I'm amazed - Is this true :)

2002-02-04 Thread Hong Zhang
mops tests : on perl5,python I get - 2.38 M/ops ruby ~ 1.9 M/ops ps ~ 1.5 M/ops parrot - 20.8 M/s parrot jitted - 341 M/ops and it finish in half second ... for most of the other I have to wait more than a minute .. Frankly speaking, this number is misleading. I know the python and

RE: [PATCH] Stop win32 popping up dialogs on segfault

2002-02-08 Thread Hong Zhang
The following patch adds a Parrot_nosegfault() function to win32.c; after it is called, a segmentation fault will print This process received a segmentation violation exception instead of popping up a dialog. I think it might be useful for tinderbox clients. Please notice, stdio is not

RE: 64 bit Debian Linux/PowerPC OK but very noisy

2002-03-16 Thread Hong Zhang
Can you check what is the sizeof(INTVAL) and sizeof(void*)? Some warnings should not have happened. Hong -Original Message- From: Michael G Schwern [mailto:[EMAIL PROTECTED]] Sent: Saturday, March 16, 2002 10:24 AM To: [EMAIL PROTECTED] Subject: 64 bit Debian Linux/PowerPC OK but

RE: Thread safety and interpreter safety

2002-03-16 Thread Hong Zhang
1) NO STATIC VARIABLES! EVER! 2) Don't hold on to pointers to memory across calls to routines that might call the GC. 3) Don't hold on to pointers to allocated PMCs that aren't accessible from the root set I don't think the rule #2 and #3 can be achieved without systematic effort. In

RE: 64 bit Debian Linux/PowerPC OK but very noisy

2002-03-17 Thread Hong Zhang
G Schwern [mailto:[EMAIL PROTECTED]] Sent: Saturday, March 16, 2002 2:54 PM To: Hong Zhang Cc: [EMAIL PROTECTED] Subject: Re: 64 bit Debian Linux/PowerPC OK but very noisy On Sat, Mar 16, 2002 at 02:36:45PM -0800, Hong Zhang wrote: Can you check what is the sizeof(INTVAL) and sizeof

RE: Unicode thoughts...

2002-03-25 Thread Hong Zhang
I think it will be relatively easy to deal with different compilers and different operating systems. However, ICU does contain some C++ code. It will make life much harder, since current Parrot only assumes ANSI C (even a subset of it). Hong This is rather concerning to me. As I understand it,

RE: GC, exceptions, and stuff

2002-05-28 Thread Hong Zhang
Okay, i've thought things over a bit. Here's what we're going to do to deal with infant mortality, exceptions, and suchlike things. Important given: We can *not* use setjmp/longjmp. Period. Not an option--not safe with threads. At this point, having considered the alternatives, I wish

RE: GC, exceptions, and stuff

2002-05-28 Thread Hong Zhang
The thread-package-compatible setjmp/longjmp can be easily implemented using assembly code. It does not require access to any private data structures. Note that Microsoft Windows Structured Exception Handling works well with threads and signals. The assembly code of __try will show you how to
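
For reference, the pattern under debate looks roughly like this in portable C: each try-block pushes a `jmp_buf` onto a handler chain, and a throw `longjmp`s to the innermost one. Keeping the chain in per-interpreter state rather than a global is what lets each thread have its own. This is a minimal sketch with hypothetical names; it does not settle the thread-safety question raised above, and anything `malloc`ed between the try and the throw is not cleaned up automatically.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical handler chain entry. */
typedef struct handler {
    jmp_buf env;
    struct handler *prev;
} handler;

/* Stand-in for per-interpreter (per-thread) state. */
typedef struct { handler *top; int code; } interp;

static void throw_exc(interp *in, int code) {
    in->code = code;             /* pass the code via state, not longjmp */
    longjmp(in->top->env, 1);
}

static int risky(interp *in, int fail) {
    if (fail) throw_exc(in, 42);
    return 7;
}

/* Returns risky()'s result, or the exception code if it threw. */
static int try_risky(interp *in, int fail) {
    handler h;
    h.prev = in->top;
    in->top = &h;
    if (setjmp(h.env) == 0) {    /* a context where ISO C allows setjmp */
        int r = risky(in, fail);
        in->top = h.prev;        /* normal exit: pop the handler */
        return r;
    }
    in->top = h.prev;            /* exception: pop and report */
    return in->code;
}
```

The exception code travels through the interpreter struct because ISO C only permits `setjmp` in a few expression contexts, such as a comparison with a constant in an `if`.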
