Hi Joel,
I see that (so far) you chose to ignore my invitation in a previous
message, to comment on whether the logic of my argument is acceptable to
you. My question was addressed personally to you and not anonymously to the
list.
I conclude that since you have otherwise demonstrated a friendly and polite
style, and wisely separated disrespectful remarks from any immediate
dialog, you intend to reply to my question, and that you are simply too
busy to do that just now. I hope you will find the time to read this email.
I also note that you have not worn out your keyboard responding to the
example code I provided in support of my position in my previous email. Let
this not suggest that your preoccupation with concepts is distracting you
from the object you are trying to conceptualize, namely the REBOL language
itself, as documented by source code.
I think that your terminology introduces grave mistakes in reflecting on
how REBOL behaves and constitutes a source for confusion.
If I'm not mistaken, the whole point of your approach in this email series
;-) is that you are trying to be able to formulate that in
a: "1234"
b: next a
c: next b
a, b and c reference the same series at different positions. You would like
to invent a terminology that puts in focus the fact that the positions are
different while the data storage is identical, by providing a term that
says "same data storage, different position". For this you choose the term
"series".
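To make that starting point concrete, here is a small console sketch. It
assumes a REBOL 2 console; index? and same? are functions I have not used
above, and head returns a series at its first position:
>> a: "1234"
== "1234"
>> b: next a
== "234"
>> c: next b
== "34"
>> index? a
== 1
>> index? b
== 2
>> index? c
== 3
>> same? a head c
== true
Each word reports a different position, yet head leads back from 'c to the
very same string that 'a references.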
1. The term is ill chosen. REBOL already uses series to designate a
pseudotype, something like an abstract datatype, which refers to a
collection of related datatypes, such as strings and blocks. Series as a
pseudotype is spelled with an exclamation mark (series!), but derived from
this spelling, the word series is commonly used in the explanation of
functions, to denote the argument passed to a function, and is commonly
used informally, to refer to a value of type series!. You invite confusion,
when you introduce the word series as a term that means something else.
2. With respect to the example above:
a: "1234"
b: next a
you often refer to 'b as a "series referencing the shared or sharable data
storage". This is incorrect for two reasons.
1. It is incorrect because 'b is a word. 'b is not a series. 'b is not
converted into something other than a word (such as a series) and referring
to 'b as a series, such as in "the b series references the shared data
storage", is incorrect. It describes the word 'b in terms that may mislead
a newbie to think of 'b as being something different from what it actually
is, namely a word that references some value. A terminology designed to
conceptualize REBOL should not invent a new REBOL, which is inconsistent
with the REBOL language implemented in the REBOL interpreter.
Whenever REBOL evaluates 'b, it always recognizes 'b as a word and operates
- not on 'b - but on the value referenced by 'b. That is an important
distinction. This distinction is one of the factors that constitute REBOL's
simplicity, and introducing a terminology that ignores that is the basis
for a mental model of REBOL that is more complicated than the object
(REBOL) being modelled mentally.
Explanations should not be more complicated than the object they are
explaining.
3. Another REBOL feature that makes REBOL such an elegant and simple
language is the fact that different constructs such as blocks, strings,
objects, hidden contexts and functions are essentially similar, with a very
small number of distinguishing features. I expect a terminology that wishes
to conceptualize REBOL to parallel REBOL in this respect and provide for a
vocabulary that expresses the common elements of these different constructs
in the same terms and provides for the barest minimum of additional words
that permit us to distinguish between different constructs, a function and a
series, for instance.
The words required to express the similarity of features are word, value,
datatype, reference, local, global, binding and context. (Have I overlooked
something?) The words necessary to distinguish between the different
constructs are indefinite extent, garbage collection and literal. I believe
that this is a pretty much comprehensive list of the words required to
describe the behavior of all of REBOL's language constructs. Specifically,
the similarities and differences between a series, a hidden context, an
object and a function can be expressed in the terms presented in this list.
Since REBOL is simple, so should the language used to discuss it.
I will now comment on the second email you addressed to me. Please be
warned that portions of my response may be somewhat tedious:
you wrote:
>In THEORY, a series is a data storage and a "current position"
>within that storage.
Let me comment on
>a series is a data storage and a "current position"
>within that storage.
You subtly modified my definition and that modification changes something
important. A series IS a data storage that HAS a current position. The
important thing here is that the current position is a property of a
series. I emphasize property because I want you to think that it is
something that changes without affecting the identity of the series. The
series remains the same series, independent of the value of the current
position. The current position is a local attribute of the series.
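A short sketch of what I mean, again assuming a REBOL 2 console (the words
's and 't are used only for illustration):
>> s: "1234"
== "1234"
>> t: next s
== "234"
>> append s "5"
== "12345"
>> t
== "2345"
The change made through 's is visible through 't, which suggests that both
words lead to one and the same series and differ only in the position at
which they reference it.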
>In THEORY
I was hoping that the intention of providing a terminology for describing
REBOL's features, so as to make REBOL conceptually accessible, would
coincide with an interest in THEORY.
You wrote:
>
>In IMPLEMENTATION,
Aha, now for some real expert talk.
1. Should we be discussing IMPLEMENTATION?
I think that when we develop a terminology designed to describe REBOL
conceptually, we must by force of our undertaking do without any
assumptions about how REBOL is implemented. A REBOL programmer should be
able to program in REBOL without knowing anything about how REBOL is
implemented, and a conceptual description of REBOL, which you originally
set out to develop, must therefore disregard any details that have to do
with how REBOL is implemented.
A conceptual description of REBOL should exclusively rely on REBOL's
documented language features and its observable behavior.
2. Can we discuss IMPLEMENTATION?
Sorry, I don't have the REBOL source code and I have no idea how REBOL is
implemented. If you had access to the REBOL source code, you wouldn't be
discussing it publicly on this mailing list, Carl forbid. Therefore your
statements about implementation must be speculative. Being speculations,
they are as hypothetical as the position I presented is theoretical. Under
the title IMPLEMENTATION we are replacing a theory concerned with how REBOL
operates, by a hypothesis about how REBOL is implemented. Is that to be
considered progress with respect to our project of defining a terminology
that will serve as a basis for a conceptual understanding of REBOL?
>a series has a REFERENCE to a sharable data storage
>and has a private/nonshared "current position" within that storage.
Maybe, maybe not. It depends on what you mean by series, reference and "a
sharable data storage". Series in REBOL are pseudo-types. Are you referring
to series in the sense of pseudo-types?
References in REBOL are somewhat different from references in C. Are we
talking about C references or REBOL references?
What does it mean to be "sharable"? Does it mean that the processor allows
different programs to share that memory region? Does it mean that the
operating system provides a mechanism for different programs to share that
memory region? Is it a feature provided by the language REBOL is
implemented in (we are talking about IMPLEMENTATION, or are we)? Or have I
overlooked some REBOL function or datatype that provides a sharable data
storage?
And what is data storage? Is it a byte in memory? Or is it some specialized
data structure? Does REBOL's source code include a data structure called
data_storage? Do you know that? I don't. Is it any of my business? If you
are talking about IMPLEMENTATION it is.
3. Are we talking about IMPLEMENTATION?
you wrote:
>Don't we agree that given:
>
> a: next "123456"
> b: next a
>
>both 'a and 'b refer to the same string, but to different positions
>within that string?
Yes, provided we agree that a string is a specialization of series, a series
that specializes on sequences of characters and a small set of formatting
instructions. A string then is a specialized series, a container for
characters that includes a current position index. Being the specialization
of a series, a string HAS the property of a current position index. Is that
what you mean?
>Isn't it valid to say that 'a and 'b are not the
>same series, but that each is a series referring to the same string
No. 'a and 'b are not series, they are words that refer to a string!
datatype, which is one of the possible concrete implementations of the
series pseudotype.
>(or whatever we want to call the data storage in this example)?
The term string is not one of many ways of contrasting a naked "data
storage" to a series. String is a specialization of a series. Since a
series is a data storage that has a current position index, a string is a
data storage that has a current position index as well.
A string is not the data minus the position index.
>Isn't that less ambiguous than saying that they are the same series?
And what on earth is ambiguous about saying that 'a and 'b reference the
same string at different positions? And if you want to abstract from the
specific types of data stored in a string, you can also speak generically
of series. Then you can say that 'a and 'b reference the same series at
different positions. What is ambiguous about that?
Ambiguity is not the problem here. What is a problem is that in order to
maintain your terminology, you refer to 'a and 'b as series, which they are
not, and you present strings as though they didn't have a position index,
even though they do.
It does not add to the clarity, when for the sake of terminology you end up
making statements about REBOL's datatypes that do not coincide with the
observable behavior of these datatypes.
>
>>
>> 3. Both the data stored by the series as well as the series' current
>> position can be controlled by using REBOL functions.
>>
>
>Some REBOL operations affect only the current position of a series,
>and therefore have no effect on any other series referring to the
>same data storage.
>
I address that under 3.
>Other REBOL operations affect the data storage referred to by a
>series, and may therefore cause side-effects on any other series
>values which refer to the same data storage.
>
I address that under 4.
>>
>> 3. Modifying the current position does not modify the data.
>>
>
>Agreed. Affecting the current position of a particular series
>has no effect on the data storage referred to by that series
Your insisting on referring to the word as a series, so that now the series
is doing the referencing, does not add any clarity to what was stated under 3.
>because the current position is not part of the data storage,
>but belongs to that series alone (whether or not other series
>values refer to the same data storage).
This is wrong. Modifying the current position does not modify the data
because the current position is a distinct property of the series, or the
string in your example. There are no multiple series referencing the same
data storage. The series is the data storage.
>
>>
>> 4. When the data is modified, the modifications always begin at the current
>> position of the series. (One could complain and refer to 'append, which
>> inserts stuff at the tail of a series, and not at its current position. But
>> I would refer to append's source, which uses tail to position the series
>> immediately behind its last element and then uses insert to insert a value
>> at that position. So, append itself modifies the current position of the
>> series in order to achieve its purpose. Therefore append proves this point
>> and doesn't contradict it.)
>> 5. Modifying the data may modify the current position (insert, remove), but
>> does not have to (replace) and occasionally does (append).
>>
>
>Can we agree to finesse the entire 'append side discussion by simply
>saying
>
> Modifying the data storage referred to by a series takes the
> current position of that series into account, and may change
> that current position.
No, a data storage is not referred to by a series. The series is the data
storage. Therefore we can say:
"Modifying the data in a series takes the current position of the series
into account and may change that current position." Examples:
>> insert next "1234" " abc "
== "234"
>> insert next [1 2 3 4] 'abc
== [2 3 4]
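The earlier remark about append can be sketched the same way. The function
my-append below is hypothetical, my own illustration and not REBOL's actual
source for append; it assumes only that tail, insert and head behave as
documented:
>> my-append: func [series value] [head insert tail series value]
>> my-append "123" "4"
== "1234"
>> my-append [1 2 3] 4
== [1 2 3 4]
Here tail moves to the position behind the last element, insert modifies
the data at that position, and head returns the series at its first
position again.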
>It will not change the current position
> of any other series referring to the same data storage.
There is no other series referring to the same data storage!
>
>For completeness, I would then add
>
> However, the change to the shared data storage may render invalid
> the current position of another series referring to the same data
> storage; that fact will not be detected
> [by the current implementation]
> until a subsequent attempt is made to access the data storage
> through a series with a current position which is out of bounds
> relative to the modified data storage.
>
The current implementation of REBOL contains a bug that permits words to
reference a series at a position which is no longer valid, due to
modifications made to the series' data. An attempt to reference a series at
a position that has become invalid may result in an error.
>>
>> 6. The current position of a series is local.
>> 7. The data of a series is global.
>>
>
>My computing science background endows "local" and "global" with
>a fair bit of baggage. Could we agree on "private/public" or
>"sharable/non-sharable"?
>
Not really. Not meaning to be disrespectful, but your computer science
background is not a strong reason to disregard the vocabulary provided by
REBOL's documentation and used in REBOL programming, when trying to discuss
REBOL. The vocabulary is well suited for expressing what goes on in REBOL.
Maybe you are not comfortable using these terms. Or perhaps you have not
understood their meaning for REBOL? You should be aware that neither
"private/public" nor "sharable/non-sharable" convey the meaning that local
and global have in REBOL.
Let me explain:
A series is a pseudo-type. It is a generic way of talking about specialized
container values such as blocks or strings. Blocks and strings are series
values.
When we speak of a series, we speak of properties that are shared by series
values, such as blocks or strings. REBOL only provides for series as a
pseudotype that allows you to refer to the different datatypes represented
by a series collectively. So the word series is already in use and your
redefining series as something else does not clarify anything, it adds a
new source of confusion.
A series is a data container and it has a property. The property is the
series' position index. As a data container it stores data. A literal
series value is global. Therefore a literal series value is exempt from
garbage collection. The series' position index is local. Therefore
modifications made to the series' position index are not permanent.
a: [ "1234" ]
Here, we have an anonymous string that is not referenced by a word. Since
we are referencing the block that contains the string with the word 'a, we
can retrieve the string anytime we want to via the word 'a and the block it
references.
>> a: [ "1234" ]
== ["1234"]
>>
Let's retrieve the string:
>> first a
== "1234"
Now, let's modify the string's position index:
>> next first a
== "234"
Have we modified the string permanently?
>> first a
== "1234"
No.
BTW, modifying the position index of the string is of course something
quite different from modifying the position index of the block, which is
referenced by 'a:
>> next a
== []
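Assuming index? and tail? behave as they do in a REBOL 2 console, the two
position indexes can also be inspected directly, still without assigning
the string to a word:
>> index? first a
== 1
>> index? next first a
== 2
>> index? a
== 1
>> tail? next a
== true
The first two results concern the string's position, the last two the
position of the block referenced by 'a.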
Note that in the email I was responding to, you claimed that what was being
modified by the function next was something "specific to 'a". You had
referred to the string as a case III), anonymous entity, which we couldn't
access. I announced that I would demonstrate two things:
a) we can access the anonymous entity, without referring to it by a word,
b) we can manipulate this entity's position index and therefore it is not
correct to propose that the position index is something specific to the
word referencing the string.
I just demonstrated both things. BTW, the original email you were commenting
on included similar code. For you to make a statement of this nature, "the
position index is specific to 'a", shows that you either i) ignored the code
I provided, or ii) are not aware of the significance of the code I provided.
Now let's add something to the string at the second position:
>> insert next first a " def "
== "234"
>> a
== ["1 def 234"]
The modification of the position index was transitory: the string's current
position index continues to have the value 1. The insertion of the second
string, however, was permanent. That is because the string is a literal
string, and therefore it is global.
However, the location at which the second string " def " was inserted
demonstrates that the string's position index was indeed modified at the
time 'next was applied to it. The string, without being referenced by the
word 'a or by any other word, has a position index, which was incremented by
'next and resulted in the second string being inserted at the second
position.
I have been saying that literal strings are global. What types of strings
are not global?
Compare the behavior of the strings in these two blocks:
These are the blocks:
a: [insert next "1234" " abc "]
b: [insert next make string! "1234" " abc "]
Here's the behavior
>> a: [insert next "1234" " abc "]
== [insert next "1234" " abc "]
>> reduce a
== ["234"]
>> a
== [insert next "1 abc 234" " abc "]
The string which was passed as an argument to next, "1234" and then passed
from next to insert, when the block 'a was reduced, now contains the
substring " abc ". That is because the string is a literal string, and as
such it is global, it retains its changes.
>> b: [insert next make string! "1234" " abc "]
== [insert next make string! "1234" " abc "]
>> reduce b
== ["234"]
>> b
== [insert next make string! "1234" " abc "]
Here reduce led to the construction of a string (make string!) that is not
literal. Therefore the string is not global and - even though the same
expressions were evaluated leading to the same modifications of that local
string (local to the block b) the local string including its modifications,
are not retained. We can see that a string had been created and was
modified, if we use head and first to inspect the complete local string:
>> head first reduce b
== "1 abc 234"
The string documented here is local to the block, it is not literal, not
global and therefore disappears from the block as soon as the expression
reduce b has completed executing.
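As a further sketch, and assuming a REBOL 2 console, copy can be used in
place of make string! with the same outcome (the word 'e is only for
illustration):
>> e: [insert next copy "1234" " abc "]
== [insert next copy "1234" " abc "]
>> reduce e
== ["234"]
>> e
== [insert next copy "1234" " abc "]
The literal string in the block is left untouched, because the insertion
was applied to the string constructed by copy and not to the literal.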
So far, I have demonstrated that strings have position indexes, strings are
not just data storage, and position indexes are not something that are
added to a string by virtue of an additional entity, which you term a
"series".
Now, here comes something that - I believe - has been bothering you about
my explanations: How do I explain that I can assign as many words as I want
to the string contained in a's block, each with a different position index,
if, as is my claim, the position index is the property of the series, i.e.
a string or a block, for instance, itself, and just as there exists only
one data storage, there exists only one series? (Of course, I'm only
guessing that it's bothering you, since you have not commented on my position.)
Using the following version of 'a
>> a: [ "1234" ]
after we modified it
>> insert next first a " def "
== "234"
>> a
== ["1 def 234"]
we now assign the following words:
b: first a
c: next first a
d: next next first a
>> b
== "1 def 234"
>> d
== "def 234"
If all three words 'b ... 'd are referencing one and the same series(!),
and the series has a position index, how do I account for the fact that
each word references the series at a different position index? How can one
position index, that of the series, have multiple values, simultaneously?
Don't we need a new term - let's call it a series, or a view - which owns
its private position index, while it shares the data storage with the other
words, by referencing the string? No, we don't need to introduce a new term.
The explanation follows:
The position index is local to the series. When I reference a local word by
another, it is indefinitely extended. (Indefinitely means as long as there
exists a symbol which continues to reference the local word directly, or
indirectly, by means of a construct that in turn references the local word).
When I say
b: first a
I have extended the position index by referencing it with the word 'b. Now
'b references the series with the series' position index set to 1.
c: next first a
c references the series with the position index being the result of
applying next to the string embedded in a's block made permanent by virtue
of indefinite extent.
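Using the words 'b, 'c and 'd assigned above, and assuming index? and
same? behave as they do in a REBOL 2 console, this can be checked directly:
>> index? b
== 1
>> index? c
== 2
>> index? d
== 3
>> same? b head d
== true
Each word retains its own position, while head leads from 'd back to the
same string that 'b references.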
In conclusion, a series is like a hidden context, a function, or an object
whose local word current-position-index has been indefinitely extended by
being referenced by a word, while its data word is global like a literal
string. A series is different from a hidden context, a function, or an
object, in that the indefinitely extended local word,
current-position-index, is bound to the context of the referencing word,
'a, or 'b and so on, rather than being bound to the series' context. Were
the local word current-position-index to remain bound to the context of the
series, all words referencing the series would always evaluate
to that one value of the series' position index. That is not how REBOL
behaves. Each word may reference the series at a different position,
because the current-position-index is bound in the word's context. And,
again, in this respect, series are different from functions, objects, and
hidden contexts.
I believe I have demonstrated that the terminology I listed above, and
which is commonly used in the REBOL documentation, is sufficient to express
what a series is. I have also demonstrated that this terminology has the
advantage that a series is described in terms that reflect the similarity
between different REBOL constructs, while it allows us to specify how a
series is distinct from the other constructs.
Hope this helps,
Elan