On 14/03/2012, at 2:11 AM, David Barbour wrote:

> 
> 
> On Tue, Mar 13, 2012 at 5:42 AM, Josh Grams <j...@qualdan.com> wrote:
> On 2012-03-13 02:13PM, Julian Leviston wrote:
> >What is "text"? Do you store your "text" in ASCII, EBCDIC, SHIFT-JIS or
> >UTF-8?  If it's UTF-8, how do you use an ASCII editor to edit the UTF-8
> >files?
> >
> >Just sayin' ;-) Hopefully you understand my point.
> >
> >You probably won't initially, so hopefully you'll meditate a bit on my
> >response without giving a knee-jerk reaction.
> 
> OK, I've thought about it and I still don't get it.  I understand that
> there have been a number of different text encodings, but I thought that
> the whole point of Unicode was to provide a future-proof way out of that
> mess.  And I could be totally wrong, but I have the impression that it
> has pretty good penetration.  I gather that some people who use the
> Cyrillic alphabet often use some code page and China and Japan use
> SHIFT-JIS or whatever in order to have a more compact representation,
> but that even there UTF-8 tools are commonly available.
> 
> So I would think that the sensible thing would be to use UTF-8 and
> figure that anyone (now or in the future) will have tools which support
> it, and that anyone dedicated enough to go digging into your data files
> will have no trouble at all figuring out what it is.
> 
> If that's your point it seems like a pretty minor nitpick.  What am I
> missing?
> 
> Julian's point, AFAICT, is that text is just a class of storage that requires
> appropriate viewers and editors, and doesn't even describe a specific standard.
> Thus, another class that requires appropriate viewers and editors can work
> just as well: spreadsheets, tables, drawings.
> 
> You mention `data files`. What is a `file`? Is it not a service provided by a
> `file system`? Can we not just as easily hide a storage format behind a
> standard service more convenient for ad-hoc views and analysis (perhaps an
> RDBMS)? Why organize into files? Other than penetration, they don't seem to
> be especially convenient.
> 
> Penetration matters, which is one reason that text and filesystems matter.  
> 
> But what else has penetrated? Browsers. Wikis. Web services. It wouldn't be 
> difficult to support editing of tables, spreadsheets, drawings, etc. atop a 
> web service platform. We probably have more freedom today than we've ever had 
> for language design, if we're willing to stretch just a little bit beyond the 
> traditional filesystem+text-editor framework. 
> 
> Regards,
> 
> Dave

Precisely the point, David. A "token/character" in ASCII is equivalent to a
byte. In SHIFT-JIS, it can take two, but this doesn't mean you can't express
the equivalent meaning in both (i.e. by selecting the same graphemes). This is
called translation ;-)
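
To make the encoding point concrete, here is a minimal Python sketch. It
assumes only the codecs that ship with Python's standard library; the helper
names (byte_lengths, transcode) and the sample characters are mine, purely for
illustration. It shows the same grapheme taking a different number of bytes
per encoding, and that "translation" is just decode-then-encode.

```python
# Illustrative sketch; helper names and sample characters are assumptions.

def byte_lengths(ch):
    """Return how many bytes one character takes in each encoding.

    None means the encoding cannot represent the character at all.
    """
    out = {}
    for codec in ("ascii", "shift_jis", "utf-8"):
        try:
            out[codec] = len(ch.encode(codec))
        except UnicodeEncodeError:
            out[codec] = None
    return out


def transcode(data, src, dst):
    """Translate bytes from one encoding to another via the shared graphemes."""
    return data.decode(src).encode(dst)


LATIN_A = "A"
HIRAGANA_A = "\u3042"  # HIRAGANA LETTER A

print(byte_lengths(LATIN_A))
print(byte_lengths(HIRAGANA_A))
print(transcode(HIRAGANA_A.encode("shift_jis"), "shift_jis", "utf-8"))
```

The bytes differ, the grapheme selected does not, which is all "text" really
promises.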

One of the most profound things for me has been understanding the ramifications
of OMeta. It doesn't "just" parse streams of "characters" (whatever they are);
in fact, it doesn't care what the individual tokens of its parsing stream are.
It's concerned merely with the syntax of its elements (or tokens): how they
combine to form certain rules (by "rules" I mean "valid patterns of grammar").
If one considers this well, it has amazing ramifications. OMeta invites
us to see the entire computing world in terms of sets of
problem-oriented languages, where "language" is a liberal word that simply means
a pattern of sequence of the constituent elements of a "thing". To PEG, it
basically adds proper translation and true object orientation of individual
parsing elements. This takes a while to understand, I think.
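
A rough sketch of what "doesn't care what the tokens are" can look like. This
is NOT OMeta and does not use its syntax or API; it is a handful of PEG-style
combinators in Python, with names (tok, seq, choice, many, matches) that I
made up for illustration. The same rules run over a list of arbitrary objects
or over a string of characters.

```python
# Hypothetical PEG-style combinators; not OMeta's actual API.

def tok(pred):
    """Match a single token satisfying pred. Tokens may be any object."""
    def parse(tokens, i):
        if i < len(tokens) and pred(tokens[i]):
            return [tokens[i]], i + 1
        return None
    return parse


def seq(*parsers):
    """Match each parser in order; fail if any one fails."""
    def parse(tokens, i):
        out = []
        for p in parsers:
            r = p(tokens, i)
            if r is None:
                return None
            vals, i = r
            out.extend(vals)
        return out, i
    return parse


def choice(*parsers):
    """PEG ordered choice: the first alternative that matches wins."""
    def parse(tokens, i):
        for p in parsers:
            r = p(tokens, i)
            if r is not None:
                return r
        return None
    return parse


def many(p):
    """Match p zero or more times (greedy, never fails)."""
    def parse(tokens, i):
        out = []
        while True:
            r = p(tokens, i)
            if r is None or r[1] == i:  # stop on failure or no progress
                return out, i
            out.extend(r[0])
            i = r[1]
    return parse


def matches(rule, tokens):
    """True if rule consumes the whole token sequence."""
    r = rule(list(tokens), 0)
    return r is not None and r[1] == len(tokens)


# Rule over arbitrary objects: a string label followed by one or more ints.
is_int = tok(lambda t: isinstance(t, int))
is_str = tok(lambda t: isinstance(t, str))
record = seq(is_str, is_int, many(is_int))

# The same machinery over a "character" stream: one or more digits.
digit = tok(lambda c: isinstance(c, str) and c.isdigit())
digits = seq(digit, many(digit))

print(matches(record, ["x", 1, 2, 3]))
print(matches(digits, "2012"))
```

Nothing in the combinators mentions characters; only the predicates handed to
tok know what a token is.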

Formats here become "languages", protocols are "languages", and so are any 
other kind of representation system you care to name (computer programming 
languages, processor instruction sets, etc.).

I'm postulating, BGB, that you're perhaps so ingrained in the current modality
and approach to thinking about computers that you maybe can't break out of it
to see what else might be possible. I think it was Turing, wasn't it, who
postulated that his Turing machines could work off ANY symbols... so if that's
the case, and your programming language grammar has a set of symbols, why not
use arbitrary (i.e. not composed of English letters) ideograms for them? (I think
these days we call these things icons ;-))

You might say "but how will people name their variables?" Well, perhaps for
those things you could use English letters, but maybe you could enforce that
no one use more than 30 variables in their code in any one simple chunk, in
which case you could build them in with the other ideograms.

I'm not attempting to build any kind of authoritative status here, merely 
provoke some different thought in you.

I'll take Dave's point that penetration matters, and at the same time, most 
"new ideas" have "old idea" constituents, so you can easily find some matter 
for people stuck in the old methodologies and thinking to relate to when 
building your "new stuff" ;-)

Regards,
Julian
_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc
