On 3/13/2012 4:37 PM, Julian Leviston wrote:
On 14/03/2012, at 2:11 AM, David Barbour wrote:
On Tue, Mar 13, 2012 at 5:42 AM, Josh Grams <j...@qualdan.com> wrote:
On 2012-03-13 02:13PM, Julian Leviston wrote:
>What is "text"? Do you store your "text" in ASCII, EBCDIC, SHIFT-JIS
>or UTF-8? If it's UTF-8, how do you use an ASCII editor to edit the
>UTF-8 files?
>
>Just sayin' ;-) Hopefully you understand my point.
>
>You probably won't initially, so hopefully you'll meditate a bit on
>my response without giving a knee-jerk reaction.
OK, I've thought about it and I still don't get it. I understand that
there have been a number of different text encodings, but I thought
that the whole point of Unicode was to provide a future-proof way out
of that mess. And I could be totally wrong, but I have the impression
that it has pretty good penetration. I gather that some people who use
the Cyrillic alphabet often use some code page, and China and Japan use
SHIFT-JIS or whatever in order to have a more compact representation,
but that even there UTF-8 tools are commonly available.

So I would think that the sensible thing would be to use UTF-8 and
figure that anyone (now or in the future) will have tools which
support it, and that anyone dedicated enough to go digging into your
data files will have no trouble at all figuring out what it is.

If that's your point it seems like a pretty minor nitpick. What am I
missing?
Julian's point, AFAICT, is that text is just a class of storage that
requires appropriate viewers and editors; it doesn't even describe a
specific standard. Thus, another class that requires appropriate
viewers and editors can work just as well - spreadsheets, tables,
drawings.

You mention `data files`. What is a `file`? Is it not a service
provided by a `file system`? Can we not just as easily hide a storage
format behind a standard service more convenient for ad-hoc views and
analysis (perhaps an RDBMS)? Why organize into files? Other than
penetration, they don't seem to be especially convenient.
Penetration matters, which is one reason that text and filesystems
matter.
But what else has penetrated? Browsers. Wikis. Web services. It
wouldn't be difficult to support editing of tables, spreadsheets,
drawings, etc. atop a web service platform. We probably have more
freedom today than we've ever had for language design, if we're
willing to stretch just a little bit beyond the traditional
filesystem+text-editor framework.
Regards,
Dave
Perfectly the point, David. A "token/character" in ASCII is equivalent
to a byte. In SHIFT-JIS, it's two, but this doesn't mean you can't
express the equivalent meaning in them (i.e. by selecting the same
graphemes - this is called translation) ;-)
this is partly why there are "codepoints".
one can work in terms of codepoints, rather than bytes.
a text editor may internally work in UTF-16, but saves its output in
UTF-8 or similar.
ironically, this is basically what I am planning/doing at the moment.
now, if/how the user will go about typing UTF-16 codepoints, this is not
yet decided.
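the codepoints-versus-bytes distinction can be shown concretely. A
minimal Python sketch (the sample string is arbitrary; the point is
that the codepoint count stays fixed while the byte count depends
entirely on the encoding chosen for storage):

```python
# The same text, counted as codepoints vs. as bytes in several encodings.
s = "日本語abc"

print(len(s))                      # 6 codepoints, regardless of encoding
print(len(s.encode("utf-8")))      # 12 bytes: 3 per CJK char + 3 ASCII
print(len(s.encode("utf-16-le")))  # 12 bytes: 2 per codepoint (all in the BMP)
print(len(s.encode("shift-jis")))  # 9 bytes: 2 per CJK char + 3 ASCII
```

an editor working internally in UTF-16 while saving UTF-8, as described
above, is just converting between two of these byte-level forms of the
same codepoint sequence.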
One of the most profound things for me has been understanding the
ramifications of OMeta. It doesn't "just" parse streams of
"characters" (whatever they are); in fact it doesn't care what the
individual tokens of its parsing stream are. It's concerned merely
with the syntax of its elements (or tokens) - how they combine to form
certain rules (here I mean "valid patterns of grammar" by rules). If
one considers this well, it has amazing ramifications. OMeta invites
us to see the entire computing world in terms of sets of
problem-oriented languages, where "language" is a liberal word that
simply means a pattern of sequence of the constituent elements of a
"thing". To PEG, it basically adds proper translation and true
object-orientation of individual parsing elements. This takes a while
to understand, I think.
Formats here become "languages", protocols are "languages", and so are
any other kind of representation system you care to name (computer
programming languages, processor instruction sets, etc.).
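the token-agnostic idea can be sketched in a few lines. This is an
illustration of the principle, not OMeta's actual API: two hypothetical
combinators that match over any sequence of objects, so the same
grammar machinery works on characters, event tuples, or anything else:

```python
def tok(pred):
    """Match one token satisfying pred; return (value, rest) or None."""
    def parse(toks):
        if toks and pred(toks[0]):
            return toks[0], toks[1:]
        return None
    return parse

def seq(*parsers):
    """Match each parser in order, threading the remaining tokens."""
    def parse(toks):
        out = []
        for p in parsers:
            r = p(toks)
            if r is None:
                return None
            v, toks = r
            out.append(v)
        return out, toks
    return parse

# The same grammar works over characters...
digit = tok(str.isdigit)
print(seq(digit, digit)(list("42")))   # (['4', '2'], [])

# ...or over arbitrary non-character tokens, e.g. event tuples:
keydown = tok(lambda t: t[0] == "keydown")
print(seq(keydown, keydown)([("keydown", "a"), ("keydown", "b")]))
```

nothing in `tok` or `seq` knows what a "character" is; only the
predicates do, which is the sense in which formats, protocols, and
instruction sets all become "languages" to the same parser.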
possibly.
I was actually sort of aware of a lot of this already though, but didn't
consider it particularly relevant.
I'm postulating, BGB, that you're perhaps so ingrained in the current
modality and approach to thinking about computers that you maybe
can't break out of it to see what else might be possible. I think it
was Turing, wasn't it, who postulated that his Turing machines could
work off ANY symbols... so if that's the case, and your programming
language grammar has a set of symbols, why not use arbitrary (i.e. not
composed of English letters) ideograms for them? (I think these days
we call these things icons ;-))

You might say "but how will people name their variables" - well,
perhaps for those things you could use English letters, but maybe you
could enforce that no one use more than 30 variables in any one simple
chunk of code, in which case you could build them in with the other
ideograms.
I'm not attempting to build any kind of authoritative status here,
merely provoke some different thought in you.
the issue is not that I can't imagine anything different, but rather
that doing anything different would be a hassle with current keyboard
technology:
pretty much anyone can type ASCII characters;
many other people have keyboards (or key-mappings) that can handle
region-specific characters.
however, otherwise, typing unusual characters (those outside their
current keyboard mapping) tends to be a bit more painful, and/or
introduces editor dependencies, and possibly increases the learning
curve (now people have to figure out how these various unorthodox
characters map to the keyboard, ...).
more graphical representations, however, have a secondary drawback:
they can't be manipulated nearly as quickly or as easily as text.
one option would be "drag and drop", but the problem is that drag and
drop is still a fairly slow and painful process (vs hitting keys on
the keyboard).
yes, there are scenarios where keyboards aren't ideal:
such as on an XBox360 or an Android tablet/phone/... or similar, but
people probably aren't going to be using these for programming anyways,
so it is likely a fairly moot point.
however, even in these cases, it is not clear there are many "clearly
better" options either (on-screen keyboard, or on-screen tile selector,
either way it is likely to be painful...).
simplest answer:
just assume that current text-editor technology is "basically
sufficient" and call it "good enough".
I'll take Dave's point that penetration matters, and at the same time,
most "new ideas" have "old idea" constituents, so you can easily give
people stuck in the old methodologies and thinking something to relate
to when building your "new stuff" ;-)
well, it is like using alternate syntax designs (say, not a C-style
"curly brace" syntax).
one can do so, but is it worth it?
in such a case, the syntax is no longer what most programmers are
familiar or comfortable with, and it is more effort to convert code
to/from the language, ...
so, likely, the overall cheapest option is to use a fairly generic syntax.
most everything else then mostly amounts to various forms of
cost/benefit tradeoff and similar.
it isn't really about "authority" or things being "proper" or similar,
but more about cost-benefit tradeoffs, and trying for the route likely
to result in the most benefit, ...
and, if/when something else catches on, then a person can use that
instead, but if/when this happens is more of an issue for the future to
deal with.
most trends tend to be fairly unexciting and slow moving (for example,
the trends in programming language design and syntax tend to take place
mostly over a period of decades, and there seems to be little evidence
that either ASCII or C-style syntax are likely to go away any time soon).
much past then? well, who knows?...
otherwise:
did mostly go and write a generic in-console text editor, and used an
MS-Edit/QBasic style color scheme (white text on a blue background). may
try for a generally similar aesthetic.
made the observation that "tab" is a rather annoying character to deal
with (some fair amount of logic in the editor interface is spent mostly
working around the behavior of the tab character...). ended up mostly
representing tabs as a "real" tab character, followed by 0 or more
"tab-spacer" characters (these spacer characters aren't intended to be
saved to output, but are mostly for aligning stuff within the editor).
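a rough sketch of that tab-plus-spacer scheme (the choice of '\x01' as
the in-buffer spacer character and the tab-stop width are my own
assumptions here, not details given above):

```python
TAB_STOP = 8
SPACER = "\x01"   # hypothetical in-buffer filler; stripped before saving

def expand_tabs(line):
    """Pad each real tab with spacers so every buffer cell is one column."""
    out = []
    for ch in line:
        if ch == "\t":
            width = TAB_STOP - (len(out) % TAB_STOP)
            out.append("\t")                  # keep the "real" tab
            out.extend(SPACER * (width - 1))  # pad to the next tab stop
        else:
            out.append(ch)
    return "".join(out)

def collapse_tabs(buffer_line):
    """Drop the spacers when writing the buffer back out."""
    return buffer_line.replace(SPACER, "")

buf = expand_tabs("a\tb")
print(len(buf))              # 9: 'a', a 7-column tab cell, 'b'
print(collapse_tabs(buf))    # round-trips back to the original line
```

with this layout, cursor movement and column arithmetic can treat the
buffer as a flat grid, which is presumably the point of the workaround.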
the "eval" key already works (used F5 as the "eval" key).
next up, probably:
probably implementing things like selection and cut/copy/paste, and the
ability to load/save files.
all of this is being kind of long and annoying, but I sort of expected
this much (although I may have underestimated how much would go into
nit-picky stuff related to dealing with user input, which is probably
where the bulk of the effort has gone).
or such...
_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc