On 3/13/2012 4:37 PM, Julian Leviston wrote:

On 14/03/2012, at 2:11 AM, David Barbour wrote:



On Tue, Mar 13, 2012 at 5:42 AM, Josh Grams <j...@qualdan.com <mailto:j...@qualdan.com>> wrote:

    On 2012-03-13 02:13PM, Julian Leviston wrote:
    >What is "text"? Do you store your "text" in ASCII, EBCDIC, SHIFT-JIS, or
    >UTF-8?  If it's UTF-8, how do you use an ASCII editor to edit the UTF-8
    >files?
    >
    >Just sayin' ;-) Hopefully you understand my point.
    >
    >You probably won't initially, so hopefully you'll meditate a bit on my
    >response without giving a knee-jerk reaction.

    OK, I've thought about it and I still don't get it.  I understand that
    there have been a number of different text encodings, but I thought that
    the whole point of Unicode was to provide a future-proof way out of that
    mess.  And I could be totally wrong, but I have the impression that it
    has pretty good penetration.  I gather that some people who use the
    Cyrillic alphabet often use some code page and China and Japan use
    SHIFT-JIS or whatever in order to have a more compact representation,
    but that even there UTF-8 tools are commonly available.

    So I would think that the sensible thing would be to use UTF-8 and
    figure that anyone (now or in the future) will have tools which support
    it, and that anyone dedicated enough to go digging into your data files
    will have no trouble at all figuring out what it is.

    If that's your point it seems like a pretty minor nitpick.  What am I
    missing?


Julian's point, AFAICT, is that "text" is just a class of storage that requires appropriate viewers and editors; it doesn't even describe a specific standard. Thus, another class that requires appropriate viewers and editors can work just as well - spreadsheets, tables, drawings.

You mention `data files`. What is a `file`? Is it not a service provided by a `file system`? Can we not just as easily hide a storage format behind a standard service more convenient for ad-hoc views and analysis (perhaps an RDBMS)? Why organize into files? Other than penetration, they don't seem especially convenient.

Penetration matters, which is one reason that text and filesystems matter.

But what else has penetrated? Browsers. Wikis. Web services. It wouldn't be difficult to support editing of tables, spreadsheets, drawings, etc. atop a web service platform. We probably have more freedom today than we've ever had for language design, if we're willing to stretch just a little bit beyond the traditional filesystem+text-editor framework.

Regards,

Dave

Exactly the point, David. A "token/character" in ASCII is one byte. In SHIFT-JIS it can be one or two, but this doesn't mean you can't express the equivalent meaning in either (i.e. by selecting the same graphemes - this is called translation) ;-)

this is partly why there are "codepoints".
one can work in terms of codepoints, rather than bytes.

a text editor may internally work in UTF-16, but saves its output in UTF-8 or similar.
ironically, this is basically what I am planning/doing at the moment.

now, if/how the user will go about typing UTF-16 codepoints, this is not yet decided.
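a minimal sketch of the codepoints-vs-bytes distinction discussed above (Python 3, where strings are codepoint sequences and encodings only matter at the I/O boundary; the sample string is arbitrary):

```python
# Python 3 strings are sequences of codepoints; byte encodings
# (UTF-8, UTF-16, ...) only come into play when reading/writing.
text = "日本語 text"                  # arbitrary mixed-script sample

codepoints = [ord(c) for c in text]   # work in codepoints, not bytes

# the same codepoints serialize differently under different encodings:
as_utf8 = text.encode("utf-8")        # the CJK chars take 3 bytes each here
as_utf16 = text.encode("utf-16-le")   # BMP codepoints take 2 bytes each

# decoding either form recovers the identical codepoint sequence:
assert as_utf8.decode("utf-8") == as_utf16.decode("utf-16-le") == text
assert len(text) == len(codepoints) == 8           # length in codepoints
assert (len(as_utf8), len(as_utf16)) == (14, 16)   # byte lengths differ
```

this is the sense in which an editor can work internally in one encoding (say UTF-16) and save in another (UTF-8) without losing anything: the codepoints are the invariant, the bytes are not.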


One of the most profound things for me has been understanding the ramifications of OMeta. It doesn't "just" parse streams of "characters" (whatever those are); in fact, it doesn't care what the individual tokens of its parsing stream are. It's concerned merely with the syntax of its elements (or tokens) - how they combine to form certain rules (here I mean "valid patterns of grammar" by rules). If one considers this well, it has amazing ramifications. OMeta invites us to see the entire computing world in terms of sets of problem-oriented languages, where "language" is a liberal word that simply means a pattern of sequences of the constituent elements of a "thing". To PEG, it basically adds proper translation and true object-orientation of individual parsing elements. This takes a while to understand, I think.

Formats here become "languages", protocols are "languages", and so are any other kind of representation system you care to name (computer programming languages, processor instruction sets, etc.).
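A rough sketch of that token-agnostic idea (hedged: this is not OMeta itself, just a couple of hypothetical PEG-style combinators showing that the parsing machinery needn't care whether its tokens are characters or other objects):

```python
def tok(pred):
    """Match one token satisfying pred; return (value, rest) or None."""
    def rule(stream):
        if stream and pred(stream[0]):
            return stream[0], stream[1:]
        return None
    return rule

def seq(*rules):
    """Match each rule in order, threading the remaining stream."""
    def rule(stream):
        out = []
        for r in rules:
            m = r(stream)
            if m is None:
                return None
            v, stream = m
            out.append(v)
        return out, stream
    return rule

# the same grammar machinery parses characters...
digit = tok(str.isdigit)
print(seq(digit, digit)("42x"))     # (['4', '2'], 'x')

# ...or a stream of non-character tokens (e.g. bytecode-like ints):
opcode = tok(lambda t: isinstance(t, int) and t < 256)
print(seq(opcode, opcode)([1, 255, "end"]))   # ([1, 255], ['end'])
```

the point being that "language" here is whatever patterns you define over whatever tokens you have - bytes, codepoints, ints, objects.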

possibly.

I was actually sort of aware of a lot of this already though, but didn't consider it particularly relevant.


I'm postulating, BGB, that you're perhaps so ingrained in the current modality and approach to thinking about computers that you maybe can't break out of it to see what else might be possible. I think it was Turing, wasn't it, who postulated that his Turing machines could work off ANY symbols... so if that's the case, and your programming language grammar has a set of symbols, why not use arbitrary (i.e. not composed of English letters) ideograms for them? (I think these days we call these things icons ;-))

You might say "but how will people name their variables?" - well, perhaps for those you could use English letters, or maybe you could enforce that no one use more than 30 variables in any one simple chunk of code, in which case you could build them in with the other ideograms.

I'm not attempting to build any kind of authoritative status here, merely provoke some different thought in you.


the issue is not that I can't imagine anything different, but rather that doing anything different would be a hassle with current keyboard technology:
pretty much anyone can type ASCII characters;
many other people have keyboards (or key-mappings) that can handle region-specific characters.

however, otherwise, typing unusual characters (those outside their current keyboard mapping) tends to be a bit more painful, and/or introduces editor dependencies, and possibly increases the learning curve (now people have to figure out how these various unorthodox characters map to the keyboard, ...).

more graphical representations, however, have a secondary drawback:
they can't be manipulated nearly as quickly or as easily as text.

one option could be "drag and drop", but the problem is that drag and drop is still a fairly slow and painful process (vs. hitting keys on the keyboard).


yes, there are scenarios where keyboards aren't ideal:
such as on an XBox360 or an Android tablet/phone/... or similar, but people probably aren't going to be using these for programming anyways, so it is likely a fairly moot point.

however, even in these cases, it is not clear there are many "clearly better" options either (on-screen keyboard, or on-screen tile selector, either way it is likely to be painful...).


simplest answer:
just assume that current text-editor technology is "basically sufficient" and call it "good enough".


I'll take Dave's point that penetration matters, and at the same time, most "new ideas" have "old idea" constituents, so you can easily find some matter for people stuck in the old methodologies and thinking to relate to when building your "new stuff" ;-)


well, it is like using alternate syntax designs (say, not a C-style "curly brace" syntax).

one can do so, but is it worth it?
in such a case, the syntax is no longer what most programmers are familiar or comfortable with, and it is more effort to convert code to/from the language, ...

so, likely, the overall cheapest option is to use a fairly generic syntax.

most everything else then mostly amounts to various forms of cost/benefit tradeoff and similar.

it isn't really about "authority" or things being "proper" or similar, but more about cost-benefit tradeoffs, and trying for the route likely to result in the most benefit, ...

and, if/when something else catches on, then a person can use that instead, but if/when this happens is more of an issue for the future to deal with.

most trends tend to be fairly unexciting and slow-moving (for example, the trends in programming language design and syntax tend to play out over a period of decades, and there seems to be little evidence that either ASCII or C-style syntax is likely to go away any time soon).

much past then? well, who knows?...


otherwise:

did mostly go and write a generic in-console text editor, and used an MS-Edit/QBasic style color scheme (white text on a blue background). may try for a generally similar aesthetic.

made the observation that "tab" is a rather annoying character to deal with (some fair amount of logic in the editor interface is spent mostly working around the behavior of the tab character...). ended up mostly representing tabs as a "real" tab character, followed by 0 or more "tab-spacer" characters (these spacer characters aren't intended to be saved to output, but are mostly for aligning stuff within the editor).
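a minimal sketch of that tab/spacer scheme (hypothetical names and a hypothetical spacer marker; the real editor's buffer representation may differ):

```python
TAB_STOP = 8
SPACER = "\x00"   # hypothetical in-buffer-only spacer cell, never saved

def expand_tabs(line):
    """build the editor's cell representation: each tab becomes a real
    '\t' cell followed by spacer cells padding to the next tab stop."""
    cells = []
    for ch in line:
        if ch == "\t":
            width = TAB_STOP - (len(cells) % TAB_STOP)
            cells.append("\t")
            cells.extend(SPACER * (width - 1))
        else:
            cells.append(ch)
    return cells

def collapse_tabs(cells):
    """strip spacer cells to recover the saveable text."""
    return "".join(c for c in cells if c != SPACER)

line = "a\tbc"
cells = expand_tabs(line)
assert collapse_tabs(cells) == line
assert len(cells) == 10   # 'a' + tab cell + 6 spacers + 'b' + 'c'
```

the upside of this representation is that cursor movement and on-screen alignment can treat every cell as one column, with the spacers filtered out again on save.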

the "eval" key already works (used F5 as the "eval" key).

next up, probably:
probably implementing things like selection and cut/copy/paste, and the ability to load/save files.

all of this is being kind of long and annoying, but I sort of expected this much (although I may have underestimated how much would go into nit-picky stuff related to dealing with user input, which is probably where the bulk of the effort has gone).


or such...

_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc
