On 3/12/2012 9:01 PM, David Barbour wrote:

On Mon, Mar 12, 2012 at 8:13 PM, Julian Leviston <jul...@leviston.net> wrote:


    On 13/03/2012, at 1:21 PM, BGB wrote:

    although theoretically possible, I wouldn't really trust not
    having the ability to use conventional text editors whenever
    need-be (or mandate use of a particular editor).

    for most things I am using text-based formats, including for
    things like world-maps and 3D models (both are based on arguably
    mutilated versions of other formats: Quake maps and AC3D models).
    the power of text is that, if by some chance someone does need to
    break out a text editor and edit something, the format won't
    hinder them from doing so.


    What is "text"? Do you store your "text" in ASCII, EBCDIC,
    SHIFT-JIS or UTF-8? If it's UTF-8, how do you use an ASCII editor
    to edit the UTF-8 files?

    Just sayin' ;-) Hopefully you understand my point.

    You probably won't initially, so hopefully you'll meditate a bit
    on my response without giving a knee-jerk reaction.



I typically work with the ASCII subset of UTF-8 (where ASCII and UTF-8 happen to be equivalent).
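
A minimal sketch of that equivalence, in Python: text restricted to ASCII produces the same bytes under either encoding, so a check for "no byte above 0x7F" is enough to know a file is safe for an ASCII-only editor. The helper name `is_ascii_safe` is made up for illustration.

```python
ascii_text = "var x = 1;"

# Every ASCII character encodes to the same single byte in ASCII and in UTF-8.
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8")


def is_ascii_safe(data: bytes) -> bool:
    """Return True if every byte is below 0x80, i.e. the data is plain ASCII."""
    return all(b < 0x80 for b in data)
```

UTF-8 encodes anything outside the ASCII range using only bytes of 0x80 and above, so a non-ASCII character in a comment or identifier can never be mistaken for ASCII syntax.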

most of the code is written to assume UTF-8, but languages are designed to not depend on any characters outside the ASCII range (leaving them purely for comments, and for those few people who consider using them for identifiers).

EBCDIC and SHIFT-JIS are sufficiently obscure that one can generally pretend that they don't exist (FWIW, I don't generally support codepages either).

a lot of code also tends to assume Modified UTF-8 (basically, the same variant of UTF-8 used by the JVM). typically, code will ignore things like character normalization or alternative orderings. a lot of code doesn't particularly know or care what the exact character encoding is.
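
For reference, Modified UTF-8 as used by the JVM differs from standard UTF-8 in two ways: U+0000 is written as the two bytes C0 80 (so encoded strings never contain a zero byte), and characters above U+FFFF are written as a UTF-16 surrogate pair with each surrogate encoded as three bytes. A rough Python sketch of an encoder follows; the function name is hypothetical and this is not the code from the project discussed here.

```python
def encode_modified_utf8(text: str) -> bytes:
    """Encode text the way the JVM's Modified UTF-8 does (illustrative sketch)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # NUL becomes an overlong two-byte form instead of a zero byte.
            out += b"\xc0\x80"
        elif cp < 0x10000:
            # Same as standard UTF-8; surrogatepass lets lone surrogates through.
            out += ch.encode("utf-8", "surrogatepass")
        else:
            # Split into a UTF-16 surrogate pair, then encode each half as 3 bytes.
            cp -= 0x10000
            high = 0xD800 | (cp >> 10)
            low = 0xDC00 | (cp & 0x3FF)
            for unit in (high, low):
                out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)
```

For pure ASCII input without NUL characters, the result is byte-for-byte identical to standard UTF-8, which is why code that sticks to the ASCII subset rarely needs to care which variant it is reading.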

some amount of code internally uses UTF-16 as well, but this is less common as UTF-16 tends to eat a lot more memory (and, some code just pretends to use UTF-16, when really it is using UTF-8).
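
The memory difference is easy to check. The sketch below (helper name made up for illustration) compares encoded sizes in Python; UTF-16 is measured without a byte-order mark.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the number of bytes text occupies in UTF-8 and in UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # -le variant: no BOM added
    }
```

For ASCII-heavy source code UTF-16 costs twice as much as UTF-8. The balance shifts for text dominated by CJK characters, which take three bytes each in UTF-8 but two in UTF-16.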



Text is more than an arbitrary arcane linear sequence of characters. Its use suggests TRANSPARENCY - that a human could understand the grammar and content, from a relatively small sample, and effectively hand-modify the content to a particular end.

If much of our text consisted of GUIDs:
  {21EC2020-3AEA-1069-A2DD-08002B30309D}
This might as well be
  {BLAHBLAH-BLAH-BLAH-BLAH-BLAHBLAHBLAH}

The structure is clear, but its meaning is quite opaque.


yep.

this is also a goal, and many of my formats are designed to at least try to be human editable. some number of them are still often hand-edited as well (such as texture information files).


That said, structured editors are not incompatible with an underlying text format. I think that's really the best option.

yes.

for example, several editors/IDEs have expand/collapse, but still use plaintext for the source-code.

Visual Studio and Notepad++ are examples of this, and a more advanced editor could do better (such as expand/collapse on arbitrary code blocks).

there are also things like auto-completion, ... which are also nifty and work fine with text.


Regarding multi-line quotes... well, if you aren't fixated on ASCII you could always use unicode to find a whole bunch more brackets:
http://www.fileformat.info/info/unicode/block/cjk_symbols_and_punctuation/images.htm
http://www.fileformat.info/info/unicode/block/miscellaneous_technical/images.htm
http://www.fileformat.info/info/unicode/block/miscellaneous_mathematical_symbols_a/images.htm
Probably more than you know what to do with.
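
As a rough way to enumerate such brackets without browsing the charts, the Python sketch below filters a code point range by the Unicode general categories Ps (open punctuation) and Pe (close punctuation). The function name is made up for illustration, and the range shown is the CJK Symbols and Punctuation block.

```python
import unicodedata


def brackets_in_block(first: int, last: int) -> tuple[list[str], list[str]]:
    """Collect opening (Ps) and closing (Pe) punctuation in a code point range."""
    openers, closers = [], []
    for cp in range(first, last + 1):
        category = unicodedata.category(chr(cp))
        if category == "Ps":
            openers.append(chr(cp))
        elif category == "Pe":
            closers.append(chr(cp))
    return openers, closers


# CJK Symbols and Punctuation: U+3000..U+303F
openers, closers = brackets_in_block(0x3000, 0x303F)
```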


AFAIK, the common consensus in much of programmer-land is that using Unicode characters as part of the basic syntax of a programming language borders on evil...


I ended up using:
<[[ ... ]]>
and:
""" ... """ (basically, same syntax as Python).

these seem like probably good enough choices.

currently, the <[[ and ]]> braces are not real tokens, and so will only be parsed specially as such in the particular contexts where they are expected to appear.

so, if one types:
2<[[3, 4], [5, 6]]
the '<' will be parsed as a less-than operator.

but, if one writes instead:
var str=<[[
some text...
more text...
]]>;

it will parse as a multi-line string...

both types of string are handled specially by the parser (rather than being handled by the tokenizer, as are normal strings).
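
A minimal sketch of this kind of context-dependent handling, in Python. It is not the actual parser described above: it folds the parser's context into a single hypothetical flag, `expect_operand`, which is true wherever a value may begin. Only in that position is `<[[` treated as the start of a multi-line string; elsewhere `<` is ordinary punctuation.

```python
def scan(src: str) -> list[tuple[str, str]]:
    """Split src into (kind, text) pairs, deciding what '<[[' means by context."""
    out = []
    pos = 0
    expect_operand = True  # a value may start at the beginning of the input
    while pos < len(src):
        ch = src[pos]
        if ch.isspace():
            pos += 1
        elif expect_operand and src.startswith("<[[", pos):
            # Value position: take everything up to the closing ]]> verbatim.
            end = src.find("]]>", pos + 3)
            if end < 0:
                raise SyntaxError("unterminated <[[ ... ]]> string")
            out.append(("string", src[pos + 3:end]))
            pos = end + 3
            expect_operand = False
        elif ch.isalnum() or ch == "_":
            start = pos
            while pos < len(src) and (src[pos].isalnum() or src[pos] == "_"):
                pos += 1
            out.append(("atom", src[start:pos]))
            expect_operand = False
        else:
            out.append(("punct", ch))
            pos += 1
            # A closing bracket ends a value; any other punctuation may precede one.
            expect_operand = ch not in ")]"
    return out
```

With this sketch, `scan("2<[[3, 4], [5, 6]]")` yields `<` as plain punctuation after the atom `2`, while `scan("var str=<[[\nsome text...\n]]>;")` yields a single string entry after the `=`.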


or such...

_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc
