Re: [fonc] Block-Strings / Heredocs (Re: Magic Ink and Killing Math)

BGB Tue, 13 Mar 2012 00:21:42 -0700

On 3/12/2012 9:01 PM, David Barbour wrote:

On Mon, Mar 12, 2012 at 8:13 PM, Julian Leviston <jul...@leviston.net> wrote:



    On 13/03/2012, at 1:21 PM, BGB wrote:

    although theoretically possible, I wouldn't really trust not
    having the ability to use conventional text editors whenever
    need-be (or mandate use of a particular editor).

    for most things I am using text-based formats, including for
    things like world-maps and 3D models (both are based on arguably
    mutilated versions of other formats: Quake maps and AC3D models).
    the power of text is that, if by some chance someone does need to
    break out a text editor and edit something, the format won't
    hinder them from doing so.



    What is "text"? Do you store your "text" in ASCII, EBCDIC,
    SHIFT-JIS or UTF-8? If it's UTF-8, how do you use an ASCII editor
    to edit the UTF-8 files?

    Just sayin' ;-) Hopefully you understand my point.

    You probably won't initially, so hopefully you'll meditate a bit
    on my response without giving a knee-jerk reaction.

I typically work with the ASCII subset of UTF-8 (where ASCII and UTF-8 happen to be equivalent).

most of the code is written to assume UTF-8, but languages are designed to not depend on any characters outside the ASCII range (leaving them purely for comments, and for those few people who consider using them for identifiers).

EBCDIC and SHIFT-JIS are sufficiently obscure that one can generally pretend that they don't exist (FWIW, I don't generally support codepages either).

a lot of code also tends to assume Modified UTF-8 (basically, the same variant of UTF-8 used by the JVM). typically, code will ignore things like character normalization or alternative orderings. a lot of code doesn't particularly know or care what the exact character encoding is.

some amount of code internally uses UTF-16 as well, but this is less common as UTF-16 tends to eat a lot more memory (and, some code just pretends to use UTF-16, when really it is using UTF-8).
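
As a rough illustration of the encoding points above, here is a small Python sketch. It is not code from the project being discussed. Python has no built-in Modified UTF-8 codec, so `encode_modified_utf8` is a hypothetical helper written only for this example, following the JVM-style rules: U+0000 becomes the two bytes C0 80, and characters above U+FFFF are written as a UTF-16 surrogate pair with each half encoded in three bytes.

```python
def encode_modified_utf8(text: str) -> bytes:
    """Hypothetical helper: encode text using JVM-style Modified UTF-8 rules."""
    def three_bytes(unit: int) -> bytes:
        # Standard 3-byte UTF-8 layout for a 16-bit value.
        return bytes([0xE0 | (unit >> 12),
                      0x80 | ((unit >> 6) & 0x3F),
                      0x80 | (unit & 0x3F)])

    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # NUL is never written as a raw zero byte.
            out += b"\xc0\x80"
        elif cp < 0x80:
            # ASCII range: one byte, identical to plain ASCII.
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += three_bytes(cp)
        else:
            # Supplementary character: split into a UTF-16 surrogate pair,
            # then encode each surrogate as its own 3-byte sequence.
            cp -= 0x10000
            out += three_bytes(0xD800 | (cp >> 10))
            out += three_bytes(0xDC00 | (cp & 0x3FF))
    return bytes(out)


sample = "plain text"
# For pure-ASCII text, ASCII and UTF-8 produce the same bytes.
print(sample.encode("ascii") == sample.encode("utf-8"))
# UTF-16 needs two bytes per character for the same text.
print(len(sample.encode("utf-8")), len(sample.encode("utf-16-le")))
# Modified UTF-8 differs from standard UTF-8 for NUL.
print(encode_modified_utf8("a\x00b"), "a\x00b".encode("utf-8"))
```

For text that stays within the ASCII range and contains no NUL, all three of ASCII, UTF-8, and Modified UTF-8 give the same bytes, which is presumably why a lot of code can get away with not caring which one it has.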

Text is more than an arbitrary arcane linear sequence of characters. Its use suggests TRANSPARENCY - that a human could understand the grammar and content, from a relatively small sample, and effectively hand-modify the content to a particular end.
If much of our text consisted of GUIDs:
  {21EC2020-3AEA-1069-A2DD-08002B30309D}
This might as well be
  {BLAHBLAH-BLAH-BLAH-BLAH-BLAHBLAHBLAH}

The structure is clear, but its meaning is quite opaque.


yep.

this is also a goal, and many of my formats are designed to at least try to be human editable. some number of them are still often hand-edited as well (such as texture information files).

That said, structured editors are not incompatible with an underlying text format. I think that's really the best option.


yes.

for example, several editors/IDEs have expand/collapse, but still use plaintext for the source-code.

Visual Studio and Notepad++ are examples of this, and a more advanced editor could do better (such as expand/collapse on arbitrary code blocks).

there are also things like auto-completion, ... which are also nifty and work fine with text.
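
As a minimal sketch of how expand/collapse can sit on top of plain text, the following Python function computes foldable line ranges from brace nesting alone. It is only an assumption about how an editor might do this, not how Visual Studio or Notepad++ actually work. The name `fold_ranges` is made up for the example, and the code deliberately ignores braces inside strings and comments.

```python
def fold_ranges(source: str):
    """Return sorted (start_line, end_line) pairs for multi-line { ... } blocks."""
    open_lines = []   # line numbers of '{' characters not yet closed
    ranges = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for ch in line:
            if ch == "{":
                open_lines.append(lineno)
            elif ch == "}" and open_lines:
                start = open_lines.pop()
                # A block that opens and closes on one line is not worth folding.
                if lineno > start:
                    ranges.append((start, lineno))
    return sorted(ranges)


example = """function f() {
  if (x) {
    y();
  }
  { z(); }
}
"""
print(fold_ranges(example))
```

The file itself stays ordinary text. The fold information is derived on the fly, so any other editor can still open and modify the same file.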

Regarding multi-line quotes... well, if you aren't fixated on ASCII you could always use unicode to find a whole bunch more brackets:

http://www.fileformat.info/info/unicode/block/cjk_symbols_and_punctuation/images.htm
http://www.fileformat.info/info/unicode/block/miscellaneous_technical/images.htm
http://www.fileformat.info/info/unicode/block/miscellaneous_mathematical_symbols_a/images.htm
Probably more than you know what to do with.

AFAIK, the common consensus in much of programmer-land, is that using Unicode characters as part of the basic syntax of a programming language borders on evil...



I ended up using:
<[[ ... ]]>
and:
""" ... """ (basically, same syntax as Python).

these seem probably like good enough choices.

currently, the <[[ and ]]> braces are not real tokens, and so will only be parsed specially as such in the particular contexts where they are expected to appear.


so, if one types:
2<[[3, 4], [5, 6]]
the '<' will be parsed as a less-than operator.

but, if one writes instead:
var str=<[[
some text...
more text...
]]>;

it will parse as a multi-line string...

both types of string are handled specially by the parser (rather than being handled by the tokenizer, as are normal strings).
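
A rough sketch of the context-sensitive behavior described above, in Python. This is not the actual parser from the post. It is a simplified scanner that assumes one rule: `<[[` opens a block string only where an operand is expected, such as at the start of input or after an operator. The names `scan`, `WORD_RE`, and the token kinds are all hypothetical.

```python
import re

# Identifiers and integer literals; everything else is single-character punctuation.
WORD_RE = re.compile(r"[A-Za-z_]\w*|\d+")


def scan(src: str):
    """Return (kind, text) pairs, treating '<[[ ... ]]>' as a string only in operand position."""
    tokens = []
    pos = 0
    expect_operand = True  # True at the start of input and after operators
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        if expect_operand and src.startswith("<[[", pos):
            end = src.find("]]>", pos + 3)
            if end < 0:
                raise SyntaxError("unterminated block string")
            tokens.append(("blockstr", src[pos + 3:end]))
            pos = end + 3
            expect_operand = False
            continue
        match = WORD_RE.match(src, pos)
        if match:
            tokens.append(("atom", match.group()))
            pos = match.end()
            expect_operand = False
            continue
        ch = src[pos]
        tokens.append(("punct", ch))
        pos += 1
        # After a closing bracket a value has just ended; after any other
        # punctuation an operand may follow.
        expect_operand = ch not in ")]"
    return tokens


# '<' follows the operand 2, so it is an ordinary less-than operator.
print(scan("2<[[3, 4], [5, 6]]")[:3])
# '<[[' follows '=', so it opens a multi-line string.
print(scan("var str=<[[\nsome text...\nmore text...\n]]>;"))
```

The point of the design is that `<[[` never needs to be a token in its own right. The same three characters are read differently depending on what the surrounding grammar expects, which keeps existing expressions using `<` and nested `[` working unchanged.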



or such...

_______________________________________________
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc
