Re: [raw-strings] Indentation problem

Brian Goetz Mon, 05 Feb 2018 07:55:45 -0800

Sorry for the delay getting back to this.

> Hello!


> Every language which implements multiline strings has problems
> with indentation.

Indeed. The fundamental problem here is that the indentation of embedded snippets is serving two masters: the nesting of the surrounding code, and the snippet itself. Sometimes the user cares about one; sometimes the other, and no language has come up with a one-size-fits-all set of rules that makes both camps happy.

Sometimes it doesn't really matter; a few extra spaces in an HTML document or SQL query is often an acceptable price to pay for clean-looking code. But sometimes it does matter. Which raises two questions:

 - What should programmers do?
 - What should the language help them do?

> E.g. consider something like this:

So, in light of the above questions, let's ask: is this the right way to generate an HTML document? It not only has "holes" to be filled in, but it has entire sections whose presence or absence depends on state. I think the mess of this example goes far deeper than indentation. (But yes, people will write code like this, with whatever tools we give them.) To the second question, what should the language do to help this code? Some would say "of course, the problem is you don't support interpolation." But as this example shows, interpolation only helps with the trivial bits; it doesn't help with the conditional inclusion, so it only gets you a small part of the way to this example. For that, you either need something with more structure, or a templating engine, or a builder, or one of the zillion other tools we've invented for this sort of thing.

So, without ignoring your fundamental question about indentation, I'll just point out that this example is about way more than indentation, and move on ...

> Now we have broken formatting in the generated HTML, which ruins the
> idea of multiline strings

I think "multiline strings" (or even "raw strings") are a bit of a misleading name. What we're going for here is the ability to embed an arbitrary snippet of a "program" (shell script, SQL query, JSON doc) in a Java program, without having to mangle the embedded snippet. This enhances readability (not mucked up with escapes and extra quotes) and reduces errors (because you can just cut and paste that snippet of script from the editor in which you've probably already written it, without risking breaking it via syntactic mangling). But, as you say, there are issues with indentation, when it matters. (Surely it matters for snippets of Python.)

Secondarily, the design center for this feature is: _short_ snippets -- those for which putting them in a separate document would be obfuscatory. To see this, we have to approach it from both sides. On the short side, imagine Java didn't have string literals at all. Having to read "yes" and "no" out of a file would be ridiculously obfuscatory; eliminating this indirection makes code easier to read and less error-prone. But on the long side, using raw strings to embed a million-line snippet in a Java program is also ridiculous; it would be far easier for maintainers of both the Java part and the embedded part to have their own uniform artifacts to maintain. So the sweet spot for this feature is somewhere in the middle -- snippets that are short enough that indirecting to a file impairs readability, but not so long that there's any question where the embedded snippet ends and the Java code resumes. (Subjectively, I'd say that this sweet spot is in the 5-10 line range.)

> (why bother to generate \n in output HTML if
> it looks like a mess anyway?) Moreover, the structure of the Java program
> now affects the output. E.g. if you add several more nested "if" or
> "switch" statements, you will need to indent <p> even more.

My answer to those people is: then don't do that ;) They're already well outside the design center (as outlined above). They should be using a templating mechanism, a builder, or something else to decouple the static content from the dynamic content. Of course, they will, but I'm not sure bending over backwards to accommodate them is the winning move.
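For illustration, a minimal sketch of the builder route, assuming nothing beyond plain Java: the static markup and the conditional section are expressed as calls, and the output's indentation comes from the builder's own nesting depth rather than from how deeply the surrounding Java code happens to be nested. `Page`, its methods, and `greeting` are hypothetical names, not an existing library; a real builder would also escape text content, which this sketch does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical minimal HTML builder; not an existing library API.
final class Page {
    private final List<String> lines = new ArrayList<>();
    private int depth = 0;

    Page open(String tag) {
        line("<" + tag + ">");
        depth++;                      // children are indented one level deeper
        return this;
    }

    Page close(String tag) {
        depth--;
        return line("</" + tag + ">");
    }

    Page text(String t) {
        return line(t);               // NOTE: no HTML escaping in this sketch
    }

    // Conditional inclusion: the section is emitted only when cond is true.
    Page when(boolean cond, Consumer<Page> section) {
        if (cond) section.accept(this);
        return this;
    }

    private Page line(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) sb.append("  ");   // two spaces per level
        lines.add(sb.append(s).toString());
        return this;
    }

    String render() {
        return String.join("\n", lines);
    }

    static String greeting(String user, boolean admin) {
        return new Page()
                .open("body")
                .open("p").text("Hello, " + user).close("p")
                .when(admin, p -> p.open("p").text("Admin tools").close("p"))
                .close("body")
                .render();
    }
}
```

Here `Page.greeting("Ann", false)` produces `"<body>\n  <p>\n    Hello, Ann\n  </p>\n</body>"`; passing `true` adds the admin paragraph at the same depth, no matter how many `if` or `switch` statements surround the call.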

> Many languages provide library methods to handle this.

Good, now we're back to indentation. All things being equal, it is better to do things in libraries than in the language; it is cheaper, more flexible, faster to market, less risky, and can support a broader range of preferences (you can have different libraries for different preferences). So I like this direction.

> E.g.
> trimIndent() could be provided to remove leading spaces of every line,
> but this would kill the HTML indentation entirely. Another possibility is to
> provide a method like trimMargin() in Kotlin [1], which trims all
> spaces before a special character (pipe by default), including the
> special character itself.

Now that we're in library world, we can have _all_ of these. We can trim indents to the first indent, or trim a specified number of spaces off, or trim to a user-selected marker. And if the users don't like the ones we include, they can write their own.
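To make the three library options concrete, here is a minimal sketch in plain Java using ordinary string literals. The class `Indent` and its method names are hypothetical, not a JDK API. `trimMargin` follows the spirit of Kotlin's `trimMargin` but skips its handling of blank first and last lines, and `trimSpaces` counts only spaces as indentation.

```java
// Hypothetical helpers sketching library-level indentation control.
// None of these names is a JDK API; they only illustrate the options.
final class Indent {
    private Indent() {}

    /** Removes up to n leading spaces from every line. */
    static String trimSpaces(String s, int n) {
        String[] lines = s.split("\n", -1);   // -1 keeps trailing empty lines
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int k = Math.min(n, leadingSpaces(line));
            out.append(line, k, line.length());
            if (i < lines.length - 1) out.append('\n');
        }
        return out.toString();
    }

    /** Removes the indentation of the first non-blank line from every line. */
    static String trimToFirstIndent(String s) {
        for (String line : s.split("\n", -1)) {
            if (!line.trim().isEmpty()) {
                return trimSpaces(s, leadingSpaces(line));
            }
        }
        return s;   // all lines blank: nothing to trim
    }

    /**
     * On each line whose first non-whitespace text is the marker, drops the
     * leading whitespace and the marker; other lines are kept unchanged.
     * (Kotlin's trimMargin also drops blank first/last lines; this does not.)
     */
    static String trimMargin(String s, String marker) {
        String[] lines = s.split("\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int ws = 0;
            while (ws < line.length() && Character.isWhitespace(line.charAt(ws))) ws++;
            if (line.startsWith(marker, ws)) {
                out.append(line, ws + marker.length(), line.length());
            } else {
                out.append(line);
            }
            if (i < lines.length - 1) out.append('\n');
        }
        return out.toString();
    }

    private static int leadingSpaces(String line) {
        int i = 0;
        while (i < line.length() && line.charAt(i) == ' ') i++;
        return i;
    }
}
```

For example, `Indent.trimMargin("    |<p>\n    |  hi\n    |</p>", "|")` yields `"<p>\n  hi\n</p>"`: the Java-level indentation is discarded while the inner HTML indentation survives. Because each option is just a method, users who prefer a different rule can write their own.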

> This is almost nice. Even without syntax highlighting you can easily
> distinguish between Java code and injected HTML code, you can indent
> Java and HTML independently and HTML code does not clash with Java
> code structure.

Pushing this to a library gives users the option, but not the obligation, to do this. That's good.

> The only problem is the necessity to call the
> trimMargin() method.

For some meaning of "only" :) Like most syntactic conventions, some users will say "this is great" and others will say "yuck". I prefer the semantic transparency of calling a method that has a clear specification -- especially when there are multiple possible options.

Remember that we're already in a corner case with respect to indentation -- in many cases, the users don't care at all about the extra spaces; they're just building up a SQL query that is going to be sent to a database, and the database doesn't care either.

> This means that the original string is preserved in the
> bytecode and at runtime, and the trimming is performed every time
> the method is called, causing a performance and memory penalty. This
> problem could be minimized by making trimMargin() a javac intrinsic.

There are multiple layers at which this can be optimized (the JIT may be able to observe that this is a pure function applied to a constant), but indeed, this is a great candidate for compile-time constant folding. (You can even see experiments related to compile-time constant folding going on in the condy-folding branch of the amber repo.) Note too that we're now in corner-case-of-corner-case territory -- those who care about the indentation and the cost of runtime string processing.
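Until javac or the JIT folds such a call, a user who does care about the cost can at least pay it once rather than on every call, by hoisting the trimmed string into a `static final` field. This is not compile-time folding, only a manual workaround that assumes the snippet is constant; `Queries`, `FIND_USER`, and the copied `trimMargin` helper are hypothetical names.

```java
// Hypothetical example: trim once at class initialization, reuse afterwards.
final class Queries {
    // Computed once, when the class is initialized; every later call
    // returns the same cached String instead of re-trimming it.
    static final String FIND_USER = trimMargin(
            "    |SELECT id, name\n" +
            "    |FROM users\n" +
            "    |WHERE id = ?", "|");

    static String findUser() {
        return FIND_USER;
    }

    // Hypothetical helper: on each line, drop leading whitespace followed by
    // the marker; lines without the marker are kept unchanged.
    static String trimMargin(String s, String marker) {
        String[] lines = s.split("\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int ws = 0;
            while (ws < line.length() && Character.isWhitespace(line.charAt(ws))) ws++;
            out.append(line.startsWith(marker, ws) ? line.substring(ws + marker.length()) : line);
            if (i < lines.length - 1) out.append('\n');
        }
        return out.toString();
    }
}
```

The concatenated literal passed to `trimMargin` is already a compile-time constant in Java; it is only the method call that is evaluated at runtime, which is exactly the part a folding compiler could eliminate.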

> However, even in this case it would be hard to enforce usage of this
> method, and I expect that tons of hard-to-read Java code will appear in
> the wild, even though I believe that Java is about readability.

Developers' ability to combine simple features to produce unreadable code far outstrips the ability of language designers to do anything about it ...

> So I propose to enforce such (or similar) format on language level
> instead of adding a library method like "trimMargin()".

I think this would be a language design mistake. This is taking one arbitrary convention and burning it into the language. That convention might be fine for some situations, but terrible for others; not only might it not be the most readable choice in all cases, but it could be an actual conflict -- what if the | character is meaningful in the embedded language, such as Markdown tables? Now we're back to escaping -- which we were trying to avoid.
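The Markdown conflict is easy to reproduce with a margin-stripping helper (a hypothetical `trimMargin` sketch, defined here so the block compiles on its own). With `|` fixed as the marker, each table row loses its leading pipe, and the only fix is to double it, which is escaping by another name. A library method can simply take a different marker; `!` below is an arbitrary choice.

```java
// Hypothetical demonstration of a fixed '|' margin clashing with Markdown tables.
final class MarginConflict {
    // On each line, drop leading whitespace followed by the marker;
    // lines without the marker are kept unchanged.
    static String trimMargin(String s, String marker) {
        String[] lines = s.split("\n", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int ws = 0;
            while (ws < line.length() && Character.isWhitespace(line.charAt(ws))) ws++;
            out.append(line.startsWith(marker, ws) ? line.substring(ws + marker.length()) : line);
            if (i < lines.length - 1) out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // With '|' as the margin, the table's own leading '|' is consumed.
        String table = "    | a | b |\n" +
                       "    |---|---|";
        System.out.println(trimMargin(table, "|"));
        // prints:
        //  a | b |
        // ---|---|

        // Workaround under a fixed '|' convention: double the pipe (escaping).
        System.out.println(trimMargin("    || a | b |", "|"));
        // prints:
        // | a | b |

        // With a library method, just pick a marker the snippet does not use.
        String tableWithBang = "    !| a | b |\n" +
                               "    !|---|---|";
        System.out.println(trimMargin(tableWithBang, "!"));
        // prints:
        // | a | b |
        // |---|---|
    }
}
```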

The language shouldn't pick favorites here; it should provide a simple, clear mechanism, which can be usefully composed with other mechanisms to get the job done. Polluting the language to avoid the method call is a bad trade.

> I see some advantages with such syntax:
> 1. You can comment (or comment out!) a part of a multiline string
> without terminating it

Rather than framing this as a property of a proposed solution, let's frame it as a question. What should be the interaction with comments in a raw string? Should you be able to embed comments? Should you be able to comment lines out? (Note that many languages support comments, so it may be possible to do this by embedding a comment, rather than using the Java-level commenting.) While I can surely see the utility of interaction with commenting, I also think that these "requirements" are only in play when the string in question is too long in the first place.

> 2. Looking at a code fragment out of context (e.g. a diff log), you
> understand that you are inside a multiline literal, e.g. when
> reviewing a diff like:
>
>              | x++;
> +            | if (x == 10) break;
>              | foo(x);
>
> Without pipes you could think that it's Java code without any further
> consideration.

This is true, but this is also true of large block comments; you can't tell whether the added line is part of a commented-out block or of executable code.

Again, with raw strings, this is more of a problem when used with too-long blocks.

So, there are two things I don't like about this proposal: it's too "opinionated", and at the same time, it loses the fundamental goal we were trying to get to -- not having to muck up an embedded block with escaping. (Sure, IDEs could (and should) help on pasting here, but that only helps writing, not reading.)

> The only disadvantage I see in forcing a pipe prefix is the inability to
> just paste a big snippet from somewhere into the middle of a Java program
> in a plain text editor.

As mentioned, we think this is most of the point, so this is a pretty big disadvantage indeed.
