Sorry for the delay getting back to this.

Hello!

Every language which implements the multiline strings has problems
with indentation.

Indeed.  The fundamental problem here is that the indentation of embedded snippets is serving two masters: the nesting of the surrounding code, and the snippet itself.  Sometimes the user cares about one, sometimes the other, and no language has come up with a one-size-fits-all set of rules that makes both camps happy.

Sometimes it doesn't really matter; a few extra spaces in an HTML document or SQL query are often an acceptable price to pay for clean-looking code.  But sometimes it does matter.  Which raises two questions:
 - What should programmers do?
 - What should the language help them do?

E.g. consider something like this:

So, in light of the above questions, let's ask: is this the right way to generate an HTML document?  It not only has "holes" to be filled in, but it has entire sections whose presence or absence depends on state.  I think the mess of this example goes far deeper than indentation.  (But yes, people will write code like this, with whatever tools we give them.)  To the second question, what should the language do to help this code?  Some would say "of course, the problem is you don't support interpolation."  But as this example shows, interpolation only helps with the trivial bits; it doesn't help with the conditional inclusion, so it only gets you a small part of the way there.  For the rest, you need something with more structure: a templating engine, a builder, or one of the zillion other tools we've invented for this sort of thing.

So, without ignoring your fundamental question about indentation, I'll just point out that this example is about way more than indentation, and move on ...

Now we have broken formatting in the generated HTML, which ruins the
idea of multiline strings

I think "multiline strings" (or even "raw strings") is a bit of a misleading name.  What we're going for here is the ability to embed an arbitrary snippet of a "program" (shell script, SQL query, JSON doc) in a Java program, without having to mangle the embedded snippet.  This enhances readability (not mucked up with escapes and extra quotes) and reduces errors (because you can just cut and paste that snippet of script from the editor in which you've probably already written it, without risking breaking it via syntactic mangling).  But, as you say, there are issues with indentation, when it matters.  (Surely it matters for snippets of Python.)

Secondarily, the design center for this feature is: _short_ snippets -- those for which putting them in a separate document would be obfuscatory.  To see this, we have to approach it from both sides. On the short side, imagine Java didn't have string literals at all. Having to read "yes" and "no" out of a file would be ridiculously obfuscatory; eliminating this indirection makes code easier to read and less error-prone.  But on the long side, using raw strings to embed a million-line snippet in a Java program is also ridiculous; it would be far easier for maintainers of both the Java part and the embedded part to have their own uniform artifacts to maintain.  So the sweet spot for this feature is somewhere in the middle -- snippets that are short enough that indirecting to a file impairs readability, but not so long that there's any question where the embedded snippet ends and the Java code resumes.  (Subjectively, I'd say that this sweet spot is in the 5-10 line range.)

(why bother to generate \n in output HTML if
it looks like a mess anyways?) Moreover, the structure of Java program
now affects the output. E.g. if you add several more nested "if" or
"switch" statement, you will need to indent <p> even more.

My answer to those people is: then don't do that ;)  They're already well outside the design center (as outlined above).  They should be using a templating mechanism, a builder, or something else to decouple the static content from the dynamic content.  Of course, they will do it anyway, but I'm not sure bending over backwards to accommodate them is the winning move.

Many languages provide library methods to handle this.

Good, now we're back to indentation.  All things being equal, it is better to do things in libraries than in the language; it is cheaper, more flexible, faster to market, less risky, and can support a broader range of preferences (you can have different libraries for different preferences.)  So I like this direction.

E.g.
trimIndent() could be provided to remove leading spaces of every line,
but this would kill the HTML indents at all. Another possibility is to
provide a method like trimMargin() on Kotlin [1] which trims all
spaces before a special character (pipe by default) including a
special character itself.

Now that we're in library world, we can have _all_ of these.  We can trim indents to the first indent, or trim a specified number of spaces off, or trim to a user-selected marker.  And if the users don't like the ones we include, they can write their own.
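To make that concrete, here is a rough sketch of what three such helpers might look like as ordinary static methods.  The class name IndentTools and the method names are made up for illustration; this is not a proposed API, just a demonstration that each policy is a few lines of plain library code that a user could write or replace.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper class; names are illustrative only, not a proposed API.
final class IndentTools {
    private IndentTools() {}

    // Count leading space/tab characters on a line.
    private static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return i;
    }

    // Policy 1: remove the smallest indent shared by all non-blank lines.
    static String trimCommonIndent(String text) {
        String[] lines = text.split("\n", -1);
        int common = Arrays.stream(lines)
                .filter(l -> !l.trim().isEmpty())
                .mapToInt(IndentTools::leadingWhitespace)
                .min()
                .orElse(0);
        return Arrays.stream(lines)
                // Blank lines become empty; others lose exactly 'common' characters.
                .map(l -> l.trim().isEmpty() ? "" : l.substring(common))
                .collect(Collectors.joining("\n"));
    }

    // Policy 2: remove at most n leading whitespace characters from every line.
    static String trimFixed(String text, int n) {
        return Arrays.stream(text.split("\n", -1))
                .map(l -> l.substring(Math.min(n, leadingWhitespace(l))))
                .collect(Collectors.joining("\n"));
    }

    // Policy 3: on lines whose first non-whitespace text is the marker,
    // remove the leading whitespace and the marker; leave other lines alone.
    static String trimToMarker(String text, String marker) {
        return Arrays.stream(text.split("\n", -1))
                .map(l -> {
                    int ws = leadingWhitespace(l);
                    return l.startsWith(marker, ws) ? l.substring(ws + marker.length()) : l;
                })
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        String html = "        <p>\n" + "          Hello\n" + "        </p>";
        System.out.println(trimCommonIndent(html));
    }
}
```

Note that the marker is a parameter, so a user whose embedded language already uses | can pick something else.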

This is almost nice. Even without syntax highlighting you can easily
distinguish between Java code and injected HTML code, you can indent
Java and HTML independently and HTML code does not clash with Java
code structure.

Pushing this to a library gives users the option, but not the obligation, to do this.  That's good.

The only problem is the necessity to call the
trimMargin() method.

For some meaning of "only" :)   Like most syntactic conventions, some users will say "this is great" and others will say "yuck".  I prefer the semantic transparency of calling a method that has a clear specification -- especially when there are multiple possible options.

Remember that we're already in a corner case with respect to indentation -- in many cases, the users don't care at all about the extra spaces, they're just building up a SQL query that is going to be sent to a database, and the database doesn't care either.

This means that original line is preserved in the
bytecode and during runtime and the trimming is processed every time
the method is called causing performance and memory handicap. This
problem could be minimized making trimMargin() a javac intrinsic.

There are multiple layers at which this can be optimized (the JIT may be able to observe that this is a pure function applied to a constant), but indeed, this is a great candidate for compile-time constant folding.  (You can even see experiments related to compile-time constant folding going on in the condy-folding branch of the amber repo.)  Note too that we're now in corner-case-of-corner-case territory -- those who care about both the indentation and the cost of runtime string processing.
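Even with no help from the compiler or the JIT, a user who cares can already avoid the repeated work by hand.  The sketch below is only an illustration of that manual workaround, not of the constant-folding experiments mentioned above; the class Queries, the method stripMargin, and the trimCalls counter are all hypothetical names invented for the example.

```java
// Hypothetical example: hoist the trimmed snippet into a static final field
// so the margin-stripping runs once, at class initialization.
final class Queries {
    private Queries() {}

    // Demonstration only: counts how many times stripMargin actually runs.
    static int trimCalls = 0;

    // Strip leading whitespace and a '|' marker from the start of every line.
    static String stripMargin(String text) {
        trimCalls++;
        String[] lines = text.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            lines[i] = lines[i].replaceFirst("^[ \\t]*\\|", "");
        }
        return String.join("\n", lines);
    }

    // Computed once when the class is initialized, not on every call.
    static final String FIND_USER = stripMargin(
            "    |SELECT id, name\n" +
            "    |  FROM users\n" +
            "    | WHERE id = ?");

    // Callers pay only for a field read.
    static String findUserQuery() {
        return FIND_USER;
    }
}
```

The untrimmed literal still lives in the class file's constant pool, so this addresses the per-call cost but not the footprint; that part is what compile-time folding would buy.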

However even in this case it would be hard to enforce usage of this
method and I expect that tons of hard-to-read Java code will appear in
the wild, despite I believe that Java is about readability.

Developers' ability to combine simple features to produce unreadable code far outstrips the ability of language designers to do anything about it ...

So I propose to enforce such (or similar) format on language level
instead of adding a library method like "trimMargin()".

I think this would be a language design mistake.  This is taking one arbitrary convention and burning it into the language.  That convention might be fine for some situations, but terrible for others; not only might it not be the most readable choice in all cases, but it could be an actual conflict -- what if the | character is meaningful in the embedded language, such as Markdown tables? Now we're back to escaping -- which we were trying to avoid.

The language shouldn't pick favorites here; it should provide a simple, clear mechanism, which can be usefully composed with other mechanisms to get the job done.  Polluting the language to avoid the method call is a bad trade.

I see some advantages with such syntax:
1. You can comment (or comment out!) a part of multiline string
without terminating it

Rather than framing this as a property of a proposed solution, let's frame it as a question.  What should be the interaction with comments in a raw string?  Should you be able to embed comments? Should you be able to comment lines out?  (Note that many embedded languages support comments of their own, so it may be possible to do this by putting a comment in the snippet, rather than using Java-level commenting.)  While I can surely see the utility of interaction with commenting, I also think that these "requirements" are only in play when the string in question is too long in the first place.

2. Looking into code fragment out of context (e.g. diff log) you
understand that you are inside a multiline literal.
reviewing a diff like

             | x++;
+           | if (x == 10) break;
             | foo(x);

Without pipes you could think that it's Java code without any further
consideration.

This is true, but it is also true of large block comments; you can't tell whether the added line is part of a commented-out block or of executable code.

Again, with raw strings, this is more of a problem when used with too-long blocks.

So, there are two things I don't like about this proposal: it's too "opinionated", and at the same time, it loses the fundamental goal we were trying to get to -- not having to muck up an embedded block with escaping.  (Sure, IDEs could (and should) help on pasting here, but that only helps writing, not reading.)

The only disadvantage I see in forcing a pipe prefix is inability to
just paste a big snippet from somewhere to the middle of Java program
in a plain text editor.

As mentioned, we think this is most of the point, so this is a pretty big disadvantage indeed.
