Re: stripIndent() behavior

Éamonn McManus Wed, 11 Apr 2018 12:01:47 -0700

OK, well I think we do agree that at least stripIndent should be changed to
remove the longest exact whitespace prefix common to all lines? So if you
give it one line starting with 8 spaces and another starting with a tab it
will return the string unchanged. The current behaviour just counts
whitespace characters, so in that example it would strip the tab from the
line that has it, and the first of the 8 spaces from the line that has them.
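To make the difference concrete, here is a minimal Java sketch of the two strategies. The class and method names are hypothetical, not a proposed API. It treats only spaces and tabs as indentation and, following point 1 in the quoted thread below, ignores blank lines when choosing the prefix.

```java
import java.util.ArrayList;
import java.util.List;

public class StripIndentSketch {

    // Leading run of spaces and tabs on a line.
    static String leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        return line.substring(0, i);
    }

    // A line is blank if it consists only of spaces and tabs (or is empty).
    static boolean isBlank(String line) {
        return leadingWhitespace(line).length() == line.length();
    }

    // Proposed behaviour: remove the longest whitespace prefix that every
    // non-blank line shares character for character.
    static String stripCommonPrefix(String text) {
        String[] lines = text.split("\n", -1);
        String prefix = null;
        for (String line : lines) {
            if (isBlank(line)) continue;  // blank lines don't constrain the prefix
            String ws = leadingWhitespace(line);
            if (prefix == null) {
                prefix = ws;
            } else {
                int n = 0;
                while (n < prefix.length() && n < ws.length()
                        && prefix.charAt(n) == ws.charAt(n)) {
                    n++;
                }
                prefix = prefix.substring(0, n);
            }
        }
        if (prefix == null || prefix.isEmpty()) return text;
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(line.startsWith(prefix) ? line.substring(prefix.length()) : line);
        }
        return String.join("\n", out);
    }

    // Criticized behaviour: remove the same NUMBER of leading whitespace
    // characters from each line, regardless of which characters they are.
    static String stripByCount(String text) {
        String[] lines = text.split("\n", -1);
        int min = Integer.MAX_VALUE;
        for (String line : lines) {
            if (!isBlank(line)) min = Math.min(min, leadingWhitespace(line).length());
        }
        if (min == Integer.MAX_VALUE) return text;
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            out.add(line.substring(Math.min(min, leadingWhitespace(line).length())));
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        String mixed = "\tfoo\n        bar";  // one tab vs. eight spaces
        System.out.println(stripCommonPrefix(mixed).equals(mixed));         // true
        System.out.println(stripByCount(mixed).equals("foo\n       bar"));  // true
    }
}
```

With one tab-indented line and one line indented by eight spaces, the exact-prefix version returns the string unchanged, while the counting version removes the tab from one line and a single space from the other.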


My point about novice users is that not all users are inconvenienced alike.
Users who have consciously configured their editing environment to use tabs
presumably know what tabs are, and should quickly be able to figure out how
to avoid them in RSLs or what method to call to get rid of them. But users
who are new to the language, and perhaps to programming, may very well not
know that there is such a thing as a tab character. Up until now they
didn't have to care because they could unknowingly mix spaces and tabs
without it mattering. But now I feel we are setting a trap for them where
they will stare in frustration at two obviously identical lines that are
somehow behaving differently from each other.

As an experiment I put myself in the position of a novice user, by trying
out a number of popular editors on the Mac in their default configurations.
I found that NetBeans and IntelliJ did the right thing: by default they
never write a tab. But with Eclipse, Emacs, Vim, and TextMate I found it
quite easy to create mixed indentation. I also scanned the source of about
1,000 public open-source projects that are used within Google, and I found
that 38% of them contained at least one file with inconsistent indentation
(spaces before tabs, or some lines indented with spaces and others with
tabs). And those would be projects presumably from relatively experienced
developers.
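The tooling used for that scan isn't described here. As a rough illustration only, the two conditions above could be checked per file with something like the following hypothetical Java helper, which looks only at lines that have both indentation and content.

```java
import java.util.List;

public class MixedIndentCheck {

    // Hypothetical check for the two kinds of inconsistency described above:
    // (a) a space followed later by a tab within one line's indentation, or
    // (b) some lines indented starting with spaces and others starting with tabs.
    // Unindented and whitespace-only lines are ignored.
    static boolean hasInconsistentIndentation(List<String> lines) {
        boolean anySpaceIndented = false;
        boolean anyTabIndented = false;
        for (String line : lines) {
            int end = 0;
            while (end < line.length() && (line.charAt(end) == ' ' || line.charAt(end) == '\t')) {
                end++;
            }
            if (end == 0 || end == line.length()) continue;
            String indent = line.substring(0, end);
            // Any space before a tab implies an adjacent " \t" pair somewhere.
            if (indent.contains(" \t")) return true;
            if (indent.charAt(0) == ' ') anySpaceIndented = true; else anyTabIndented = true;
        }
        return anySpaceIndented && anyTabIndented;
    }

    public static void main(String[] args) {
        System.out.println(hasInconsistentIndentation(List.of("\tfoo();", "\tbar();")));   // false
        System.out.println(hasInconsistentIndentation(List.of("\tfoo();", "    bar();"))); // true
        System.out.println(hasInconsistentIndentation(List.of("  \tfoo();")));            // true
    }
}
```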

On Tue, 10 Apr 2018 at 15:27, Brian Goetz <[email protected]> wrote:

> I think this is "throwing" the baby out with the bathwater.  It is
> punishing those who can use tabs responsibly for the sins of those who
> cannot.

> You have to commit three sins before you have a problem:
>    - using tabs at all;
>    - using tabs inconsistently across the lines of a single expression;
>    - using tabs after you've already used spaces on a line.

> While I am sure that there are people who do so, it just seems
> unreasonable to me to throw in the presence of tabs because someone,
> somewhere, might commit these three sins together and *be confused by
> the result*.  (So, no, it's not the only option.)

> Note that IDEs can also highlight code that would be inappropriately
> mangled as a result, so people learn not to commit all three of the sins
> listed.

> On 4/10/2018 5:39 PM, Éamonn McManus wrote:
> >>> 3. If the input contains *any* tab characters at all (except any that
> >>> are part of the trailing whitespace), then this method cannot know that
> >>> it isn't jumbling the end result, and maybe it should just throw.
> >> I think there's a middle ground, where it strips any common whitespace
> >> prefix.  So if every line starts with tab-tab-space-space, it can safely
> >> strip this.
> > I'm afraid that's not true. In practice, if you are using tabs at all, it
> > is very easy in many editors to end up with a mix of spaces and tabs. So
> > you could easily have (with 8-space tabs) one line that has 3 tabs at the
> > start, and another that has 2 tabs and 8 spaces. For example, with Emacs
> > you can get this just by hitting delete after a tab and then hitting
> > space. You would nevertheless want stripIndent to remove the indentation
> > from both lines, since they look identical.
> >
> > The situation is made worse by the fact that there are two common
> > conventions for tab width, 4 or 8.
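A small illustration of both points, assuming tab stops every N columns with the first column numbered 0 (`indentColumn` is a hypothetical helper). The two lines from the Emacs example reach the same visual column with 8-column tabs, yet their longest exact common whitespace prefix is only two tabs; with 4-column tabs they no longer even line up.

```java
public class VisualIndent {

    // Visual column reached after a line's leading whitespace, with tab stops
    // at multiples of tabWidth and the first column numbered 0.
    static int indentColumn(String line, int tabWidth) {
        int col = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == ' ') {
                col++;
            } else if (c == '\t') {
                col = (col / tabWidth + 1) * tabWidth;  // advance to the next tab stop
            } else {
                break;
            }
        }
        return col;
    }

    public static void main(String[] args) {
        String threeTabs = "\t\t\tx";
        String twoTabsEightSpaces = "\t\t        x";
        // With 8-column tabs both lines start their text at column 24.
        System.out.println(indentColumn(threeTabs, 8) + " " + indentColumn(twoTabsEightSpaces, 8)); // 24 24
        // With 4-column tabs they don't line up.
        System.out.println(indentColumn(threeTabs, 4) + " " + indentColumn(twoTabsEightSpaces, 4)); // 12 16
    }
}
```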
> >
> > I think the only way to avoid these problems is for stripIndent to throw
> > if its argument has any tabs, or at least any tabs in leading whitespace,
> > and for there to be a separate method `detab` whose argument says what
> > the width of a tab stop is. (Just to be sure: this method should arrange
> > for tab stops to be at positions 4N or 8N, where the first column is
> > column 0. So a tab can expand into 1 to 4 spaces, or 1 to 8 spaces.)
> >
> > Then users operating in tab-free codebases can just write .stripIndent(),
> > and users in tab-infected codebases can write .detab(4).stripIndent().
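A minimal sketch of the proposed `detab`, with tab stops at multiples of the given width. `detab` and the tab-free `stripIndent` below are hypothetical static helpers standing in for the instance methods under discussion. Once the tabs are gone, counting leading spaces and matching an exact prefix give the same answer.

```java
public class DetabSketch {

    // Hypothetical detab: replace each tab with 1..tabWidth spaces so the next
    // character lands on a tab stop (a multiple of tabWidth, columns numbered
    // from 0). The column counter restarts after each '\n'.
    static String detab(String text, int tabWidth) {
        if (tabWidth <= 0) throw new IllegalArgumentException("tabWidth must be positive: " + tabWidth);
        StringBuilder sb = new StringBuilder(text.length());
        int col = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\t') {
                int spaces = tabWidth - (col % tabWidth);  // always 1..tabWidth
                sb.append(" ".repeat(spaces));
                col += spaces;
            } else if (c == '\n') {
                sb.append(c);
                col = 0;
            } else {
                sb.append(c);
                col++;
            }
        }
        return sb.toString();
    }

    // Minimal stripIndent for tab-free input: remove the smallest number of
    // leading spaces found on any non-blank line.
    static String stripIndent(String text) {
        String[] lines = text.split("\n", -1);
        int min = Integer.MAX_VALUE;
        for (String line : lines) {
            if (line.isBlank()) continue;
            int lead = 0;
            while (lead < line.length() && line.charAt(lead) == ' ') lead++;
            min = Math.min(min, lead);
        }
        if (min == Integer.MAX_VALUE) return text;
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n < lines.length; n++) {
            if (n > 0) sb.append('\n');
            String line = lines[n];
            sb.append(line.substring(Math.min(min, line.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String src = "\t\t\tfoo\n\t\t        bar";  // identical-looking lines with 8-column tabs
        System.out.println(stripIndent(detab(src, 8)));  // "foo" and "bar", neither indented
    }
}
```

Here `detab(src, 8)` turns both lines into 24 spaces plus text, so both lose all their indentation; `detab(src, 4)` would instead leave `bar` indented four spaces relative to `foo`, which is why the width has to be stated explicitly.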
> >
> > The alternative is to expose novice users to many hours of exasperation.
> > Tabs are generally invisible, so you can imagine someone trying to figure
> > out why two lines that look exactly the same ended up treated differently.
> > Users may not even know there is such a thing as a tab character. (If
> > stripIndent throws, it should have a helpful message that suggests calling
> > detab(N) and says that N should probably be 4 or 8.)
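A sketch of the throwing variant, again with hypothetical names. It only inspects the leading whitespace of non-blank lines, so tabs later in a line don't trigger it, and the exception message points the user at `detab(N)`.

```java
public class StrictStripIndent {

    // Hypothetical strict stripIndent: refuse to guess when a non-blank line's
    // indentation contains a tab; otherwise remove the smallest leading run of
    // spaces shared by all non-blank lines.
    static String stripIndent(String text) {
        String[] lines = text.split("\n", -1);
        int min = Integer.MAX_VALUE;
        for (int n = 0; n < lines.length; n++) {
            String line = lines[n];
            if (line.isBlank()) continue;  // whitespace-only lines neither count nor throw
            int i = 0;
            while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
                if (line.charAt(i) == '\t') {
                    throw new IllegalArgumentException("line " + (n + 1)
                            + " has a tab in its indentation; call detab(N) first,"
                            + " where N is your tab width (probably 4 or 8)");
                }
                i++;
            }
            min = Math.min(min, i);
        }
        if (min == Integer.MAX_VALUE || min == 0) return text;
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n < lines.length; n++) {
            if (n > 0) sb.append('\n');
            String line = lines[n];
            sb.append(line.substring(Math.min(min, line.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripIndent("    a\n      b"));  // "a" then "  b"
        try {
            stripIndent("    a\n\tb");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());  // mentions detab(N)
        }
    }
}
```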
> >
> >> String asciiArtFTW =
> >> `````````
> >>        `  BOO  `
> >>        `````````.trimMarkers("`", "`");
> > I'm not sure I get that. It doesn't correspond to anything I've ever
> > wanted to do, even in languages that already have multiline strings. At
> > least, could we have an overload that just takes the starting marker, for
> > the overwhelmingly more common case where you only want to strip at the
> > start?
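For comparison, a one-argument version could look like this. The semantics are an assumption, since the proposal's exact behaviour isn't spelled out here: on each line, skip leading spaces and tabs, and if the marker comes next, drop everything up to and including it. This is similar in spirit to Scala's `stripMargin`. `trimLeftMarker` is a hypothetical name.

```java
public class LeftMarkerTrim {

    // Hypothetical one-argument trimMarkers: on each line, if the first
    // non-space, non-tab text is the marker, drop everything up to and
    // including the marker; otherwise leave the line unchanged.
    static String trimLeftMarker(String text, String marker) {
        if (marker.isEmpty()) throw new IllegalArgumentException("marker must not be empty");
        String[] lines = text.split("\n", -1);
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n < lines.length; n++) {
            if (n > 0) sb.append('\n');
            String line = lines[n];
            int i = 0;
            while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) i++;
            sb.append(line.startsWith(marker, i) ? line.substring(i + marker.length()) : line);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String src = "    |SELECT *\n    |  FROM t";
        System.out.println(trimLeftMarker(src, "|"));  // "SELECT *" then "  FROM t"
    }
}
```

One side effect of anchoring on an explicit marker is that whatever whitespace precedes it, tabs included, is discarded without having to decide how wide a tab is.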
> >
> > On Tue, 10 Apr 2018 at 13:50, Brian Goetz <[email protected]> wrote:
> >
> >>> (now stripIndent)
> >>>
> >>> I've accumulated a few questions/comments on this.
> >>>
> >>> 1. When choosing the amount to trim, it ought to ignore blank lines and
> >>> only-whitespace lines, right?
> >> Seems right.
> >>> 2. Is it really appropriate to automatically remove trailing whitespace?
> >> I'm not sure about this either.  The reason that RSLs will have "extra"
> >> whitespace that needs to be stripped is that we want to indent the RSL
> >> snippet relative to the Java code (and as you point out, the IDE may do
> >> that automatically for us.)  But if there's trailing whitespace, it's
> >> because the user put it there, and who is it hurting?  It might be
> >> significant.
> >>> 3. If the input contains *any* tab characters at all (except any that
> >>> are part of the trailing whitespace), then this method cannot know that
> >>> it isn't jumbling the end result, and maybe it should just throw.
> >> I think there's a middle ground, where it strips any common whitespace
> >> prefix.  So if every line starts with tab-tab-space-space, it can safely
> >> strip this.
> >>> 5. If we do end up in a world where we have to call this for almost
> >>> every one of our tens of thousands of multi-line RSLs... is it strange
> >>> that I feel like I would prefer it was static? It seems like it would
> >>> look a lot more normal that way visually. Ugh...
> >> I think this is likely to vary subjectively a lot.  Some people like
> >> that the instance method is mostly out of the way; others like the
> >> up-front shouting of the static method.
> >> The reason we can't have both is that then we couldn't resolve the
> >> method reference String::strip as a Function<String,String>, which seems
> >> a useful thing to do.
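For reference, this is the usage being protected. `String::strip` (an instance method since Java 11) fits `Function<String,String>` because the receiver becomes the function's argument. If `String` also declared a static `strip(String)`, both methods would match that reference and the compiler would reject it as ambiguous, for the same reason `Integer::toString` is ambiguous as a `Function<Integer,String>`. `stripAll` below is just an illustrative helper.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class StripMethodRef {

    // The instance method String::strip adapts to Function<String, String>:
    // the String receiver becomes the function's single argument.
    static final Function<String, String> STRIP = String::strip;

    // Illustrative helper: apply the same method reference in a stream.
    static List<String> stripAll(List<String> in) {
        return in.stream().map(String::strip).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("[" + STRIP.apply("  hi  ") + "]");  // [hi]
        System.out.println(stripAll(List.of(" a ", "\tb\n")));  // [a, b]
    }
}
```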
> >>> On top of *that*, I have no idea what "right markers" are good for, nor
> >>> what customizing the marker choice is good for (other than creating more
> >>> needless variation between different pieces of code).
> >>>
> >> String asciiArtFTW =
> >> `````````
> >>        `  BOO  `
> >>        `````````.trimMarkers("`", "`");
