Re: break seen as a C archaism

Kevin Bourrillion Tue, 13 Mar 2018 17:45:01 -0700

Sorry for 5,000 inline replies.

On Tue, Mar 13, 2018 at 1:18 PM Brian Goetz <brian.go...@oracle.com> wrote:

> Thanks for the detailed analysis.  I'm glad to see that a larger percentage
> could be converted to expression switch given sufficient effort;
>

Just to be clear: the sample was only from the 19% already identified as
convertible, so in *this* discussion that number is only going down.
However, the important point is that we should separately investigate the
81% to see whether a few simple heuristics make them recognizable as
e-switches too.

> I'm also not surprised to see that a lot had accidental reasons that kept
> them off the happy path.  And, your analysis comports with expectations in
> another way: that some required more significant intervention than others
> to lever them into compliance.  For example, the ones with side-effecting
> activities or logging, while good candidates for "strongly dissuade",
> happen in the real world, and not all codebases have the level of
> discipline or willingness to refactor that yours does.  So I'm not sure I'm
> ready to toss either of the size-7 sub-buckets aside so quickly; not everyone
> is as sanguine about "well, go refactor then" as you are.
>

Okay, I want to clarify a few things about this style of research, which is
how we have been evaluating API decisions for Guava and our other libraries
for a long time, but which I skipped really explaining.

First, we are using "existing code could refactor to" as a proxy for "what
might they probably write anew today if they could". It's a conceit, but a
useful one since the corpus of existing code has the benefit of being
visible and analyzable. :-) So, arguments that we make don't necessarily
rest on whether actual users *will actually refactor*. (Also, I mean, if
they aren't willing to refactor *at all*, then they wouldn't be changing to
expression switch at all so they'd be irrelevant to us anyway, but this is
not the real point.)

Second, we believe that the *most* useful (not *only*) way to judge the
value of a feature lies in comparing the *best code* users can write
without that feature to the *best code* they can write with it. Then we
look at two factors: (a) how much better did the feature make this code,
and (b) how commonly do we think this case comes up in real life. Roughly
speaking we multiply (a) and (b) together (for which I have with
questionable taste attempted to coin the phrase "utility times ubiquity")
and that gives about how much we care about making the change. This type of
analysis has driven most decisions about the shape of Guava's API.

It's not the only kind of argument that can be made. One can also add "it
might not make a large amount of 'great' code 'greater', but people who
write mediocre code will be more likely to write decent code!" I just think
that we should put less stock in such arguments than the other kind,
because this sounds like a problem better addressed with static analysis
tools, education, evangelism, and whatnot.

> Which is to say, adjusting the data brings the simple bucket from 86%
> (which seemed low) to 93-95%, which is "most of the time" but not "all of
> the time".  So most of the time, the code will not need whatever escape
> mechanism (break or block), but it will often enough that having some
> escape hatch is needed.
>

We think the ability to stick with procedural switch is that escape hatch
already.
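
A minimal sketch of that split, assuming a Java 14 or later compiler (where
arrow-form expression switch is available). The enum `Level`, the class
`EscapeHatchSketch`, and both methods are hypothetical names made up for
illustration, not taken from the corpus. The all-simple mapping becomes an
expression switch; the one that needs a side effect in a case simply stays
a procedural switch.

```java
import java.util.ArrayList;
import java.util.List;

final class EscapeHatchSketch {
    enum Level { LOW, MEDIUM, HIGH }

    // Only simple cases: a natural fit for expression switch.
    static int weight(Level level) {
        return switch (level) {
            case LOW -> 1;
            case MEDIUM -> 5;
            case HIGH -> 10;
        };
    }

    // One case needs several statements and a side effect,
    // so this one stays a procedural switch.
    static int weightWithAudit(Level level, List<String> audit) {
        int result;
        switch (level) {
            case LOW:
                result = 1;
                break;
            case MEDIUM:
                result = 5;
                break;
            default:
                audit.add("high level seen"); // side effect
                result = 10;
                break;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> audit = new ArrayList<>();
        System.out.println(weight(Level.MEDIUM));
        System.out.println(weightWithAudit(Level.HIGH, audit));
        System.out.println(audit);
    }
}
```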

> You didn't mention fallthrough, but because that's everyone's favorite
> punching bag here, I'll add: most of the time, fallthrough is not needed in
> expression switches, but having reviewed a number of low-level switches in
> the JDK, it is sometimes desirable even for switches that can be converted
> to expressions.  One of the motivations for refining break in this way is
> so that when fallthrough is needed, the existing idiom just works as
> everyone understands it to.
>

My assumption is that that code can just keep doing what it's already
doing. My claim is that there is only value to changing to expression
switch if we are getting the benefit of how much more simple and
constrained it is.

> imho, early signs suggest that the grossness of `break x` is not *nearly*
> justified by the actual observed positive value of supporting
> multi-statement cases in expression switch. Are we open to killing that, or
> would we be if I produced more and clearer evidence?
>
>
> That's one valid interpretation of the data, but not the only.  Whether
> making break behave more like return (takes a value in non-void context,
> doesn't take one in void context) is gross or natural is subjective.
> Here's my subjective interpretation: "great, most of the time, you don't
> have to use the escape hatch, but when you do, it has control flow just
> like the break we know, extended to take a value in non-void contexts, so
> will be fairly familiar."
>

I think that there are features that make sense on their own, and there are
features that *totally make lots of sense* assuming that you have heard the
expert group's passionate explanation of why they make sense. (It reminds
me of a certain Pied Piper focus group near the end of Silicon Valley
season 2, but moving on.) I am concerned that "breaking a value" is of the
second kind.

> But setting aside subjective reactions, are there better alternatives?
> Let's review what has been considered already, and why they've been passed
> over:
>
>  - Do nothing; only allow single expressions.  Non-starter.
>

We're just saying the feature seems to be at least 90% as applicable
without it. Roughly. Why is it a non-starter for the other 10% to stick
with the switch they've always had? I'm sure there are good answers to
that, I'm not doubting there are, but I think we should explore them
instead of just declaring something a non-starter by fiat.

>     case 1 -> 1;
>     case 2 -> { println("two"); break 2; }
>
> But that is pretty similar to what we have now, just with braces.  If the
> concern is that we're stretching `break` too far, then this is just as
> bad.
>
> Worse, it has two significant additional downsides:
>
> 1.  You can't fall through at all.  (Yes, I know some people think this is
> an upside.)
>

Yes! That! That's what we want. No fall-through.
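
To make that concrete, here is a minimal sketch of the arrow form with no
fall-through. It assumes a Java 14 or later compiler, where the
value-bearing statement inside a block is spelled `yield` rather than the
`break 2` in the quoted snippet. The class and method names are
hypothetical, and the `log` list stands in for `println`.

```java
import java.util.ArrayList;
import java.util.List;

final class ArrowFormSketch {
    // Each arrow case is self-contained; control never runs on into the next case.
    static int describe(int n, List<String> log) {
        return switch (n) {
            case 1 -> 1;
            case 2 -> {
                log.add("two"); // stands in for println("two")
                yield 2;        // Java 14+ spelling of the proposed `break 2`
            }
            default -> 0;
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        System.out.println(describe(1, log));
        System.out.println(describe(2, log));
        System.out.println(log);
    }
}
```

Calling `describe(1, log)` leaves `log` untouched, since case 1 cannot
reach the block under case 2.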

> But real code does use fallthrough, and this leaves them without any
> alternative; it also widens the asymmetry of expression switch vs statement
> switch.
>

Well, the other thread I started today is me literally *asking* for
asymmetry between this and statement switch. If we stop using the `switch`
keyword, so much the better.

What are the motivating use cases for fall-through in expression switch?
These must be exclusively examples featuring side-effects, right? Or is
there a way for a case to access the result produced by the previous one
and build on it?
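
One way to probe that question is a sketch like the following. It assumes a
Java 14 or later compiler, where colon-form cases inside a switch
expression may fall through and `yield` supplies the value. All names are
hypothetical. In this sketch the only thing case 1 contributes before
falling into case 2 is a side effect; the value still comes from whichever
case finally yields.

```java
import java.util.ArrayList;
import java.util.List;

final class FallThroughSketch {
    static int classify(int n, List<String> log) {
        return switch (n) {
            case 1:
                log.add("saw one"); // side effect only; no value is produced here
                // falls through
            case 2:
                yield 2;
            default:
                yield 0;
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        System.out.println(classify(1, log));
        System.out.println(classify(2, log));
        System.out.println(log);
    }
}
```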

> (Combine this with other suggestions that widen the asymmetry between
> pattern and non-pattern switch, and you have four switch constructs.
> Oops.)
>

(Not familiar with that stuff yet.)

> There might be other alternatives, but I don't see a better one, other
> than deprecating switch and designing a whole new mechanism.
>

I'm confused. `switch` has worked the same way for 20+ years; what could
possibly motivate us to deprecate it?

> And, to defend what we've proposed: it's exactly the switch we all know,
> warts and all.  Very little new; very little in the way of asymmetry
> between void/value and pattern/constant.
>

(My response to this is already teed up in the other thread. Basically, it
says that if we don't make expression switch suitably constrained then I
have so far failed to grasp what its value is at all.)

>   The cost is that we have to accept the existing warts, primarily the
> weird block expression (blocks of statements with break not surrounded by
> braces), the weird scoping, and fallthrough.
>
> This choice reminds me of the old Yiddish proverb of the Tree of Sorrows.
> (https://www.inspirationalstories.com/0/69.html).
>

Tangent, but I think that story actually advocates that we stick with
exactly the switch statement we already have today.

> If you've got something better ...
>
>
>
> On 3/13/2018 3:32 PM, Kevin Bourrillion wrote:
>
> On Fri, Mar 9, 2018 at 3:21 PM, Louis Wasserman <lowas...@google.com>
> wrote:
>
>> Simplifying: let's call normal cases in a switch simple if they're a
>> single statement or a no-op fallthrough, and let's call a default simple if
>> it's a single statement or it's not there at all.
>>
>> Among switches apparently convertible to expression switches,
>>
>>    - 81% had all simple normal cases and a simple default.
>>    - 5% had all simple normal cases and a nonsimple default.
>>    - 12% had a nonsimple normal case and a simple default.
>>    - 2% had a nonsimple normal case and a nonsimple default.
>>
> I was surprised it was as high as 19%, so I grabbed a random sample of
> these 45 occurrences from Google's codebase and reviewed them. My goal was
> to find evidence that multi-statement cases in expression switches are
> important and common. Spoiler: I found said evidence underwhelming.
>
> There were 3 that I would call false matches (e.g. two that simply used a
> void `return` instead of `break` after every case without reason).
>
> There were fully 20 out of the remaining 42 that I quickly concluded
> should be refactored regardless of anything else, and where that
> refactoring happens to leave them with only simple cases and simple/no
> default. These refactorings were varied (hoist out code common to all
> non-exception cases; simplify unreachable code; change to `if` if only 1-2
> cases; extract a method (needing only 1-2 parameters) for a case that is
> much bigger than the others; switch from loop to Streams; change `if/else`
> to ?:; move a precondition check to a more appropriate location; and a few
> other varied cleanups).
>
> Next there were 7 examples where the non-simple cases included
> side-effecting code, like setting fields or calling void methods. In Google
> Style I expect that we will probably forbid (or at least strongly dissuade)
> side effects in expression switch. I should probably bring this up
> separately, but I am pretty convinced by now that users should see
> expression switch and procedural switch as two completely different things,
> and by convention should always keep the former purely functional.
>
> Next there were 7 examples where a case was "non-simple" only because it
> was using the "log, then return a null object (or null), instead of
> throwing an exception" anti-pattern. I was surprised this was that popular.
> Another 2 used the "log-and-also-throw" anti-pattern.
>
> 2 examples had a use-once local variable that saved a *little* bit of
> nesting. I wouldn't normally refactor these, but if expression switch had
> no mechanism for multi-statement cases, I wouldn't think twice about it.
>
> 1 example had cases that looked nearly identical, 3 statements each, that
> could all be hoisted out of the switch, except that the types that differed
> across the three didn't implement a common interface (as they clearly
> should have). Slightly compelling.
>
> 1 example had all simple cases except that one also wanted to check an
> assertion. Okay, slightly compelling.
>
> Finally, the cases that were the most compelling to me: 3 examples had one
> or more large cases, where factoring them out into helper methods would
> imho be ugly because >=3 parameters would be required. If expression switch
> didn't permit multi-statement cases, I would just keep them as procedural
> switches. It's only 3 out of 42.
>
> Summary:
>
> imho, early signs suggest that the grossness of `break x` is not *nearly*
> justified by the actual observed positive value of supporting
> multi-statement cases in expression switch. Are we open to killing that, or
> would we be if I produced more and clearer evidence?
>
>
>
>
>
>>
>> On Fri, Mar 9, 2018 at 2:56 PM Brian Goetz <brian.go...@oracle.com>
>> wrote:
>>
>>> Did you happen to calculate what percentage was _not_ the "default"
>>> case?  I would expect that to be a considerable fraction.
>>>
>>> On 3/9/2018 5:49 PM, Kevin Bourrillion wrote:
>>>
>>> On Fri, Mar 9, 2018 at 1:19 PM, Remi Forax <fo...@univ-mlv.fr> wrote:
>>>
>>>> When I asked what we should do instead, the answer was either:
>>>>   1/ we should not allow blocks of code in the expression switch, but
>>>> only expressions
>>>>   2/ we should use the lambda syntax with return, even if the
>>>> semantics are different from the lambda semantics.
>>>>
>>>> I do not like (1) because I think the expression switch will become
>>>> useless
>>>
>>>
>>> In our (large) codebase, +Louis determined that, among switch statements
>>> that appear translatable to expression switch, 13.8% of them seem to
>>> require at least one multi-statement case.
>>>
>>>
>>>
>
>
> --
> Kevin Bourrillion | Java Librarian | Google, Inc. | kev...@google.com
>
>
>

-- 
Kevin Bourrillion | Java Librarian | Google, Inc. | kev...@google.com
