Yes, quite a bit there of note:

More hex numbers than decimal. That was a surprise.

The "0o" prefixed octal number literals just went in in the last two days.
I think a gofix rewrite might be in order--it is more explicit and less
vulnerable to mistake than the legacy 0377 style--despite a whole career of
the other from the PDP-8 onward.
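
To make the difference concrete, here is a minimal sketch of the two spellings side by side. It assumes Go 1.13 or later for the "0o" form, and the constant names are made up for illustration.

```go
package main

import "fmt"

// Both constants hold the same value; only the spelling differs.
// The names are hypothetical and exist only for this example.
const (
	legacyOctal   = 0377  // leading-zero form, easy to misread as decimal 377
	explicitOctal = 0o377 // "0o" form, accepted from Go 1.13 onward
)

func main() {
	fmt.Println(legacyOctal, explicitOctal, legacyOctal == explicitOctal)
}
```

This prints "255 255 true", so a mechanical rewrite from one form to the other would not change any values.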

128640 If statements, and just 8034 else, a 16:1 ratio. I'd like to
understand this better, but my test is just a lexer and not a parser so I
don't have much context to draw conclusions about this specifically.
However it seems such a huge ratio that switch must be carrying the load.
There are 5024 switch statements and 24903 cases, so that's "4 virtual
elses" per switch in some broad sense.

Type is well used. 2x as much as struct, so even if half are "type X struct
{...}" the other half are not.

The import frequency is not interpretable beyond "every package imports
something" because imports are typically written as one import keyword
followed by a parenthesized list. That structure is available to a parser
but not at the lexical level, where I only count the keyword. I could hack
it to look at lines between the parens after import, but that's beyond the
"test the lexer" goal.

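For comparison, this is roughly what the parser-level count of imports could look like. It is only a sketch using the standard go/parser package with its ImportsOnly mode, not what my lexer test does; the helper name countImports and the sample source are invented for the example.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// sampleSrc is a made-up file: the import keyword appears twice,
// but three packages are imported.
const sampleSrc = `package demo

import (
	"fmt"
	"os"
)

import "strings"
`

// countImports parses only the import section of a Go source text
// and returns the number of imported packages.
func countImports(source string) (int, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "demo.go", source, parser.ImportsOnly)
	if err != nil {
		return 0, err
	}
	return len(f.Imports), nil
}

func main() {
	n, err := countImports(sampleSrc)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("imported packages:", n)
}
```

A keyword count sees two imports in that sample, while the parser sees three import specs, which is the gap described above.
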
The default to switch ratio is high, and there is at most one per, so this
means 60% of switch statements have a default, or by extension, express
if-then-else if-else-if-else logic with a final clauseless else.
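
As an illustration of the equivalence I have in mind, here is a small sketch where a tagless switch with a default does the same job as an if / else-if / else chain. Both function names are hypothetical.

```go
package main

import "fmt"

// signSwitch uses a tagless switch; default plays the role of the final else.
func signSwitch(n int) string {
	switch {
	case n < 0:
		return "negative"
	case n == 0:
		return "zero"
	default:
		return "positive"
	}
}

// signIfElse expresses the same logic with explicit else keywords.
func signIfElse(n int) string {
	if n < 0 {
		return "negative"
	} else if n == 0 {
		return "zero"
	} else {
		return "positive"
	}
}

func main() {
	for _, n := range []int{-3, 0, 7} {
		fmt.Println(n, signSwitch(n), signIfElse(n))
	}
}
```

The switch version contributes zero else tokens to a lexer count while the chain contributes two, which is one way the if-to-else ratio could end up so lopsided.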

Few fallthrough statements, and that's natural. Outside Duff's Device there
were few cases where C/C++ code, which falls through by default, actually
needed to.
[Having just written a lexer, I can share what I think would be much better
than the switch's fallthrough: a way to say, "now that I'm in this case,
I've changed my mind and want to PROCEED with the case testing starting
with the next case." That would be swell and there is no way to do it.]
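
A short sketch of the distinction, with a hypothetical classify function: fallthrough transfers control into the next case body without testing that case's condition, which is different from resuming the case testing.

```go
package main

import "fmt"

// classify records which case bodies ran for n.
func classify(n int) []string {
	var hits []string
	switch {
	case n > 10:
		hits = append(hits, "big")
		fallthrough // enters the next body; n%2 == 0 is NOT evaluated
	case n%2 == 0:
		hits = append(hits, "even")
	default:
		hits = append(hits, "other")
	}
	return hits
}

func main() {
	fmt.Println(classify(11)) // 11 is odd, yet the "even" body still runs
	fmt.Println(classify(4))
	fmt.Println(classify(3))
}
```

With a "proceed" style statement, classify(11) would go on to test n%2 == 0, fail it, and land in default instead.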

Byte by that name is quite popular compared to its other identity, uint8.
[Opinion: never have been comfortable about this one. There are no living
machines with non 8-bit bytes, so the generality of "byte is the natural
size for a byte" is a stretch here, as would be 1 for the size of a bit.]
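
For reference, a minimal sketch showing that the two names really are one type in Go. The sumBytes helper is made up for the example.

```go
package main

import "fmt"

// sumBytes is declared with []uint8 but accepts a []byte unchanged,
// because byte is an alias for uint8.
func sumBytes(bs []uint8) int {
	total := 0
	for _, b := range bs {
		total += int(b)
	}
	return total
}

func main() {
	var b byte = 'A'
	var u uint8 = b // no conversion needed
	fmt.Printf("%T %T %v\n", b, u, b == u)
	fmt.Println(sumBytes([]byte("abc")))
}
```

This prints "uint8 uint8 true" and then 294, so the choice between the two spellings is purely one of readability.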

Using true is almost 3x false. I wonder if that is natural or if the
default value of zero/false is behind it.

Lots of panics and not many recovers.

Operators are interesting. Nearly 2.5x as many != as ==. Perhaps "err != nil"
is the story. A lot more < than >, which is curious...presumably from for
loops, but can't tell at the token level. Way more left shifts than right.
Not true in my own code, so interesting. Some disjunctive/conjunctive
dissonance: 3x the && as ||. 7 x the ++ as --. I guess people like to count
up, even when counting down has the advantage of the expensive load being
done once and the test being against zero. 5x += than -=, surprising.
Pretty low incidence of &^ and &^=, not true in my code at all, so I
suppose the BIC "Bit Clear" of PDP-11 is not a meme. That's surprising to
me: a|=b sets the b bits in a, a&^=b clears the b bits in a. They are a
team, yet | is 50x the usage of &^. Maybe because C did not have it and
C++ copied and Java copied and people have not understood? These should be
peers. /= is not popular, and %= even less so. I use them both but I may be
the only one (there is some code like this in Big from way back.)
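
Here is a small sketch of the set/clear pairing I mean. The helper names setBits and clearBits are hypothetical, and the flag values are arbitrary.

```go
package main

import "fmt"

// setBits turns on in a every bit that is set in b.
func setBits(a, b uint8) uint8 {
	a |= b
	return a
}

// clearBits turns off in a every bit that is set in b (AND NOT).
func clearBits(a, b uint8) uint8 {
	a &^= b
	return a
}

func main() {
	flags := uint8(0xA0)
	flags = setBits(flags, 0x0F)   // 0xA0 | 0x0F = 0xAF
	flags = clearBits(flags, 0x83) // 0xAF &^ 0x83 = 0x2C
	fmt.Printf("%#x\n", flags)
}
```

This prints 0x2c. In C the second step has to be spelled a &= ~b, which may be part of why the dedicated operator is unfamiliar.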

External references show that the Go team writes lots of tests, and that
unsafe is wildly popular.

The most popular character constant is '0' 3x '9' so it's not all ('0' <=
ch && ch <= '9') ... there are some extra '0's in there.
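
One plausible source of the extra '0's, though this is a guess and not something the token counts can confirm, is digit-to-value conversion, which uses '0' with no matching '9'. A sketch with hypothetical helper names:

```go
package main

import "fmt"

// isDigit is the range test: one '0' and one '9'.
func isDigit(ch rune) bool {
	return '0' <= ch && ch <= '9'
}

// parseDecimal converts a string of ASCII digits to an int.
// The ch - '0' step adds a '0' constant that has no '9' partner.
func parseDecimal(s string) (int, bool) {
	n := 0
	for _, ch := range s {
		if !isDigit(ch) {
			return 0, false
		}
		n = n*10 + int(ch-'0')
	}
	return n, len(s) > 0
}

func main() {
	fmt.Println(parseDecimal("2019"))
	fmt.Println(parseDecimal("0x1F"))
}
```

In this fragment '0' appears twice and '9' once, and other uses of '0' such as zero padding would push the ratio further.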

On Wed, Jun 12, 2019 at 6:49 AM Ian Lance Taylor <i...@golang.org> wrote:

> On Wed, Jun 12, 2019 at 6:08 AM Michael Jones <michael.jo...@gmail.com>
> wrote:
> >
> > I've been working on a cascade of projects, each needing the next as a
> part, the most recent being rewriting text.Scanner. It was not a goal, but
> the existing scanner does not do what I need (recognize Go operators,
> number types, and more) and my shim code was nearly as big as the standard
> library scanner itself, so I just sat down and rewrote it cleanly.
> >
> > To test beyond hand-crafted edge cases it seemed good to try it against
> a large body of Go code. I chose the Go 1.13 code base, and because the
> results are interesting on their own beyond my purpose of code testing, I
> thought to share what I've noticed as a Github Gist on the subject of the
> "Go Popularity Contest"—what are the most used types, most referenced
> packages, most and least popular operators, etc. The data are interesting,
> but I'll let it speak for itself. Find it here:
> >
> > https://gist.github.com/MichaelTJones/ca0fd339401ebbe79b9cbb5044afcfe2
>
> Pretty interesting.  Thanks.
>
> I note that "goto" is more common than "select".  That has to be an
> artifact of the code base.
>
> Ian
>


-- 

*Michael T. Jones <michael.jo...@gmail.com>*
