[gwt-contrib] Re: RR: SOYC Correlation work

Lex Spoon Tue, 14 Oct 2008 12:55:29 -0700

Kathrin and I reviewed this together.

In general, this solves an important use case, so let's move forward
on it aside from minor issues.


The following use cases are the ones we could think of:
  - Seeing what amount of output code (and in fact the actual code if you
  want) corresponds to what input Java code, at several granularities:
  field, method, class, file.  This information is obtained by the
  current patch.

  - Browsing call graphs to see why a particular method (field, type,
  ...) is included or why it is considered reachable before any
  runAsync occurs even though the programmer intended otherwise.  This
  is not yet implemented but would be simple to add a new correlate
  for why a node is kept alive.  Normal Pruner runs would not fill
  this in, probably, for efficiency, but it should be enough to run
  the Pruner once, after all optimization is done, in a mode that
  causes the reasons to be recorded.

  - Helping trace a method all the way through compilation.  This has
  too many open questions to really address right now.

  - As a helper for the runAsync implementation.  I think the general
  structure is fine.  I will give it a try and we can follow up in
  further iterations.

There is one possibly important general concern that you should look
at right now.  SourceInfo is compared by value, but there can be
multiple source infos for the same file, line number, etc. that are
nonetheless not the same object.  For example, several of the
getLiteralFoo methods in JProgram will generate SourceInfos that have
exactly the same fields but are separate objects.

The reason this might matter is that SourceInfo also has some fields
that use SortedSets of SourceInfo.  If you add as ancestors two
SourceInfos that are not reference equal but that have all their
fields the same, then only one will actually be added as an ancestor.
Since individual SourceInfo objects carry correlates, this could
seemingly lead to problems.

One way to fix things would be to add a unique id to all created
source infos.  Another possible solution would be to use a
LinkedHashSet in the implementation, and then sort when the report
would be made.  I'm not sure, it just looked like a possible issue you
might want to think about.


Other general issues can wait.  For example, what does parent/child
mean?  I originally thought that it meant that the child is a later
version of the parent, after some optimization.  However, that isn't
the case on further thought.  There are many places where a node is
considered a child of some other node that it is related to in another
way.  For example, methods are considered children of their
surrounding types.

For runAsync in particular, it might be necessary to have fewer
parent-child links and to do more with explicit correlates.  I'll look
into that as soon as possible.

Finally, finding correlates is unfortunately slow, because the whole
chain of parents and ancestors is chased every time a request is made.
At some point it might help to do one of:

  - Eagerly copy down correlations, instead of waiting for a call to
    getCorrelates.  Possibly even don't keep the parent links with
    this approach.

  - Disallow adding ancestors after some point, like with a builder
    pattern.  This might help keep caches from being invalidated too
    frequently.


smaller but real concerns:

One of the two versions of BuildTypeMap.emakeSourceInfo does not
check "enclosing" for null, but it can be null.

Correlation.equals() and .hashCode() should account for the underlying
AST node if present.  Otherwise, there will be confusion if different
objects have the same associated ident.  It seems like that can
happen, for example due to overloading.


smaller concerns you might or might not want to do
anything about right now:

SourceInfo.compareTo assumes fileName is not null.  Is that reliable,
or would it be better to use compareTwo?  If it needs to not be null
it would be nice to have an assert in the constructor.

SortedSet.getRoots: it looks valuable to use the cache even if there
is a parent and no ancestors.  Otherwise, the compiler can end up
repeatedly chasing the same chain of parents.  So, it looks better to
consistently use and update the cache, except possibly in the case
where a node is itself a root.  To simply all this, it might help to
break out a separate computeRoots() methods that does not cache.

SourceInfo.additionalAncestors could be null in the common case (isn't
it?) that there are no ancestors.  Currently it always holds a
TreeSet, even when it's empty.  Normally I'd hate such a
micro-optimization, but there are likely to be a lot of SourceInfos
floating around.

SourceInfo.hashCode: Adding the individual hashes is fine but it's
better to scramble them up.  Josh Bloch recommends a pattern like the
following:

  int result = 17;
  result = 37 * result + hash1;
  result = 37 * result + hash2;
  // etc
  return result;

  http://books.google.com/books?id=ZZOiqZQIbRMC&pg=PA39

GenerateJavaAST.makeSourceInfo: className is unused

SourceInfo.compareTwo: if a and b are not null, why not directly return
a.compareTo(b)?  There is an if in there, but I think it doesn't actually
change that behavior.

Why does SourceInfo have both fileName field and correlation?  Should
there be a TODO to phase out the field, or what do you think?

--~--~---------~--~----~------------~-------~--~----~
http://groups.google.com/group/Google-Web-Toolkit-Contributors
-~----------~----~----~----~------~----~------~--~---

[gwt-contrib] Re: RR: SOYC Correlation work

Reply via email to