[
https://issues.apache.org/jira/browse/SPARK-20840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16028720#comment-16028720
]
Hyukjin Kwon edited comment on SPARK-20840 at 5/30/17 3:19 AM:
---------------------------------------------------------------
[~joshrosen] and [~srowen], I tried to follow the comment above but failed because
Javadoc errors do not appear to be stored in the compile analysis (that approach
seems to resemble
https://github.com/sbt/sbt/blob/5e585e50da7da87fb41ea4ed19e374b84a21010b/main/src/main/scala/sbt/Defaults.scala#L1380-L1388).
So, I tried another, similar approach (parsing the Javadoc logs manually). A
rough version is at
https://github.com/apache/spark/compare/master...HyukjinKwon:SPARK-20840?expand=1
I tested this several times after manually introducing a few Javadoc errors,
with output as below:
{code}
...
... [suspicious errors]
...
[info] Generating
.../spark/target/javaunidoc/org/apache/spark/sql/DataFrameStatFunctions.html...
...
[info] Generating
.../spark/target/spark/target/javaunidoc/org/apache/spark/sql/DataFrameReader.html...
[error]
.../spark/sql/core/target/java/org/apache/spark/sql/DataFrameReader.java:476:
error: unexpected text
[error] * Loads a {@link Dataset[String}] storing JSON objects (<a
href="http://jsonlines.org/">JSON Lines
[error] ^
[info] Generating
.../spark/target/javaunidoc/org/apache/spark/sql/DataFrameStatFunctions.html...
...
... [some more actual errors]
...
[info] Generating
.../spark/target/javaunidoc/org/apache/spark/ui/storage/package-frame.html...
[info] Generating
.../spark/target/javaunidoc/org/apache/spark/ui/storage/package-summary.html...
[info] Generating
.../spark/target/javaunidoc/org/apache/spark/ui/storage/package-tree.html...
...
[info] 4 error
[info] 100 warnings
...
[error] 4 error(s) found while generating Java documentation.
[error]
.../spark/sql/core/target/java/org/apache/spark/sql/DataFrameReader.java:476:
error: unexpected text
[error] * Loads a {@link Dataset[String}] storing JSON objects (<a
href="http://jsonlines.org/">JSON Lines
[error] ^
[error]
.../spark/sql/core/target/java/org/apache/spark/sql/functions.java:2996: error:
self-closing element not allowed
[error] * @see <a
href="http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html"/>
[error] ^
[error]
.../spark/sql/core/target/java/org/apache/spark/sql/functions.java:3006: error:
reference not found
[error] * Convert time string to a Unix timestamp (in seconds) by casting
rules to {@link TimestampType}.
[error]
^
[error]
.../spark/sql/core/target/java/org/apache/spark/sql/functions.java:3271: error:
reference not found
[error] * (Scala-specific) Parses a column containing a JSON string into a
{@link StructType} with the
[error]
^
...
[info] Main Scala API documentation successful.
...
java.lang.RuntimeException: Failed to generate Java documentation from
generated Java codes.
at scala.sys.package$.error(package.scala:27)
at Unidoc$$anonfun$settings$39.apply(SparkBuild.scala:762)
at Unidoc$$anonfun$settings$39.apply(SparkBuild.scala:729)
...
[error] (spark/javaunidoc:doc) Failed to generate Java documentation from
generated Java codes.
[error] Total time: 95 s, completed ...
{code}
This prints the errors that we probably need to fix as a kind of report at the
end of a Javadoc failure. I tried not to change the existing logs that are
printed out.
The approach is basically to parse the {{\[error\] # errors}} line and then
find that number (#) of {{: error:}} logs, searching the Javadoc logs in reverse
order, when the task has failed.
This is based on my observations so far in
https://github.com/apache/spark/pull/17389,
https://github.com/apache/spark/pull/16307,
https://github.com/apache/spark/pull/15999 and
https://github.com/apache/spark/pull/16013. So, I am not fully sure it always
parses correctly, although I guess it will work in most cases.
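To make the reverse-order parsing concrete, here is a minimal sketch in Python (not the Scala code in the branch linked above). It assumes that each log entry sits on a single unwrapped line, that the summary line looks like "N error" or "N errors" behind an {{\[info\]}} or {{\[error\]}} prefix, and that each error entry starts with a line containing ".java:<line>: error: ". The function name, both regular expressions and the sample log are hypothetical and only for illustration.

```python
import re

# Assumed summary line shape, e.g. "[info] 4 errors" or "[error] 1 error".
SUMMARY = re.compile(r"^\[(?:info|error)\]\s+(\d+) errors?\s*$")
# Assumed first line of an error entry: "<path>.java:<line>: error: <message>".
ERROR_START = re.compile(r"\.java:\d+: error: ")


def collect_javadoc_errors(log_lines):
    """Return the last N error entries, where N is read from the summary line."""
    lines = list(log_lines)

    # Locate the last summary line and read the error count from it.
    summary_index = None
    count = 0
    for i in range(len(lines) - 1, -1, -1):
        match = SUMMARY.match(lines[i])
        if match:
            summary_index = i
            count = int(match.group(1))
            break
    if summary_index is None:
        return []

    # Walk backwards from the summary line, collecting entries until N are found.
    entries = []
    for i in range(summary_index - 1, -1, -1):
        if len(entries) == count:
            break
        if ERROR_START.search(lines[i]):
            # An entry is its header line plus the following "[error]" lines
            # (source excerpt and caret) up to the next non-error line.
            block = [lines[i]]
            for follow in lines[i + 1:summary_index]:
                if follow.startswith("[error]") and not ERROR_START.search(follow):
                    block.append(follow)
                else:
                    break
            entries.append(block)

    entries.reverse()  # restore the original log order
    return entries


# Hypothetical sample: one early spurious entry, one later genuine entry.
sample_log = [
    "[error] A.java:5: error: cannot find symbol",
    "[error] public class A extends B {",
    "[error]                         ^",
    "[info] Generating ./A.html...",
    "[error] A.java:3: error: reference not found",
    "[error]  * @see B",
    "[error]         ^",
    "[info] Generating ./package-frame.html...",
    "[info] 1 error",
    "[info] 1 warning",
]

for entry in collect_javadoc_errors(sample_log):
    print("\n".join(entry))
```

Because the search runs backwards from the summary line, entries logged earlier (such as the "cannot find symbol" one in the sample) are skipped once the counted number of entries has been collected. This relies on the same observation as above, so it may mis-parse logs that do not follow this ordering.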
This is not a clean solution but a hacky workaround, so I am not sure if it is
acceptable. Another way I could come up with is probably a custom logger, for
example, something resembling
https://github.com/playframework/playframework/blob/e80a4b41ed487df5a77e23762fb301703f9aad33/framework/src/sbt-plugin/src/sbt-test/play-sbt-plugin/play-position-mapper/project/Build.scala
What do you think about this? Otherwise, I could simply print out a log message
pointing to this JIRA.
> Misleading spurious errors when there are Javadoc (Unidoc) breaks
> -----------------------------------------------------------------
>
> Key: SPARK-20840
> URL: https://issues.apache.org/jira/browse/SPARK-20840
> Project: Spark
> Issue Type: Bug
> Components: Build, Project Infra
> Affects Versions: 2.2.0
> Reporter: Hyukjin Kwon
>
> Currently, when there are Javadoc breaks, warnings seem to be printed as
> errors.
> For example, the actual errors were as below in
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77070/consoleFull
> {code}
> [error]
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/scheduler/HighlyCompressedMapStatus.java:4:
> error: reference not found
> [error] * than both {@link config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD} and
> [error] ^
> [error]
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/scheduler/HighlyCompressedMapStatus.java:5:
> error: reference not found
> [error] * {@link config.SHUFFLE_ACCURATE_BLOCK_THRESHOLD_BY_TIMES_AVERAGE} *
> averageSize. It stores the
> [error] ^
> {code}
> but it also prints many errors from the generated Java code as below:
> {code}
> [info] Constructing Javadoc information...
> [error]
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/scheduler/BlacklistTracker.java:117:
> error: ExecutorAllocationClient is not public in org.apache.spark; cannot be
> accessed from outside package
> [error] public BlacklistTracker
> (org.apache.spark.scheduler.LiveListenerBus listenerBus,
> org.apache.spark.SparkConf conf,
> scala.Option<org.apache.spark.ExecutorAllocationClient> allocationClient,
> org.apache.spark.util.Clock clock) { throw new RuntimeException(); }
> [error]
> ^
> [error]
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/scheduler/BlacklistTracker.java:118:
> error: ExecutorAllocationClient is not public in org.apache.spark; cannot be
> accessed from outside package
> [error] public BlacklistTracker (org.apache.spark.SparkContext sc,
> scala.Option<org.apache.spark.ExecutorAllocationClient> allocationClient) {
> throw new RuntimeException(); }
> [error]
> ^
> [error]
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:133:
> error: ConfigReader is not public in org.apache.spark.internal.config;
> cannot be accessed from outside package
> [error] private org.apache.spark.internal.config.ConfigReader reader () {
> throw new RuntimeException(); }
> [error] ^
> [error]
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:138:
> error: ConfigEntry is not public in org.apache.spark.internal.config; cannot
> be accessed from outside package
> [error] <T extends java.lang.Object> org.apache.spark.SparkConf set
> (org.apache.spark.internal.config.ConfigEntry<T> entry, T value) { throw new
> RuntimeException(); }
> [error]
> ^
> [error]
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:139:
> error: OptionalConfigEntry is not public in
> org.apache.spark.internal.config; cannot be accessed from outside package
> [error] <T extends java.lang.Object> org.apache.spark.SparkConf set
> (org.apache.spark.internal.config.OptionalConfigEntry<T> entry, T value) {
> throw new RuntimeException(); }
> [error]
> ^
> [error]
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:187:
> error: ConfigEntry is not public in org.apache.spark.internal.config; cannot
> be accessed from outside package
> [error] <T extends java.lang.Object> org.apache.spark.SparkConf
> setIfMissing (org.apache.spark.internal.config.ConfigEntry<T> entry, T value)
> { throw new RuntimeException(); }
> [error]
> ^
> [error]
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:188:
> error: OptionalConfigEntry is not public in
> org.apache.spark.internal.config; cannot be accessed from outside package
> [error] <T extends java.lang.Object> org.apache.spark.SparkConf
> setIfMissing (org.apache.spark.internal.config.OptionalConfigEntry<T> entry,
> T value) { throw new RuntimeException(); }
> [error]
> ^
> [error]
> /home/jenkins/workspace/SparkPullRequestBuilder@2/core/target/java/org/apache/spark/SparkConf.java:208:
> error: ConfigEntry is not public in org.apache.spark.internal.config; cannot
> be accessed from outside package
> [error] org.apache.spark.SparkConf remove
> (org.apache.spark.internal.config.ConfigEntry<?> entry) { throw new
> RuntimeException(); }
> [error]
> ...
> {code}
> These errors are actually warnings in a successful build without Javadoc
> breaks as below -
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7/2908/consoleFull
> {code}
> [info] Constructing Javadoc information...
> [warn]
> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/scheduler/BlacklistTracker.java:117:
> error: ExecutorAllocationClient is not public in org.apache.spark; cannot be
> accessed from outside package
> [warn] public BlacklistTracker
> (org.apache.spark.scheduler.LiveListenerBus listenerBus,
> org.apache.spark.SparkConf conf,
> scala.Option<org.apache.spark.ExecutorAllocationClient> allocationClient,
> org.apache.spark.util.Clock clock) { throw new RuntimeException(); }
> [warn]
> ^
> [warn]
> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/scheduler/BlacklistTracker.java:118:
> error: ExecutorAllocationClient is not public in org.apache.spark; cannot be
> accessed from outside package
> [warn] public BlacklistTracker (org.apache.spark.SparkContext sc,
> scala.Option<org.apache.spark.ExecutorAllocationClient> allocationClient) {
> throw new RuntimeException(); }
> [warn]
> ^
> [warn]
> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:133:
> error: ConfigReader is not public in org.apache.spark.internal.config;
> cannot be accessed from outside package
> [warn] private org.apache.spark.internal.config.ConfigReader reader () {
> throw new RuntimeException(); }
> [warn] ^
> [warn]
> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:138:
> error: ConfigEntry is not public in org.apache.spark.internal.config; cannot
> be accessed from outside package
> [warn] <T extends java.lang.Object> org.apache.spark.SparkConf set
> (org.apache.spark.internal.config.ConfigEntry<T> entry, T value) { throw new
> RuntimeException(); }
> [warn]
> ^
> [warn]
> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:139:
> error: OptionalConfigEntry is not public in
> org.apache.spark.internal.config; cannot be accessed from outside package
> [warn] <T extends java.lang.Object> org.apache.spark.SparkConf set
> (org.apache.spark.internal.config.OptionalConfigEntry<T> entry, T value) {
> throw new RuntimeException(); }
> [warn]
> ^
> [warn]
> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:187:
> error: ConfigEntry is not public in org.apache.spark.internal.config; cannot
> be accessed from outside package
> [warn] <T extends java.lang.Object> org.apache.spark.SparkConf
> setIfMissing (org.apache.spark.internal.config.ConfigEntry<T> entry, T value)
> { throw new RuntimeException(); }
> [warn]
> ^
> [warn]
> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:188:
> error: OptionalConfigEntry is not public in
> org.apache.spark.internal.config; cannot be accessed from outside package
> [warn] <T extends java.lang.Object> org.apache.spark.SparkConf
> setIfMissing (org.apache.spark.internal.config.OptionalConfigEntry<T> entry,
> T value) { throw new RuntimeException(); }
> [warn]
> ^
> [warn]
> /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/core/target/java/org/apache/spark/SparkConf.java:208:
> error: ConfigEntry is not public in org.apache.spark.internal.config; cannot
> be accessed from outside package
> [warn] org.apache.spark.SparkConf remove
> (org.apache.spark.internal.config.ConfigEntry<?> entry) { throw new
> RuntimeException(); }
> [warn]
> ...
> {code}
> These look like warnings, not errors, in {{javadoc}}, but when we introduce a
> Javadoc break, sbt seems to report these other warnings as errors when
> generating javadoc.
> For example, with the Java code, {{A.java}}, below:
> {code}
> /**
> * Hi
> */
> public class A extends B {
> }
> {code}
> if we run {{javadoc}}
> {code}
> javadoc A.java
> {code}
> it produces a warning because it cannot find the symbol B. It still seems to
> generate the documentation fine.
> {code}
> Loading source file A.java...
> Constructing Javadoc information...
> A.java:4: error: cannot find symbol
> public class A extends B {
> ^
> symbol: class B
> Standard Doclet version 1.8.0_45
> Building tree for all the packages and classes...
> Generating ./A.html...
> Generating ./package-frame.html...
> Generating ./package-summary.html...
> Generating ./package-tree.html...
> Generating ./constant-values.html...
> Building index for all the packages and classes...
> Generating ./overview-tree.html...
> Generating ./index-all.html...
> Generating ./deprecated-list.html...
> Building index for all classes...
> Generating ./allclasses-frame.html...
> Generating ./allclasses-noframe.html...
> Generating ./index.html...
> Generating ./help-doc.html...
> 1 warning
> {code}
> However, if we have a javadoc break in comments as below:
> {code}
> /**
> * Hi
> * @see B
> */
> public class A extends B {
> }
> {code}
> this produces an error and a warning.
> {code}
> Loading source file A.java...
> Constructing Javadoc information...
> A.java:5: error: cannot find symbol
> public class A extends B {
> ^
> symbol: class B
> Standard Doclet version 1.8.0_45
> Building tree for all the packages and classes...
> Generating ./A.html...
> A.java:3: error: reference not found
> * @see B
> ^
> Generating ./package-frame.html...
> Generating ./package-summary.html...
> Generating ./package-tree.html...
> Generating ./constant-values.html...
> Building index for all the packages and classes...
> Generating ./overview-tree.html...
> Generating ./index-all.html...
> Generating ./deprecated-list.html...
> Building index for all classes...
> Generating ./allclasses-frame.html...
> Generating ./allclasses-noframe.html...
> Generating ./index.html...
> Generating ./help-doc.html...
> 1 error
> 1 warning
> {code}
> It seems {{sbt unidoc}} reports both errors and warnings as {{\[error\]}}
> when there are breaks (the related context appears to be described in
> https://github.com/sbt/sbt/issues/875#issuecomment-24315400).
> Given my observations so far, it is generally enough to fix only the errors
> counted in the {{\[info\] # errors}} line printed at the bottom, which are
> usually produced during the HTML generation phase
> ({{Building tree for all the packages and classes...}}).
> Essentially, this looks like a bug in GenJavaDoc, which generates incorrect
> Java code, and a bug in SBT, which fails to distinguish warnings from errors
> in this case.
> The resulting message shown via Jenkins is actually confusing.