[ 
https://issues.apache.org/jira/browse/PHOENIX-5769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072381#comment-17072381
 ] 

Istvan Toth edited comment on PHOENIX-5769 at 4/1/20, 4:56 AM:
---------------------------------------------------------------

Some more observations based on 
[https://builds.apache.org/job/PreCommit-PHOENIX-Build/3691]:
 * The Jenkins build hosts run the test suite about three times slower than a 
modern machine (Ryzen 2700X + NVMe SSD). Running a single query takes ~13s vs 
4.5s.
 * mvn verify for the phoenix-core subproject with the current settings takes 
about 4.5 hours, so the 5 hour default limit was really too optimistic, 
considering that we have other subprojects, and that downloads and Yetus stuff 
also take ~20 minutes before mvn verify gets started. 
 * However, the timeouts look very strange. It's as if Jenkins simply 
disconnected from Maven and got no Maven output from 22:20 until after it 
aborted the build at 00:10. The SplitSystemCatalogTests did actually run: 
Jenkins processes their output files, as can be seen later in the log. 

{noformat}

22:14:01 [INFO] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 217.56 s - in org.apache.phoenix.tool.ParameterizedPhoenixCanaryToolIT
22:15:10 [WARNING] Tests run: 26, Failures: 0, Errors: 0, Skipped: 4, Time elapsed: 322.952 s - in org.apache.phoenix.schema.stats.NamespaceDisabledStatsCollectorIT
22:15:40 [WARNING] Tests run: 39, Failures: 0, Errors: 0, Skipped: 6, Time elapsed: 328.538 s - in org.apache.phoenix.schema.stats.NonTxStatsCollectorIT
22:15:51 [WARNING] Tests run: 26, Failures: 0, Errors: 0, Skipped: 4, Time elapsed: 363.015 s - in org.apache.phoenix.schema.stats.NamespaceEnabledStatsCollectorIT
22:20:05 [WARNING] Tests run: 78, Failures: 0, Errors: 0, Skipped: 12, Time elapsed: 530.679 s - in org.apache.phoenix.schema.stats.TxStatsCollectorIT
00:10:34 Build timed out (after 400 minutes). Marking the build as aborted.
00:10:35 Build was aborted
00:10:35 Archiving artifacts
00:10:38 [INFO] 
00:10:38 [INFO] Results:
00:10:38 [INFO] 
00:10:38 [WARNING] Tests run: 1079, Failures: 0, Errors: 0, Skipped: 65
00:10:38 [INFO] 
00:10:38 [INFO] 
00:10:38 [INFO] --- maven-failsafe-plugin:2.22.0:integration-test (SplitSystemCatalogTests) @ phoenix-core ---
00:10:38 [INFO] 
00:10:38 [INFO] -------------------------------------------------------
00:10:38 [INFO]  T E S T S
00:10:38 [INFO] -------------------------------------------------------

{noformat}
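
A silent stretch like the one above can be located mechanically in a saved console log. The following is only a rough sketch and not part of the build tooling: it assumes every log line of interest starts with an HH:MM:SS prefix, as in the excerpt, and the names TIMESTAMP and largest_gap are made up for illustration. The sample lines are copied from the excerpt, with the wrapped line rejoined.

```python
import re
from datetime import datetime, timedelta

# Assumption: each line of interest starts with an HH:MM:SS prefix,
# as in the console excerpt above.
TIMESTAMP = re.compile(r"^(\d{2}:\d{2}:\d{2}) ")


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the longest silence."""
    best = (timedelta(0), None, None)
    prev_time = None
    prev_line = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # continuation lines carry no timestamp
        current = datetime.strptime(match.group(1), "%H:%M:%S")
        if prev_time is not None:
            gap = current - prev_time
            if gap < timedelta(0):
                gap += timedelta(days=1)  # the log crossed midnight
            if gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = current, line
    return best


# Lines copied from the excerpt above.
sample = [
    "22:20:05 [WARNING] Tests run: 78, Failures: 0, Errors: 0, Skipped: 12, "
    "Time elapsed: 530.679 s - in org.apache.phoenix.schema.stats.TxStatsCollectorIT",
    "00:10:34 Build timed out (after 400 minutes). Marking the build as aborted.",
    "00:10:35 Build was aborted",
]

gap, before, after = largest_gap(sample)
print(gap)  # 1:50:29
```

On these lines the longest silence is the 1 hour 50 minutes between the TxStatsCollectorIT result and the abort message, which matches the 22:20 to 00:10 window described above. The midnight handling assumes no single gap exceeds 24 hours.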

 * We also seem to have some interference between miniClusters started by 
different tests, but failsafe helpfully scrubs the information that could help 
us track that down. I've opened PHOENIX-5814 for that problem.


 





 

> Phoenix precommit Flapping HadoopQA Tests in master 
> ----------------------------------------------------
>
>                 Key: PHOENIX-5769
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5769
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Daniel Wong
>            Assignee: Istvan Toth
>            Priority: Major
>         Attachments: PHOENIX-5769.master.v1.patch, 
> PHOENIX-5769.master.v3.patch, consoleFull (1).html, consoleFull (2).html, 
> consoleFull (3).html, consoleFull (4).html, consoleFull (5).html, consoleFull 
> (6).html, consoleFull (7).html, consoleFull (8).html, consoleFull.html
>
>
> I was recently trying to commit changes to Phoenix for multiple issues and 
> was asked to get clean HadoopQA runs.  However, this took a huge effort as I 
> had to resubmit the same patch multiple times in order to get one "clean".  
> Looking at the errors, the most common ones were 3 "Multiple regions on 
> <hostname,regions>" and 3 for apache infra issues (host shutdown), 1 for 
> org.apache.hadoop.hbase.NotServingRegionException, 1 for 
> SnapshotDoesNotExistException.   See builds from the 3540s to the 3560s at 
> [https://builds.apache.org/job/PreCommit-PHOENIX-Build/].  In addition, I see 
> multiple builds running simultaneously; limiting tests to running on 1 host 
> should be configurable, right?
> In addition, [~yanxinyi] advised me that master was less likely to 
> have issues getting a clean run than 4.x.  FYI [~ckulkarni]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
