[ 
https://issues.apache.org/jira/browse/METRON-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838948#comment-15838948
 ] 

ASF GitHub Bot commented on METRON-672:
---------------------------------------

Github user justinleet commented on the issue:

    https://github.com/apache/incubator-metron/pull/424
  
    Thanks for taking the effort to dig into this.  Great work.  Other than a 
couple minor comments, I'm very happy with this.


> SolrIndexingIntegrationTest fails intermittently
> ------------------------------------------------
>
>                 Key: METRON-672
>                 URL: https://issues.apache.org/jira/browse/METRON-672
>             Project: Metron
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>            Reporter: Justin Leet
>            Assignee: Casey Stella
>
> Adapted from a dev list conversation
> h4. Initial Error in Travis
> Jon noted this in the Travis builds
> {code}
> `test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest):
> Took too long to complete: 150582 > 150000`, more details below:
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
> 166.167 sec <<< FAILURE!
> test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest)
> Time elapsed: 166.071 sec  <<< ERROR!
> {code}
> h4. Additional Notes
> Casey was able to reproduce this locally (but not in his IDE). Couple details 
> in the dev list excerpt.
> Fixing this should ideally include adding more detailed logging to hopefully 
> avoid these issues in the future.  As a note, unfortunately in this case, 
> Casey notes that logging seems to make this issue rarer.  Still, logging to 
> be able to understand the flow (and tuning logging level as appropriate) 
> would help resolve issues in the future.
> h4. Dev List Excerpt
> Per Casey:
> {quote}
> Ok, so now I'm concerned that this isn't a fluke.  Here's an excerpt from
> the failing logs on travis for my PR with substantially longer timeouts (
> https://s3.amazonaws.com/archive.travis-ci.org/jobs/194575474/log.txt)
> {code}
> Running org.apache.metron.solr.integration.SolrIndexingIntegrationTest
> 0 vs 10 vs 0
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Processed 
> target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> 10 vs 10 vs 6
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
> 317.056 sec <<< FAILURE!
> test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest)
> Time elapsed: 316.949 sec  <<< ERROR!
> java.lang.RuntimeException: Took too long to complete: 300783 > 300000
>         at 
> org.apache.metron.integration.ComponentRunner.process(ComponentRunner.java:131)
>         at 
> org.apache.metron.indexing.integration.IndexingIntegrationTest.test(IndexingIntegrationTest.java:173)
> {code}
> I'm getting the impression that this isn't the timeout and we have a
> mystery on our hands.  Each of those lines "10 vs 10 vs 6" happen 15
> seconds apart.  That line means that it read 10 entries from kafka, 10
> entries from the indexed data and 6 entries from HDFS.  It's that 6
> entries that is the problem.   Also of note, this does not seem to
> happen to me locally AND it's not consistent on Travis.  Given all
> that I'd say that it's a problem with the HDFS Writer not getting
> flushed, but I verified that it is indeed flushed per message.
> One more thing, just for posterity here, it always freezes at 6 records
> written to HDFS.  That's the reason I thought it was a flushing issue.
> One thing that I would caution though is that this is likely a heisenbug.
> The more logging I added earlier made it less likely to occur. It seems
> more likely to occur on Travis than locally and I made it happen by
> repeatedly running mvn install on Metron-solr (after a mvn install of the
> whole project).
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to