[ https://issues.apache.org/jira/browse/METRON-672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Casey Stella reassigned METRON-672: ----------------------------------- Assignee: Casey Stella > SolrIndexingIntegrationTest fails intermittently > ------------------------------------------------ > > Key: METRON-672 > URL: https://issues.apache.org/jira/browse/METRON-672 > Project: Metron > Issue Type: Bug > Affects Versions: 0.3.0 > Reporter: Justin Leet > Assignee: Casey Stella > > Adapted from a dev list conversation > h4. Initial Error in Travis > Jon noted this in the Travis builds > {code} > `test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest): > Took too long to complete: 150582 > 150000`, more details below: > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 166.167 sec <<< FAILURE! > test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest) > Time elapsed: 166.071 sec <<< ERROR! > {code} > h4. Additional Notes > Casey was able to reproduce this locally (but not in his IDE). Couple details > in the dev list excerpt. > Fixing this should ideally include adding more detailed logging to hopefully > avoid these issues in the future. As a note, unfortunately in this case, > Casey notes that logging seems to make this issue rarer. Still, logging to > be able to understand the flow (and tuning logging level as appropriate) > would help resolve issues in the future. > h4. Dev List Excerpt > Per Casey: > {quote} > Ok, so now I'm concerned that this isn't a fluke. Here's an excerpt from > the failing logs on travis for my PR with substantially longer timeouts ( > https://s3.amazonaws.com/archive.travis-ci.org/jobs/194575474/log.txt) > {code} > Running org.apache.metron.solr.integration.SolrIndexingIntegrationTest > 0 vs 10 vs 0 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Processed > target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json > 10 vs 10 vs 6 > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: > 317.056 sec <<< FAILURE! > test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest) > Time elapsed: 316.949 sec <<< ERROR! > java.lang.RuntimeException: Took too long to complete: 300783 > 300000 > at > org.apache.metron.integration.ComponentRunner.process(ComponentRunner.java:131) > at > org.apache.metron.indexing.integration.IndexingIntegrationTest.test(IndexingIntegrationTest.java:173) > {code} > I'm getting the impression that this isn't the timeout and we have a > mystery on our hands. Each of those lines "10 vs 10 vs 6" happen 15 > seconds apart. That line means that it read 10 entries from kafka, 10 > entries from the indexed data and 6 entries from HDFS. It's that 6 > entries that is the problem. Also of note, this does not seem to > happen to me locally AND it's not consistent on Travis. Given all > that I'd say that it's a problem with the HDFS Writer not getting > flushed, but I verified that it is indeed flushed per message. > One more thing, just for posterity here, it always freezes at 6 records > written to HDFS. That's the reason I thought it was a flushing issue. > One thing that I would caution though is that this is likely a heisenbug. > The more logging I added earlier made it less likely to occur. It seems > more likely to occur on Travis than locally and I made it happen by > repeatedly running mvn install on Metron-solr (after a mvn install of the > whole project). > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)