[ https://issues.apache.org/jira/browse/LUCENE-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-3975:
--------------------------------

Description:
We improved on our 'random document generation' a lot in LUCENE-3911
In fact these random docs find a lot of real bugs. Also the linedocs driven from real data is also improved in the analyzer tests: it takes substrings of random linedocs and makes 'partial docs'.
Really we should refactor this so that LineDocs uses a mix of real, partial-real, and synthetic docs just like the analyzer tests.
This would help tests like term dictionary tests which are basically static (even though they are random, the amount of documents is limited).
BaseTokenStreamTestCase would simply pull from LineDocs at that point, but other tests would immediately see the benefits.

was:
We improved on our 'random document generation' a lot in LUCENE-3911
In fact these random docs find a lot of real bugs. Also the linedocs driven from random data is improved in the analyzer tests: it takes substrings of random linedocs and makes 'partial docs'.
Really we should refactor this so that LineDocs uses a mix of real, partial-real, and synthetic docs just like the analyzer tests.
This would help tests like term dictionary tests which are basically static (even though they are random, the amount of documents is limited).
BaseTokenStreamTestCase would simply pull from LineDocs at that point, but other tests would immediately see the benefits.

> factor BaseTokenStream random docs generation into LineDocs
> -----------------------------------------------------------
>
>                 Key: LUCENE-3975
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3975
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: general/test
>            Reporter: Robert Muir
>
> We improved on our 'random document generation' a lot in LUCENE-3911
> In fact these random docs find a lot of real bugs. Also the linedocs
> driven from real data is also improved in the analyzer tests: it takes
> substrings of random linedocs and makes 'partial docs'.
> Really we should refactor this so that LineDocs uses a mix of real,
> partial-real, and synthetic docs just like the analyzer tests.
> This would help tests like term dictionary tests which are basically
> static (even though they are random, the amount of documents is limited).
> BaseTokenStreamTestCase would simply pull from LineDocs at that point,
> but other tests would immediately see the benefits.
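
For illustration only, the mix of real, partial-real, and synthetic docs proposed in the description could be sketched roughly as below. This is not code from Lucene or from any patch on this issue: the class name MixedDocSource, the equal one-in-three split, and the lowercase-ASCII synthetic text are all assumptions made for the example.

```java
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch (not Lucene's LineDocs): hands out a mix of real,
 * partial-real, and synthetic documents from one seeded source.
 */
public class MixedDocSource {
  private final List<String> realLines; // stand-in for lines read from a line file
  private final Random random;          // seeded so a failing run can be replayed

  public MixedDocSource(List<String> realLines, long seed) {
    if (realLines.isEmpty()) {
      throw new IllegalArgumentException("need at least one real line");
    }
    this.realLines = realLines;
    this.random = new Random(seed);
  }

  /** Returns the next document; the equal three-way split is an assumption. */
  public String nextDoc() {
    switch (random.nextInt(3)) {
      case 0: {
        // Real doc: a whole line, unchanged.
        return pickRealLine();
      }
      case 1: {
        // Partial-real doc: a random substring of a real line (may be empty).
        // Note: this simple version can split a surrogate pair.
        String real = pickRealLine();
        int start = random.nextInt(real.length() + 1);
        int end = start + random.nextInt(real.length() - start + 1);
        return real.substring(start, end);
      }
      default: {
        // Synthetic doc: random lowercase ASCII, 0 to 19 characters.
        return randomText(random.nextInt(20));
      }
    }
  }

  private String pickRealLine() {
    return realLines.get(random.nextInt(realLines.size()));
  }

  private String randomText(int length) {
    StringBuilder sb = new StringBuilder(length);
    for (int i = 0; i < length; i++) {
      sb.append((char) ('a' + random.nextInt(26)));
    }
    return sb.toString();
  }
}
```

In a sketch like this, a test would construct one MixedDocSource per run and call nextDoc() wherever it previously read a real line, so every caller gets the varied input without changes of its own. Two sources built with the same lines and seed return the same sequence of documents.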