[jira] [Updated] (LUCENE-3975) factor BaseTokenStream random docs generation into LineDocs

Robert Muir (Updated) (JIRA) Wed, 11 Apr 2012 22:35:03 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Robert Muir updated LUCENE-3975:
--------------------------------

    Description: 
We improved our 'random document generation' a lot in LUCENE-3911.

In fact these random docs find a lot of real bugs. The linedocs
driven from real data are also improved in the analyzer tests: the tests
take substrings of random linedocs and make 'partial docs'.

Really we should refactor this so that LineDocs uses a mix of real,
partial-real, and synthetic docs just like the analyzer tests.

This would help tests like the term dictionary tests, which are basically
static (even though they are random, the number of documents is limited).

BaseTokenStreamTestCase would simply pull from LineDocs at that point,
but other tests would immediately see the benefits.
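
A rough sketch of what such a mixed source could look like, only to
illustrate the idea. The class MixedDocSource and its methods are
hypothetical names made up for this example; they are not the existing
LineDocs or BaseTokenStreamTestCase API, and the equal three-way split
between real, partial-real, and synthetic docs is an assumption.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: mixes real, partial-real, and synthetic docs. */
class MixedDocSource {
    private final List<String> realLines; // lines taken from real data
    private final Random random;          // seeded, so runs are reproducible

    MixedDocSource(List<String> realLines, long seed) {
        if (realLines.isEmpty()) {
            throw new IllegalArgumentException("need at least one real line");
        }
        this.realLines = realLines;
        this.random = new Random(seed);
    }

    /** Returns a real line, a substring of one, or a synthetic string. */
    String nextDoc() {
        String line = realLines.get(random.nextInt(realLines.size()));
        switch (random.nextInt(3)) {
            case 0:
                return line;                               // real doc
            case 1:
                return partial(line);                      // partial-real doc
            default:
                return synthetic(1 + random.nextInt(20));  // synthetic doc
        }
    }

    /** Random substring of a real line (may be empty). */
    private String partial(String line) {
        // Offsets are char indexes, so a surrogate pair could be split;
        // a real implementation would probably want to guard against that.
        int start = random.nextInt(line.length() + 1);
        int end = start + random.nextInt(line.length() - start + 1);
        return line.substring(start, end);
    }

    /** Random lowercase ASCII string of the given length. */
    private String synthetic(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }
}
```

With something like this in one place, any test could call nextDoc()
and get the same mix, and a fixed seed would keep a failure reproducible.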


  was:
We improved on our 'random document generation' a lot in LUCENE-3911

In fact these random docs find a lot of real bugs. Also the linedocs
driven from random data is improved in the analyzer tests: it takes
substrings of random linedocs and makes 'partial docs'.

Really we should refactor this so that LineDocs uses a mix of real,
partial-real, and synthetic docs just like the analyzer tests.

This would help tests like term dictionary tests which are basically
static (even though they are random, the amount of documents is limited).

BaseTokenStreamTestCase would simply pull from LineDocs at that point,
but other tests would immediately see the benefits.


    
> factor BaseTokenStream random docs generation into LineDocs
> -----------------------------------------------------------
>
>                 Key: LUCENE-3975
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3975
>             Project: Lucene - Java
>          Issue Type: Test
>          Components: general/test
>            Reporter: Robert Muir
>
> We improved our 'random document generation' a lot in LUCENE-3911.
> In fact these random docs find a lot of real bugs. The linedocs
> driven from real data are also improved in the analyzer tests: the tests
> take substrings of random linedocs and make 'partial docs'.
> Really we should refactor this so that LineDocs uses a mix of real,
> partial-real, and synthetic docs just like the analyzer tests.
> This would help tests like the term dictionary tests, which are basically
> static (even though they are random, the number of documents is limited).
> BaseTokenStreamTestCase would simply pull from LineDocs at that point,
> but other tests would immediately see the benefits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
