Hello Glynn,

I have been thinking a lot about your approach and mine, and have finally decided to try your approach first, in the hope that it will be sufficient for all kinds of test cases. I still have concerns about comparing floating point data with regard to precision, i.e. coordinates, region settings, FCELL and DCELL maps, vector attributes, ... . More below:
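To make the precision concern concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function names are hypothetical and not part of any existing framework. It shows that two values which are practically identical can still differ as text once both are rounded to a fixed number of decimal places, while a tolerance-based comparison accepts them.

```python
import math


def same_after_rounding(a, b, dp=3):
    """Compare two floats the way a text diff of dp-rounded output would."""
    return f"{a:.{dp}f}" == f"{b:.{dp}f}"


def almost_equal(a, b, abs_tol=1e-3):
    """Compare two floats using an absolute tolerance only."""
    return math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol)


# Two values that differ by 2e-8 but lie on either side of a rounding boundary
a, b = 0.12349999, 0.12350001
print(same_after_rounding(a, b))  # False: "0.123" vs "0.124"
print(almost_equal(a, b))         # True
```

So exporting with a low precision reduces platform noise, but it cannot rule out false failures for values that lie close to a rounding boundary.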
2011/6/3 Glynn Clements <[email protected]>:
>
> Soeren Gebbert wrote:
>
>> I was thinking about a similar approach, but the effort to parse the
>> modules XML interface description to identify the command line
>> arguments to compare the created data was too much effort for me.
>
> I don't see a need to parse the command; just execute it and see what
> files it creates.

Ok, I see.

>> Besides that, the handling of test description, module dependencies
>> and the comparison of multiple/timeseries outputs (r.sim.water)
>> bothered me too. I still have no simple (interface) answers to these
>> issues (maybe these are no issues??).
>
> Dependencies aren't really an issue. You build all of GRASS first,
> then test. Any modules which are used for generating test maps or
> analysing data are assumed to be correct (they will have test cases of
> their own; the most that's required is that such modules are marked as
> "critical" so that any failure will be presumed to invalidate the
> results of all other tests).

I assume such critical modules are coded in the framework, not in the test scripts? But this also means that the test scripts must be interpreted and executed line by line by the framework in order to identify critical modules used for data generation?

Here is an example of a synthetic r.series test using r.mapcalc for data generation. r.mapcalc is marked as critical in the framework:

{{{
# r.series synthetic average test with r.mapcalc generated data
# The r.series result is validated using the result.ref file in this test directory

# Generate the data
r.mapcalc expression="input1 = 1"
r.mapcalc expression="input2 = 2"

# Test the average method of r.series
r.series input=input1,input2 output=result method=average
}}}

Here is the assumed workflow:

The framework will read the test script and analyse it line by line.
If r.mapcalc is marked as critical and the framework finds the keyword "r.mapcalc" in the script as the first word outside of a comment, it checks whether the r.mapcalc test(s) have already run correctly and stops the r.series test if they have not. If the r.mapcalc tests are valid, it runs the r.mapcalc commands and checks their return values. If the return values are correct, the rest of the script is executed. After reaching the end of the script, the framework looks for any generated data in the current mapset (raster, raster3d, vector, color, regions, ...) and for corresponding validation files in the test directory. In this case it will find the raster maps input1, input2 and result in the current mapset and result.ref in the test directory. It will run r.out.ascii on the result map with a low precision (dp=3??) and compare the output with result.ref, which was hopefully generated using the same precision.

This example should cover many raster and voxel test cases.

> I don't normally advocate such approaches, but testing is one of those
> areas which (like documentation) is much harder to get people to work
> on than e.g. programming, so minimising the effort involved is
> important.
>
> Minimising the learning curve is probably even more important. If you
> can get people to start writing tests, they're more likely to put in
> the effort to learn the less straightforward aspects as it becomes
> necessary.

Ok, I will try to summarize this approach:

The test framework will be integrated in the source code of GRASS and will use the make system to execute tests. The make system should be used to:

* run single module or library tests
* run all module (raster|vector|general|db ...) tests
* run all library tests
* run all tests (libraries, then modules)
* in case of an all-modules test, run the critical module tests automatically first

Two test locations (LL and UTM?) should be generated and added to the GRASS sources.
The test locations provide all kinds of needed test data: raster maps of different types (elevation maps, images, maps of CELL, FCELL and DCELL type, ...), vector maps (point, line, area, mixed, with and without attribute data), voxel data, regions, raster maps with different color tables, reclassified maps and so on. The test data is located only in the PERMANENT mapset. The locations should be small enough to fit in svn without performance issues.

Each module and library has its own test directory. The test directories contain the test cases, reference text files and data for import (for *.in.* modules). Validation of data is based on the reference text files located in the test directory of each module/library. Files implementing test cases must end with ".sh", reference files must end with ".ref".

The test cases are simple shell-style text files, so they can easily be implemented and executed on the command line by non-developers. Comments in the test case files are used as documentation of the test in the test summary.

The framework itself should be implemented in Python. It should provide the following functionality:

* Parsing and interpretation of test case files
* Logging of all executed test cases
* Simple but excellent presentation of test results in different formats (text, html, xml?)
* Setting up the test location environment and creating/removing temporary mapsets for each test case run
* Comparison methods for all testable GRASS datatypes (raster, color, raster3d, vector, db tables, region, ...) with text files
** test of equal data
** test of almost equal data (precision issue of floating point data on different systems)
*** ! using *.out.ascii modules with precision flag should work?
** Equal and almost equal key-value tests (g.region -g, r.univar, ...)
of text files <-- I am not sure how to realize this
* Execution of single test cases
** Reading and analyzing the test case
** Identification of critical modules
** Running single modules, logging stdout, stderr and return value
** Analysis of return values -> indicator of whether the module/test failed
*** ! this assumes that commands in the test cases make no use of pipes
** Recognition of all data generated by modules
*** Searching the GRASS database for new raster, vector, raster3d, regions, ... in the temporary mapset
*** Searching for newly generated text or binary files in the test directory
** Recognition of validation data in the test directory
** Comparison of found data with available reference data
** Logging of the validation process
** Removing the temporary mapset and generated data in the test directory
* maybe much more ...

Here are some test cases which must be covered:

A simple g.region test with validation. A region.ref text file is present in the test directory. It is a file with key-value pairs used to validate the output of g.region -g.

g.region_test.sh
{{{
# This is the introduction text for the g.region test

# this is the description of the first module test run
g.region -g > region
}}}

The framework will recognize the new text file "region" and the reference file "region.ref" in key-value format in the test directory and should use an almost-equal key-value test for validation. The same approach should work for r.univar and similar modules with shell output.

Now a simple v.random test. Because the data is generated randomly, the coordinates cannot be compared. We need to compare the meta information instead. A file named result.ref is present in key-value format.

v.random_test.sh
{{{
# This is a simple test of v.random
# validation is based on meta information
v.random output=random_points n=100
v.info -t random_points > result
}}}

As with the g.region test, the framework should recognize the text file and apply the key-value validation.

A simple v.buffer test.
The vector point map "points" is located in the PERMANENT mapset of the test location. A file named result.ref is located in the test directory for validation. The file was generated with v.out.ascii dp=3 format=standard.

v.buffer_test.sh
{{{
# Test the generation of a buffer around points
# Buffer of 10m radius
v.buffer input=points output=result distance=10
}}}

In this case the framework recognizes a new vector map and the result.ref text file. It uses v.out.ascii with dp=3 to export the result vector map in "standard" format and compares it with result.ref. The "standard" format is the default for comparing vector data and cannot be changed in the test case scripts.

I think most of the test cases we need can be covered with this approach. But the test designer must know that the validation data must be of a specific type and precision.

I hope I was not too redundant in my thoughts and explanations. :)

So what do you think, Glynn, Anne, Martin and all interested developers? If this approach is ok, I will put it into the wiki.

Best regards
Soeren

>
> --
> Glynn Clements <[email protected]>
>

_______________________________________________
grass-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/grass-dev
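P.S. Regarding the almost-equal key-value test that I was unsure about above: a minimal Python sketch of how it might work, under my own assumptions. The function names and the default tolerance are hypothetical, and the sample values are made up for illustration. The idea is to parse both the generated file and the .ref file into dictionaries, compare numeric values with a tolerance and everything else as plain strings, and report the keys that are missing or differ.

```python
import math


def parse_key_value(text, sep="="):
    """Parse shell-style key=value lines into a dict, skipping other lines."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or sep not in line:
            continue
        key, value = line.split(sep, 1)
        result[key.strip()] = value.strip()
    return result


def values_match(a, b, abs_tol):
    """Compare numeric values with a tolerance, anything else as strings."""
    try:
        return math.isclose(float(a), float(b), rel_tol=0.0, abs_tol=abs_tol)
    except ValueError:
        return a == b


def compare_key_value(output, reference, abs_tol=1e-3):
    """Return the sorted list of keys that are missing or differ."""
    out, ref = parse_key_value(output), parse_key_value(reference)
    return sorted(
        key for key in set(out) | set(ref)
        if key not in out or key not in ref
        or not values_match(out[key], ref[key], abs_tol)
    )


# Illustrative values only, in the key=value style printed by g.region -g
reference = "n=80\ns=0\nnsres=0.5\nrows=160\n"
output = "n=80.0000001\ns=0\nnsres=0.5\nrows=160\n"
print(compare_key_value(output, reference))  # []
```

An empty list would mean the test passed; any returned keys could go straight into the test log. Whether an absolute or a relative tolerance is the better choice is still an open question for me.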
