[D] Performance Tests (from old wiki) [texera]

via GitHub Mon, 20 Oct 2025 22:32:00 -0700


GitHub user chenlica created a discussion: Performance Tests (from old wiki)


> From the page https://github.com/apache/texera/wiki/Performance-Tests (may be dangling)

====
Authors: Hailey Pan, Zuozhi Wang

Reviewed by Chen Li

## Goal

Set up benchmarks to test the performance of each operator in Texera, and 
report new performance numbers after each pull request is merged into the 
master branch.

## Status

As of 9/25/2016: **FINISHED**

## Modules

The code is in the module `edu.ics.uci.texera.perftest`.

The packages `dictionarymatcher`, `keywordmatcher`, `fuzzytokenmatcher`, 
`regexmatcher`, and `nlpextractor` contain the performance test code of each 
operator.

The package `runme` contains the main function to start running the performance 
tests.

## Workflow

### Prepare datasets
We are using the MEDLINE dataset. Its description and files are available 
[here](https://drive.google.com/open?id=1qBNezk1UjMojKofWkQzBjTTc_WHvXeWQdyQ2l6nxli8). 
The package `medline` contains the schema of the dataset.

Data files need to be put in the *(texera 
directory)/texera/texera-perftest/sample-data-files* folder. Put only one data 
file in this folder; having more than one affects how the results are 
displayed later.
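The one-file requirement can be checked before a run. The following is a minimal sketch, not part of the Texera code base: the function name `find_single_data_file` is hypothetical, and skipping hidden files (names starting with a dot) is an assumption.

```python
from pathlib import Path


def find_single_data_file(folder):
    """Return the only data file in folder, or raise if there is not exactly one."""
    # Hidden files such as .DS_Store are skipped (an assumption, not a Texera rule).
    files = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and not p.name.startswith(".")
    )
    if len(files) != 1:
        names = ", ".join(p.name for p in files) or "none"
        raise ValueError(
            f"expected exactly one data file in {folder}, found: {names}"
        )
    return files[0]
```

Calling it on the *sample-data-files* folder before writing the index would fail early instead of producing results that are hard to display.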

The *perftest-files/queries* folder contains a file of sample queries, which is 
used in testing `KeywordMatcher` and `DictionaryMatcher`. The 
*perftest-files/results* folder contains the performance test results.

Write the index and run the performance tests. In the package `runme`:

* `WriteIndex.java` writes the index.
* `RunTests.java` assumes that an index already exists, and runs the 
performance tests.
* `RunPerftests.java` writes an index first and then runs the performance tests.

The index is written into the *(texera directory)/texera/texera-perftest/index* 
folder.

As mentioned earlier, we want to automate the performance test process, so we 
wrote a Python script and use a cron job to run it automatically every day. The 
Python script `build.py` pulls changes from GitHub, then runs the performance 
tests if there is a change in the master branch.
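The pull-then-test logic can be sketched as follows. This is an illustration under stated assumptions, not the actual `build.py`: the function names, the `origin` remote, and detecting a change by comparing `HEAD` before and after the pull are all assumptions.

```python
import subprocess


def git(repo_dir, *args, run=subprocess.run):
    """Run a git command inside repo_dir and return its stripped stdout."""
    result = run(
        ["git", "-C", repo_dir, *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def pull_and_check(repo_dir, branch="master", run=subprocess.run):
    """Pull the branch; return the new HEAD commit, or None if nothing changed."""
    before = git(repo_dir, "rev-parse", "HEAD", run=run)
    git(repo_dir, "pull", "origin", branch, run=run)
    after = git(repo_dir, "rev-parse", "HEAD", run=run)
    return after if after != before else None


def run_if_changed(repo_dir, run_perftests, run=subprocess.run):
    """Call run_perftests(commit) only when the pull brought in new commits."""
    commit = pull_and_check(repo_dir, run=run)
    if commit is not None:
        run_perftests(commit)
    return commit
```

A crontab entry such as `0 2 * * * python3 /path/to/build.py` (the path is hypothetical) would run the script once a day at 02:00.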

It’s easy to run the performance tests in an IDE (for example, Eclipse or 
IntelliJ): we simply run the Java file, and the IDE takes care of the rest. 
In a command-line environment, however, it is much harder to run the program. 
The command to run the program is generated in `build.py`. (Attention: the 
command needs to be changed if Texera's dependencies change.)
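One way such a command could be generated is sketched below. It is not the command that `build.py` actually produces: the helper name `build_java_command`, the compiled-classes directory, and a single folder of dependency jars are assumptions about the layout.

```python
import os
from pathlib import Path


def build_java_command(class_dirs, lib_dir, main_class):
    """Assemble a `java -cp <classpath> <main class>` command as an argument list."""
    # Collect every jar in lib_dir, so adding or removing a dependency jar
    # changes the classpath without editing this function.
    jars = sorted(str(p) for p in Path(lib_dir).glob("*.jar"))
    classpath = os.pathsep.join([*class_dirs, *jars])
    return ["java", "-cp", classpath, main_class]
```

For example, `build_java_command(["target/classes"], "lib", "edu.ics.uci.texera.perftest.runme.RunPerftests")` returns a list that can be passed to `subprocess.run`. The main class name is inferred from the module and package names above, and the two paths are hypothetical.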


### Performance test results

The Java performance test program writes the results into the 
*perftest-files/results* folder. There is one CSV file for each operator, 
which records the results of each run.

Here is the format of one such CSV file, *keyword-phrase.csv*, with a sample row:

| Date                | Record #     | Min Time | Max Time | Average Time | Std    | Average Results | Commit Number |
|---------------------|--------------|----------|----------|--------------|--------|-----------------|---------------|
| 09-09-2016 00:54:18 | abstract_100 | 0.017    | 1.373    | 0.2371       | 0.4464 | 2.18            |               |


The CSV files of the other operators follow the same format.
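To make the columns concrete, here is a sketch of how one such row could be computed and appended. The real results are written by the Java program; this Python version only illustrates the format. The function names, the use of the population standard deviation, the rounding, and the month-day-year reading of the date are all assumptions.

```python
import csv
import os
import statistics
from datetime import datetime

HEADER = ["Date", "Record #", "Min Time", "Max Time", "Average Time",
          "Std", "Average Results", "Commit Number"]


def summarize(record, times, result_counts, now=None):
    """Build one result row from per-query timings and per-query result counts."""
    now = now or datetime.now()
    return [
        now.strftime("%m-%d-%Y %H:%M:%S"),   # date layout is an assumption
        record,
        round(min(times), 4),
        round(max(times), 4),
        round(statistics.mean(times), 4),
        round(statistics.pstdev(times), 4),  # population std is an assumption
        round(statistics.mean(result_counts), 2),
        "",                                  # commit number is left empty here
    ]


def append_row(csv_path, row):
    """Append a row, writing the header first if the file is new or empty."""
    new_file = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(HEADER)
        writer.writerow(row)
```

Each run adds one row per operator file, so the file accumulates a history of runs over time.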

The "Commit Number" column is empty because we chose to let the Python script 
fill in the commit number. Running the Java program, either via an IDE or the 
command line, therefore does not produce a commit number in the result file. 
The commit number is only added by running the Python script.
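The fill-in step might look like the sketch below. It is not the actual script: the function name `fill_commit_number` and the choice to update every row whose commit cell is still empty are assumptions.

```python
import csv


def fill_commit_number(csv_path, commit, column="Commit Number"):
    """Set the commit column on every row where it is still empty.

    Returns the number of rows that were updated.
    """
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    if not fieldnames or column not in fieldnames:
        raise ValueError(f"{csv_path} has no '{column}' column")

    updated = 0
    for row in rows:
        if not (row.get(column) or "").strip():
            row[column] = commit
            updated += 1

    # Rewrite the whole file; rows that already had a commit are kept as-is.
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return updated
```

Only empty cells are touched, so rows from earlier runs keep the commit they were recorded against.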

### Display result data
We use an open-source Java package called Dashbuilder to display the results. 
Dashbuilder automatically reads the results produced by the Python script and 
displays them. An internal document describes how to set up Dashbuilder.



GitHub link: https://github.com/apache/texera/discussions/3978
