This is an automated email from the ASF dual-hosted git repository.

zky pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-devlake-website.git
commit b6d6652aef20d5c596b19e284320442a543648cd
Author: linyh <[email protected]>
AuthorDate: Thu Jun 23 02:02:50 2022 +0800

    english version
---
 docs/09-DeveloperDoc/e2e-test-writing-guide-cn.md |  11 +-
 docs/09-DeveloperDoc/e2e-test-writing-guide-zn.md | 186 ++++++++++++++++++++++
 2 files changed, 191 insertions(+), 6 deletions(-)

diff --git a/docs/09-DeveloperDoc/e2e-test-writing-guide-cn.md b/docs/09-DeveloperDoc/e2e-test-writing-guide-cn.md
index a8f2a837..4302f741 100644
--- a/docs/09-DeveloperDoc/e2e-test-writing-guide-cn.md
+++ b/docs/09-DeveloperDoc/e2e-test-writing-guide-cn.md
@@ -1,9 +1,8 @@
-# How to Add Some E2E Test for Plugin?
 # How to write E2E tests for a plugin
 
 ## Why write E2E tests
-E2E testing, as part of automated testing, generally refers to white-box integration testing, or unit testing that is allowed to use external services such as databases. The purpose of writing E2E tests is to shield some internal implementation logic and check, purely from the standpoint of data correctness, whether the same external input yields the same output. In addition, compared with black-box integration tests, it can avoid occasional problems brought by the network. More information about plugins can be found here: Why write E2E tests (unfinished)
+E2E testing, as part of automated testing, generally refers to white-box integration testing, or unit testing that is allowed to use external services such as databases. The purpose of writing E2E tests is to shield some internal implementation logic and check, purely from the standpoint of data correctness, whether the same external input yields the same output. In addition, compared with black-box integration tests, it can avoid occasional problems brought by the network and other factors. More information about plugins can be found here: Why write E2E tests (unfinished)
 In DevLake, E2E testing consists of interface tests and input/output validation of the plugin Extract/Convert subtasks; this article only covers how to write the latter.
 
 ## Preparing data
@@ -12,7 +11,7 @@ E2E testing, as part of automated testing, generally refers to white-box integration testing
 
 Next, we will write the E2E tests for this plugin.
-The first step in writing the plugin is to run the corresponding plugin's Collect task to complete the data collection, i.e. to have the corresponding data saved in the database tables whose names begin with `_raw_feishu_`.
+The first step in writing the test is to run the corresponding plugin's Collect task to complete the data collection, i.e. to have the corresponding data saved in the database tables whose names begin with `_raw_feishu_`.
 Below are the run log and database result of the DirectRun (cmd) method.
 ```
 $ go run plugins/feishu/main.go --numOfDaysToCollect 2 --connectionId 1 (Note: the command may change as versions are upgraded)
@@ -49,9 +48,9 @@ press `c` to send cancel signal
 This solution is the simplest; no problems arise whether you use Postgres or MySQL.
 
-The criterion for a successful csv export is that the go program can read it without errors, because the following points are worth noting:
+The criterion for a successful csv export is that the go program can read it without errors, so the following points are worth noting:
 1. Values in the csv file can be wrapped in double quotes, preventing special symbols such as commas inside a value from breaking the csv format
-2. Double quotes in csv files are escaped; generally "" represents one double quote
+2. Double quotes in csv files are escaped; generally `""` represents one double quote
 3. Check whether data holds the real values, not base64-encoded ones
 
 After exporting, put the .csv file at `plugins/feishu/e2e/raw_tables/_raw_feishu_meeting_top_user_item.csv`.
@@ -62,7 +61,7 @@ The criterion for a successful csv export is that the go program can read it without errors
 
 After turning it off, use `select ... into outfile` to export the csv file; the export result looks roughly like the figure below:
 
-You can see that the data field has an extra hexsha field,, but is hexadecimal data, which needs to be converted into literals manually.
+You can see that the data field has an extra hexsha field, which needs to be converted into literals manually.
 
 ### Vscode Database

diff --git a/docs/09-DeveloperDoc/e2e-test-writing-guide-zn.md b/docs/09-DeveloperDoc/e2e-test-writing-guide-zn.md
new file mode 100644
index 00000000..df83854b
--- /dev/null
+++ b/docs/09-DeveloperDoc/e2e-test-writing-guide-zn.md
@@ -0,0 +1,186 @@

# How to write E2E tests for plugins

## Why write E2E tests

E2E testing, as a part of automated testing, generally refers to white-box integration testing, or unit testing that is allowed to use external services such as databases. The purpose of writing E2E tests is to shield some internal implementation logic and check, purely from the standpoint of data correctness, whether the same external input produces the same output. In addition, compared with black-box integration tests, E2E tests avoid occasional problems caused by the network and other factors. More information about plugins can be found here: Why write E2E tests (unfinished).

In DevLake, E2E testing consists of interface tests and input/output validation of the plugin Extract/Convert subtasks; this article only describes how to write the latter.

## Preparing data

Let's take a simple plugin, the Feishu meeting-hours collection, as an example here; its directory structure looks like this.
![image](https://user-images.githubusercontent.com/3294100/175061114-53404aac-16ca-45d1-a0ab-3f61d84922ca.png)
Next, we will write the E2E tests for this plugin.
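For reference, the three files we will add over the course of this guide end up under the plugin's `e2e` directory like this (a sketch assembled from the paths used later in this guide):
```
plugins/feishu/e2e/
├── meeting_test.go
├── raw_tables/
│   └── _raw_feishu_meeting_top_user_item.csv
└── snapshot_tables/
    └── _tool_feishu_meeting_top_user_items.csv
```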
The first step in writing the E2E test is to run the Collect task of the corresponding plugin to complete the data collection, that is, to have the corresponding data saved in the tables starting with `_raw_feishu_` in the database.
Here are the run log and database result of the DirectRun (cmd) method.
```
$ go run plugins/feishu/main.go --numOfDaysToCollect 2 --connectionId 1 (Note: the command may change with version upgrades)
[2022-06-22 23:03:29] INFO failed to create dir logs: mkdir logs: file exists
press `c` to send cancel signal
[2022-06-22 23:03:29] INFO [feishu] start plugin
[2022-06-22 23:03:33] INFO [feishu] scheduler for api https://open.feishu.cn/open-apis/vc/v1 worker: 13, request: 10000, duration: 1h0m0s
[2022-06-22 23:03:33] INFO [feishu] total step: 2
[2022-06-22 23:03:33] INFO [feishu] executing subtask collectMeetingTopUserItem
[2022-06-22 23:03:33] INFO [feishu] [collectMeetingTopUserItem] start api collection
[2022-06-22 23:03:34] INFO [feishu] [collectMeetingTopUserItem] finished records: 1
[2022-06-22 23:03:34] INFO [feishu] [collectMeetingTopUserItem] end api collection error: %!w(<nil>)
[2022-06-22 23:03:34] INFO [feishu] finished step: 1 / 2
[2022-06-22 23:03:34] INFO [feishu] executing subtask extractMeetingTopUserItem
[2022-06-22 23:03:34] INFO [feishu] [extractMeetingTopUserItem] get data from _raw_feishu_meeting_top_user_item where params={"connectionId":1} and got 148
[2022-06-22 23:03:34] INFO [feishu] [extractMeetingTopUserItem] finished records: 1
[2022-06-22 23:03:34] INFO [feishu] finished step: 2 / 2
```

<img width="993" alt="image" src="https://user-images.githubusercontent.com/3294100/175064505-bc2f98d6-3f2e-4ccf-be68-a1cab1e46401.png">
OK, the data has now been saved to the `_raw_feishu_*` tables, and the `data` column holds the responses returned while the plugin ran. We only collected the last 2 days of data here, so there is not much of it, but it still covers a variety of situations, namely the same person having data on different days.

It is also worth mentioning that the plugin runs two tasks, `collectMeetingTopUserItem` and `extractMeetingTopUserItem`. The former collects the data and is the one this run needs; the latter extracts the data, and it does not matter whether it runs during this data-preparation step.

Next, we need to export the data to .csv format. There are many options for this step, so feel free to show your skills; only a few common methods are introduced here.

### DevLake Code Generator Export

This program is not yet complete.

### GoLand Database Export

This solution is the easiest to use and causes no problems with either Postgres or MySQL.

The success criterion for the csv export is that the go program can read it without errors, so the following points are worth noting:
1. The values in the csv file can be wrapped in double quotes, to prevent special symbols such as commas inside a value from breaking the csv format
2. Double quotes in csv files are escaped; generally `""` represents one double quote
3. Check whether the column `data` holds the real values, not base64- or hex-encoded ones

After exporting, move the .csv file to `plugins/feishu/e2e/raw_tables/_raw_feishu_meeting_top_user_item.csv`. A quick way to check the three points above is sketched below.
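Since the success criterion is literally "the go program can read it without errors", one quick check is to read the export back with Go's standard `encoding/csv` reader, which enforces exactly the quoting and `""`-escaping rules listed above. This is a minimal standalone sketch, not part of DevLake; the path is the one used in this guide.
```go
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
)

func main() {
	// Open the exported raw-table csv (path as used in this guide).
	f, err := os.Open("plugins/feishu/e2e/raw_tables/_raw_feishu_meeting_top_user_item.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// encoding/csv fails loudly on malformed quoting or escaping,
	// which is exactly the "can the go program read it" criterion.
	records, err := csv.NewReader(f).ReadAll()
	if err != nil {
		log.Fatal(err) // the export is not usable yet
	}
	if len(records) == 0 {
		log.Fatal("csv file is empty")
	}
	fmt.Printf("header: %v, data rows: %d\n", records[0], len(records)-1)
}
```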
### MySQL Select Into Outfile

This is MySQL's solution for exporting query results to a file. The MySQL instance currently started by docker-compose.yml comes with the `--secure-file-priv` restriction, so it does not allow `select ... into outfile`; you first need to turn off this security setting, which is done roughly as follows.

After turning it off, use `select ... into outfile` to export the csv file; the export result looks roughly as follows.

You may notice that the `data` field comes out as a hexsha, i.e. hexadecimal data, and needs to be converted to its literal value manually.

### VSCode Database

This is VSCode's solution for exporting query results to a file, but it is not easy to use. Here is the export result without any configuration changes:

However, the escape symbol obviously does not conform to the csv specification, so the data is not exported successfully. After adjusting the configuration and manually replacing `\"` with `""`, we get the following result.

The `data` field of this file is base64-encoded, so it must be manually decoded to its literal value before it can be used.

### MySQL Workbench

With this tool you must write the SQL yourself to complete the data export; you can adapt the following SQL for that.
```sql
SELECT id, params, CAST(`data` as char) as data, url, input, created_at FROM _raw_feishu_meeting_top_user_item;
```
Select csv as the save format, and export the file for use.

### Postgres Copy with csv header

`Copy (SQL statement) to '/var/lib/postgresql/data/raw.csv' with csv header;` is a common way for PG to export csv, and it can be used here as well.
```sql
COPY (
SELECT id, params, convert_from(data, 'utf-8') as data, url, input, created_at FROM _raw_feishu_meeting_top_user_item
) to '/var/lib/postgresql/data/raw.csv' with csv header;
```
Use the above statement to complete the export. If your PG runs in Docker, you will also need the `docker cp` command to copy the exported file to the host.

## Writing E2E tests

First, create the test file; here, for example, `meeting_test.go` is created.
![image](https://user-images.githubusercontent.com/3294100/175091380-424974b9-15f3-457b-af5c-03d3b5d17e73.png)
Then enter the test-preparation code into it, as follows. This code creates an instance of the `feishu` plugin and then calls `ImportCsvIntoRawTable` to import the data from the csv file into the `_raw_feishu_meeting_top_user_item` table.
```go
func TestMeetingDataFlow(t *testing.T) {
    var plugin impl.Feishu
    dataflowTester := e2ehelper.NewDataFlowTester(t, "feishu", plugin)

    // import raw data table
    dataflowTester.ImportCsvIntoRawTable("./raw_tables/_raw_feishu_meeting_top_user_item.csv", "_raw_feishu_meeting_top_user_item")
}
```
The signature of the import function is as follows.
```func (t *DataFlowTester) ImportCsvIntoRawTable(csvRelPath string, rawTableName string)```
It has a twin, which differs only slightly in its parameters.
```func (t *DataFlowTester) ImportCsvIntoTabler(csvRelPath string, dst schema.Tabler)```
The former is used to import tables in the raw layer; the latter is used to import arbitrary tables.
**Note:** Both functions first delete the db table and re-create a new one with `gorm.AutoMigrate`, in order to clear the data in it.
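For illustration, a preload of a tool-layer table with the tabler variant might look like the sketch below. This is hypothetical, not a step this guide needs; it reuses the `models.FeishuMeetingTopUserItem` model that appears later and assumes a matching csv file already exists.
```go
// Hypothetical sketch: import a csv into an arbitrary (non-raw) table.
// Like ImportCsvIntoRawTable, the destination table is dropped and
// re-created via gorm.AutoMigrate before the import.
func TestPreloadToolTable(t *testing.T) {
    var plugin impl.Feishu
    dataflowTester := e2ehelper.NewDataFlowTester(t, "feishu", plugin)

    dataflowTester.ImportCsvIntoTabler(
        "./snapshot_tables/_tool_feishu_meeting_top_user_items.csv", // assumed to exist
        &models.FeishuMeetingTopUserItem{},
    )
}
```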
Once the data import is complete, you can try running the test. It must PASS at this point, since there is no test logic yet. Then proceed to write, in `TestMeetingDataFlow`, the logic that calls the extractor task.
```go
func TestMeetingDataFlow(t *testing.T) {
    var plugin impl.Feishu
    dataflowTester := e2ehelper.NewDataFlowTester(t, "feishu", plugin)

    taskData := &tasks.FeishuTaskData{
        Options: &tasks.FeishuOptions{
            ConnectionId: 1,
        },
    }

    // import raw data table
    dataflowTester.ImportCsvIntoRawTable("./raw_tables/_raw_feishu_meeting_top_user_item.csv", "_raw_feishu_meeting_top_user_item")

    // verify extraction
    dataflowTester.FlushTabler(&models.FeishuMeetingTopUserItem{})
    dataflowTester.Subtask(tasks.ExtractMeetingTopUserItemMeta, taskData)

}
```
The added code includes a call to `dataflowTester.FlushTabler` to clear the meeting table and a call to `dataflowTester.Subtask` to simulate running the `ExtractMeetingTopUserItemMeta` subtask.

Now run it and check whether the subtask `ExtractMeetingTopUserItemMeta` completes without errors. The input of an `extract` run generally comes from the raw table, so a correctly written plugin subtask will run without problems, and you can then check whether the data was successfully parsed into the tool-layer db table; in this case, the `_tool_feishu_meeting_top_user_items` table should contain the correct data.

If the run fails, you need to troubleshoot the plugin itself before moving on to the next step.

## Verify that the results of the task are correct

Let's continue writing the test and add the following code at the end of the test function.
```go
func TestMeetingDataFlow(t *testing.T) {
    ......

    dataflowTester.VerifyTable(
        models.FeishuMeetingTopUserItem{},
        "./snapshot_tables/_tool_feishu_meeting_top_user_items.csv",
        []string{"connection_id", "start_time", "name"},
        []string{
            "meeting_count",
            "meeting_duration",
            "user_type",
            "_raw_data_params",
            "_raw_data_table",
            "_raw_data_id",
            "_raw_data_remark",
        },
    )
}
```
Its purpose is to call `dataflowTester.VerifyTable` to validate the data results. The third parameter is the table's primary keys, and the fourth parameter lists all the fields of the table that need to be verified. The data used for validation lives in `./snapshot_tables/_tool_feishu_meeting_top_user_items.csv`; of course, this file does not exist yet.

To make generating it easy, DevLake has adopted a testing technique called `Snapshot`: when `VerifyTable` is called and the csv file does not exist, the file is generated automatically from the results of the current run.

But note! You need to do two things: 1. check that the file was generated correctly; 2. run the test again to make sure the generated results and the results of the re-run do not conflict.
These two operations are very important and directly determine the quality of the tests. We should treat snapshot files in `.csv` format like code files.

If something goes wrong at this step, it is usually one of two kinds of problems:
1. The verified fields include runtime fields such as `created_at` or auto-increment ids, which can never be validated repeatably and should be excluded.
2. The run results contain fields with mismatched escapes such as `\n` or `\r\n`, generally caused by errors when parsing the `httpResponse`. You can refer to the following approach to solve it:
   1. Change the field type of the corresponding content in the api model to `json.RawMessage`.
   2. Convert it to string when parsing.
   3. This keeps the `\n` symbols intact, avoiding interpretation of the line breaks by the database or the operating system.
For example, this is how the `github` plugin handles it.

Well, at this point, the E2E test writing is complete. We have added a total of 3 new files to finish testing the meeting-hours collection task. It's pretty easy.


## Run E2E tests for all plugins like CI

It's very simple: just run `make e2e-plugins`, because DevLake has already solidified this into a script~
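If you only want to run the suite you have just written rather than every plugin's, the ordinary Go tooling also works, e.g. `go test -v ./plugins/feishu/e2e/...` for the layout assumed in this guide; note that, like `make e2e-plugins`, the tests still need a reachable test database.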