This is an automated email from the ASF dual-hosted git repository.
warren pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-devlake-website.git
The following commit(s) were added to refs/heads/main by this push:
new b4d065c docs: typos and wording
b4d065c is described below
commit b4d065c25758a35250b4f558c368ccf9d8af288f
Author: CamilleTeruel <[email protected]>
AuthorDate: Fri May 20 17:36:09 2022 +0200
docs: typos and wording
A pass over the documentation to fix some typos and improve wording.
---
docs/01-Overview/01-WhatIsDevLake.md | 4 +--
docs/01-Overview/02-Architecture.md | 6 ++--
docs/01-Overview/03-Roadmap.md | 2 +-
docs/02-QuickStart/01-LocalSetup.md | 4 +--
docs/02-QuickStart/03-TemporalSetup.md | 8 ++---
docs/02-QuickStart/04-DeveloperSetup.md | 8 ++---
docs/03-Plugins/dbt.md | 14 ++++----
docs/03-Plugins/feishu.md | 17 +++++----
docs/03-Plugins/gitextractor.md | 9 ++---
docs/03-Plugins/github.md | 27 +++++---------
docs/03-Plugins/gitlab.md | 32 ++++++++---------
docs/03-Plugins/jenkins.md | 9 +++--
docs/03-Plugins/jira.md | 23 ++++++------
docs/03-Plugins/refdiff.md | 22 ++++++------
docs/03-Plugins/tapd.md | 6 ++--
docs/04-UserManuals/GRAFANA.md | 4 +--
docs/04-UserManuals/MIGRATIONS.md | 18 +++++-----
docs/04-UserManuals/NOTIFICATION.md | 10 +++---
.../create-pipeline-in-advanced-mode.md | 10 +++---
docs/04-UserManuals/github-user-guide-v0.10.0.md | 42 +++++++++++-----------
docs/04-UserManuals/recurring-pipeline.md | 18 +++++-----
docs/05-DataModels/01-DevLakeDomainLayerSchema.md | 4 +--
docs/07-Glossary.md | 30 ++++++++--------
23 files changed, 160 insertions(+), 167 deletions(-)
diff --git a/docs/01-Overview/01-WhatIsDevLake.md
b/docs/01-Overview/01-WhatIsDevLake.md
index 8ef123a..8b787ad 100755
--- a/docs/01-Overview/01-WhatIsDevLake.md
+++ b/docs/01-Overview/01-WhatIsDevLake.md
@@ -10,8 +10,8 @@ Apache DevLake is designed for developer teams looking to
make better sense of t
## What can be accomplished with Apache DevLake?
1. Collect DevOps data across the entire Software Development Life Cycle
(SDLC) and connect the siloed data with a standard [data
model](../05-DataModels/01-DevLakeDomainLayerSchema.md).
-2. Provide out-of-the-box engineering [metrics](../06-EngineeringMetrics.md)
to be visualized in a sereis of dashboards.
-3. Allow a flexible [framework](02-Architecture.md) for data collection ad ETL
to support customizable data analysis.
+2. Visualize out-of-the-box engineering [metrics](../06-EngineeringMetrics.md)
on many dashboards.
+3. Create custom analyses of your DevOps data with a flexible
[framework](02-Architecture.md) for data collection and ETL.
<div align="left">
diff --git a/docs/01-Overview/02-Architecture.md
b/docs/01-Overview/02-Architecture.md
index dc708aa..18d9b9a 100755
--- a/docs/01-Overview/02-Architecture.md
+++ b/docs/01-Overview/02-Architecture.md
@@ -24,7 +24,7 @@ description: >
## Rules
-1. Higher layer calls lower layer, not the other way around
-2. Whenever lower layer neeeds something from higher layer, a interface should
be introduced for decoupling
-3. Components should be initialized in a low to high order during bootstraping
+1. Higher layers call lower layers, not the other way around
+2. Whenever a lower layer needs something from a higher layer, an interface
should be introduced for decoupling
+3. Components should be initialized in a low to high order during bootstrapping
<br/>
diff --git a/docs/01-Overview/03-Roadmap.md b/docs/01-Overview/03-Roadmap.md
index f7677c2..a0f349f 100755
--- a/docs/01-Overview/03-Roadmap.md
+++ b/docs/01-Overview/03-Roadmap.md
@@ -11,7 +11,7 @@ description: >
## Goals
1. Moving to Apache Incubator and making DevLake a graduation-ready project.
-2. Explore and implement 3 typical use scenarios to help certain engineering
teams and developers:
+2. Explore and implement 3 typical use case scenarios to help certain
engineering teams and developers:
- Observation of open-source project contribution and quality
- DORA metrics for the DevOps team
- SDLC workflow monitoring and improvement
diff --git a/docs/02-QuickStart/01-LocalSetup.md
b/docs/02-QuickStart/01-LocalSetup.md
index eb47e59..698cd30 100644
--- a/docs/02-QuickStart/01-LocalSetup.md
+++ b/docs/02-QuickStart/01-LocalSetup.md
@@ -31,13 +31,13 @@ description: >
- `devlake` takes a while to fully boot up. if `config-ui` complaining
about api being unreachable, please wait a few seconds and try refreshing the
page.
2. Create pipelines to trigger data collection in `config-ui`
3. Click *View Dashboards* button in the top left when done, or visit
`localhost:3002` (username: `admin`, password: `admin`).
- - We use [Grafana](https://grafana.com/) as a visualization tool to build
charts for the [data](../05-DataModels/02-DataSupport.md) stored in our
database.
+ - We use [Grafana](https://grafana.com/) as a visualization tool to build
charts for the [data](../05-DataModels/02-DataSupport.md) stored in our
database.
- Using SQL queries, we can add panels to build, save, and edit customized
dashboards.
- All the details on provisioning and customizing a dashboard can be found
in the [Grafana Doc](../04-UserManuals/GRAFANA.md).
4. To synchronize data periodically, users can set up recurring pipelines with
DevLake's [pipeline blueprint](../04-UserManuals/recurring-pipeline.md) for
details.
#### Upgrade to a newer version
-Support for database schema migration was introduced to DevLake in v0.10.0.
From v0.10.0 onwards, users can upgrade their instance smoothly to a newer
version. However, versions prior to v0.10.0 do not support upgrading to a newer
version with a different database schema. We recommend users deploying a new
instance if needed.
+Support for database schema migration was introduced to DevLake in v0.10.0.
From v0.10.0 onwards, users can upgrade their instance smoothly to a newer
version. However, versions prior to v0.10.0 do not support upgrading to a newer
version with a different database schema. We recommend that users deploy a new
instance if needed.
<br/><br/><br/>
diff --git a/docs/02-QuickStart/03-TemporalSetup.md
b/docs/02-QuickStart/03-TemporalSetup.md
index 05b24c9..cb993c0 100644
--- a/docs/02-QuickStart/03-TemporalSetup.md
+++ b/docs/02-QuickStart/03-TemporalSetup.md
@@ -5,16 +5,16 @@ description: >
---
-Normally, DevLake would execute pipelines on local machine (we call it `local
mode`), it is sufficient most of the time.However, when you have too many
pipelines that need to be executed in parallel, it can be problematic, either
limited by the horsepower or throughput of a single machine.
+Normally, DevLake executes pipelines on a local machine (we call it
`local mode`), which is sufficient most of the time. However, when you have too
many pipelines that need to be executed in parallel, it can be problematic, as
the horsepower and throughput of a single machine are limited.
-`temporal mode` was added to support distributed pipeline execution, you can
fire up arbitrary workers on multiple machines to carry out those pipelines in
parallel without hitting the single machine limitation.
+`temporal mode` was added to support distributed pipeline execution: you can
fire up arbitrary workers on multiple machines to carry out those pipelines in
parallel and overcome the limitations of a single machine.
-But, be careful, many API services like JIRA/GITHUB have request rate limit
mechanism, collect data in parallel against same API service with same identity
would most likely hit the wall.
+But be careful: many API services like JIRA/GitHub have a request rate limit
mechanism. Collecting data in parallel against the same API service with the
same identity would most likely hit such a limit.
## How it works
1. DevLake Server and Workers connect to the same temporal server by setting
up `TEMPORAL_URL`
-2. DevLake Server sends `pipeline` to temporal server, and one of the Workers
would pick it up and execute
+2. DevLake Server sends a `pipeline` to the temporal server, and one of the
Workers picks it up and executes it
**IMPORTANT: This feature is in early stage of development. Please use with
caution**
diff --git a/docs/02-QuickStart/04-DeveloperSetup.md
b/docs/02-QuickStart/04-DeveloperSetup.md
index 4ebd4a7..db5ecd5 100644
--- a/docs/02-QuickStart/04-DeveloperSetup.md
+++ b/docs/02-QuickStart/04-DeveloperSetup.md
@@ -50,7 +50,7 @@ description: >
docker-compose up -d mysql grafana
```
-7. Run lake and config UI in dev mode in two seperate terminals:
+7. Run lake and config UI in dev mode in two separate terminals:
```sh
# run lake
@@ -61,7 +61,7 @@ description: >
Q: I got an error saying: `libgit2.so.1.3: cannot open share object file:
No such file or directory`
- A: Make sure your program find `libgit2.so.1.3`. `LD_LIBRARY_PATH` can be
assigned like this if your `libgit2.so.1.3` is located at `/usr/local/lib`:
+ A: Make sure your program can find `libgit2.so.1.3`. `LD_LIBRARY_PATH` can
be assigned like this if your `libgit2.so.1.3` is located at `/usr/local/lib`:
```sh
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
@@ -69,8 +69,8 @@ description: >
8. Visit config UI at `localhost:4000` to configure data connections.
- Navigate to desired plugins pages on the Integrations page
- - You will need to enter the required information for the plugins you
intend to use.
- - Please reference the following for more details on how to configure each
one:
+ - Enter the required information for the plugins you intend to use.
+ - Refer to the following for more details on how to configure each one:
- [Jira](../03-Plugins/jira.md)
- [GitLab](../03-Plugins/gitlab.md)
- [Jenkins](../03-Plugins/jenkins.md)
diff --git a/docs/03-Plugins/dbt.md b/docs/03-Plugins/dbt.md
index ebfd01a..059bf12 100644
--- a/docs/03-Plugins/dbt.md
+++ b/docs/03-Plugins/dbt.md
@@ -19,12 +19,12 @@ dbt does the T in ELT (Extract, Load, Transform) processes
– it doesn’t extr
#### Commands to run or create in your terminal and the dbt project<a
id="user-setup-commands"></a>
1. pip install dbt-mysql
-2. dbt init demoapp (demoapp is project name)
+2. dbt init demoapp (demoapp is project name)
3. create your SQL transformations and data models
## Convert Data By DBT
-please use the Raw JSON API to manually initiate a run using **cURL** or
graphical API tool such as **Postman**. `POST` the following request to the
DevLake API Endpoint.
+Use the Raw JSON API to manually initiate a run using **cURL** or a graphical
API tool such as **Postman**. `POST` the following request to the DevLake API
Endpoint.
```json
[
@@ -51,12 +51,14 @@ please use the Raw JSON API to manually initiate a run
using **cURL** or graphic
- `projectTarget`: this is the default target your dbt project will use.
(optional)
- `selectedModels`: a model is a select statement. Models are defined in .sql
files, and typically in your models directory. (required)
And selectedModels accepts one or more arguments. Each argument can be one of:
-1. a package name #runs all models in your project, example: example
-2. a model name # runs a specific model, example: my_fisrt_dbt_model
+1. a package name: runs all models in your project, example: example
+2. a model name: runs a specific model, example: my_first_dbt_model
3. a fully-qualified path to a directory of models.
-- `vars`: dbt provides a mechanism variables to provide data to models for
compilation. (optional)
-example: select * from events where event_type = '{{ var("event_type") }}'
this sql in your model, you need set parameters "vars": "{event_type:
real_value}"
+- `projectVars`: variables to parametrize dbt models. (optional)
+example:
+`select * from events where event_type = '{{ var("event_type") }}'`
+To execute this SQL query in your model, you need to set a value for `event_type`.
### Resources:
- Learn more about dbt [in the docs](https://docs.getdbt.com/docs/introduction)
diff --git a/docs/03-Plugins/feishu.md b/docs/03-Plugins/feishu.md
index 7b60d81..f19e4b0 100644
--- a/docs/03-Plugins/feishu.md
+++ b/docs/03-Plugins/feishu.md
@@ -12,23 +12,22 @@ This plugin collects Feishu meeting data through [Feishu
Openapi](https://open.f
## Configuration
-In order to fully use this plugin, you will need to get app_id and app_secret
from feishu administrator(For help on App info, please see [official Feishu
Docs](https://open.feishu.cn/document/ukTMukTMukTM/ukDNz4SO0MjL5QzM/auth-v3/auth/tenant_access_token_internal)),
-then set these two configurations via Dev Lake's `.env`.
+In order to fully use this plugin, you will need to get app_id and app_secret
from a Feishu administrator (for help on App info, please see [official Feishu
Docs](https://open.feishu.cn/document/ukTMukTMukTM/ukDNz4SO0MjL5QzM/auth-v3/auth/tenant_access_token_internal)),
+then set these two parameters via Dev Lake's `.env`.
### By `.env`
The connection aspect of the configuration screen requires the following key
fields to connect to the Feishu API. As Feishu is a single-source data provider
at the moment, the connection name is read-only as there is only one instance
to manage. As we continue our development roadmap we may enable multi-source
connections for Feishu in the future.
+```
FEISHU_APPID=app_id
-
FEISHU_APPSCRECT=app_secret
+```
+## Collect data from Feishu
-## Collect Data From Feishu
+To collect data, select `Advanced Mode` on the `Create Pipeline Run` page and
paste a JSON config like the following:
-In order to collect data, you have to compose a JSON looks like following one,
and send it by selecting `Advanced Mode` on `Create Pipeline Run` page:
-numOfDaysToCollect: The number of days you want to collect
-rateLimitPerSecond: The number of requests to send(Maximum is 8)
```json
[
@@ -44,6 +43,10 @@ rateLimitPerSecond: The number of requests to send(Maximum
is 8)
]
```
+> `numOfDaysToCollect`: The number of days you want to collect
+
+> `rateLimitPerSecond`: The number of requests to send (maximum is 8)
+
You can also trigger data collection by making a POST request to `/pipelines`.
```
curl --location --request POST 'localhost:8080/pipelines' \
diff --git a/docs/03-Plugins/gitextractor.md b/docs/03-Plugins/gitextractor.md
index 2057b06..ac97fa3 100644
--- a/docs/03-Plugins/gitextractor.md
+++ b/docs/03-Plugins/gitextractor.md
@@ -7,12 +7,13 @@ description: >
# Git Repo Extractor
## Summary
-This plugin extract commits and references from a remote or local git
repository. It then saves the data into the database or csv files.
+This plugin extracts commits and references from a remote or local git
repository. It then saves the data into the database or csv files.
## Steps to make this plugin work
-1. Use the Git repo extractor to retrieve commit-and-branch-related data from
your repo
-2. Use the GitHub plugin to retrieve Github-issue-and-pr-related data from
your repo. NOTE: you can run only one the issue collection stage as described
in the Github Plugin README.
+1. Use the Git repo extractor to retrieve data about commits and branches from
your repository.
+2. Use the GitHub plugin to retrieve data about GitHub issues and PRs from
your repository.
+NOTE: you can run only the issue collection stage as described in the GitHub
Plugin README.
3. Use the [RefDiff](./refdiff.md#development) plugin to calculate version
diff, which will be stored in `refs_commits_diffs` table.
## Sample Request
@@ -37,7 +38,7 @@ curl --location --request POST 'localhost:8080/pipelines' \
}
'
```
-- `url`: the location of the git repository. It should start with
`http`/`https` for remote git repository or `/` for a local one.
+- `url`: the location of the git repository. It should start with
`http`/`https` for a remote git repository and with `/` for a local one.
- `repoId`: column `id` of `repos`.
- `proxy`: optional, http proxy, e.g. `http://your-proxy-server.com:1080`.
- `user`: optional, for cloning private repository using HTTP/HTTPS
diff --git a/docs/03-Plugins/github.md b/docs/03-Plugins/github.md
index e7b97f7..8dac21b 100644
--- a/docs/03-Plugins/github.md
+++ b/docs/03-Plugins/github.md
@@ -17,7 +17,7 @@ This plugin gathers data from `GitHub` to display information
to the user in `Gr
## Metrics
-Here are some examples of what we can use `GitHub` data to show:
+Here are some example metrics using `GitHub` data:
- Avg Requirement Lead Time By Assignee
- Bug Count per 1k Lines of Code
- Commit Count over Time
@@ -30,47 +30,38 @@ Here are some examples of what we can use `GitHub` data to
show:
## Configuration
### Provider (Datasource) Connection
-The connection aspect of the configuration screen requires the following key
fields to connect to the **GitHub API**. As GitHub is a _single-source data
provider_ at the moment, the connection name is read-only as there is only one
instance to manage. As we continue our development roadmap we may enable
_multi-source_ connections for GitHub in the future.
+The connection section of the configuration screen requires the following key
fields to connect to the **GitHub API**.

- **Connection Name** [`READONLY`]
- - ⚠️ Defaults to "**Github**" and may not be changed.
+ - ⚠️ Defaults to "**Github**" and may not be changed. As GitHub is a
_single-source data provider_ at the moment, the connection name is read-only
as there is only one instance to manage. As we advance on our development
roadmap we may enable _multi-source_ connections for GitHub in the future.
- **Endpoint URL** (REST URL, starts with `https://` or `http://`)
- This should be a valid REST API Endpoint eg. `https://api.github.com/`
- ⚠️ URL should end with`/`
- **Auth Token(s)** (Personal Access Token)
- For help on **Creating a personal access token**, please see official
[GitHub Docs on Personal
Tokens](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token)
- Provide at least one token for Authentication.
- - This field accepts a comma-separated list of values for multiple tokens.
The data collection will take longer for GitHub since they have a **rate limit
of 5k requests per hour**. You can accelerate the process by configuring
_multiple_ personal access tokens.
-
-"For API requests using `Basic Authentication` or `OAuth`, you can make up to
[5,000
requests](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting)
per hour."
+ - This field accepts a comma-separated list of values for multiple tokens.
The data collection will take longer for GitHub since they have a **rate limit
of [5,000
requests](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting)
per hour** (15,000 requests/hour if you pay for `GitHub` enterprise). You can
accelerate the process by configuring _multiple_ personal access tokens.
-- https://docs.github.com/en/rest/overview/resources-in-the-rest-api
-
-If you have a need for more api rate limits, you can set many tokens in the
config file, and we will use all of your tokens.
-
-NOTE: You can get 15000 requests/hour/token if you pay for `GitHub` enterprise.
-
-For an overview of the **GitHub REST API**, please see official [GitHub Docs
on REST](https://docs.github.com/en/rest)
-
Click **Save Connection** to update connection settings.
-
+
### Provider (Datasource) Settings
Manage additional settings and options for the GitHub Datasource Provider.
Currently there is only one **optional** setting, *Proxy URL*. If you are
behind a corporate firewall or VPN you may need to utilize a proxy server.
-**GitHub Proxy URL [ `Optional`]**
+- **GitHub Proxy URL [`Optional`]**
Enter a valid proxy server address on your Network, e.g.
`http://your-proxy-server.com:1080`
Click **Save Settings** to update additional settings.
### Regular Expression Configuration
Define regex pattern in .env
-- GITHUB_PR_BODY_CLOSE_PATTERN: Define key word to associate issue in pr body,
please check the example in .env.example
+- GITHUB_PR_BODY_CLOSE_PATTERN: Define a keyword to associate issues in the PR
body, please check the example in .env.example
## Sample Request
-In order to collect data, you have to compose a JSON looks like following one,
and send it by selecting `Advanced Mode` on `Create Pipeline Run` page:
+To collect data, select `Advanced Mode` on the `Create Pipeline Run` page and
paste a JSON config like the following:
+
```json
[
[
diff --git a/docs/03-Plugins/gitlab.md b/docs/03-Plugins/gitlab.md
index 855bd02..21a86d7 100644
--- a/docs/03-Plugins/gitlab.md
+++ b/docs/03-Plugins/gitlab.md
@@ -22,36 +22,36 @@ description: >
## Configuration
### Provider (Datasource) Connection
-The connection aspect of the configuration screen requires the following key
fields to connect to the **GitLab API**. As GitLab is a _single-source data
provider_ at the moment, the connection name is read-only as there is only one
instance to manage. As we continue our development roadmap we may enable
_multi-source_ connections for GitLab in the future.
+The connection section of the configuration screen requires the following key
fields to connect to the **GitLab API**.

- **Connection Name** [`READONLY`]
- - ⚠️ Defaults to "**Gitlab**" and may not be changed.
+ - ⚠️ Defaults to "**GitLab**" and may not be changed. As GitLab is a
_single-source data provider_ at the moment, the connection name is read-only
as there is only one instance to manage. As we advance on our development
roadmap we may enable _multi-source_ connections for GitLab in the future.
- **Endpoint URL** (REST URL, starts with `https://` or `http://`)
- This should be a valid REST API Endpoint eg.
`https://gitlab.example.com/api/v4/`
- ⚠️ URL should end with`/`
- **Personal Access Token** (HTTP Basic Auth)
- - Login to your Gitlab Account and create a **Personal Access Token** to
authenticate with the API using HTTP Basic Authentication.. The token must be
20 characters long. Save the personal access token somewhere safe. After you
leave the page, you no longer have access to the token.
+ - Log in to your GitLab account and create a **Personal Access Token** to
authenticate with the API using HTTP Basic Authentication. The token must be 20
characters long. Save the personal access token somewhere safe. After you leave
the page, you no longer have access to the token.
1. In the top-right corner, select your **avatar**.
- 2. Select **Edit profile**.
+ 2. Click on **Edit profile**.
3. On the left sidebar, select **Access Tokens**.
4. Enter a **name** and optional **expiry date** for the token.
5. Select the desired **scopes**.
- 6. Select **Create personal access token**.
+ 6. Click on **Create personal access token**.
+
+ For help on **Creating a personal access token**, please see official
[GitLab Docs on Personal
Tokens](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html).
+ For an overview of the **GitLab REST API**, please see official [GitLab
Docs on
REST](https://docs.gitlab.com/ee/development/documentation/restful_api_styleguide.html#restful-api)
-For help on **Creating a personal access token**, please see official [GitLab
Docs on Personal
Tokens](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html)
-
-For an overview of the **GitLab REST API**, please see official [GitLab Docs
on
REST](https://docs.gitlab.com/ee/development/documentation/restful_api_styleguide.html#restful-api)
-
Click **Save Connection** to update connection settings.
-
+
### Provider (Datasource) Settings
There are no additional settings for the GitLab Datasource Provider at this
time.
-NOTE: `GitLab Project ID` Mappings feature has been deprecated.
-## Gathering Data with Gitlab
+> NOTE: `GitLab Project ID` Mappings feature has been deprecated.
+
+## Gathering Data with GitLab
To collect data, you can make a POST request to `/pipelines`
@@ -73,17 +73,17 @@ curl --location --request POST 'localhost:8080/pipelines' \
## Finding Project Id
-To get the project id for a specific `Gitlab` repository:
-- Visit the repository page on gitlab
+To get the project id for a specific `GitLab` repository:
+- Visit the repository page on GitLab
- Find the project id just below the title

> Use this project id in your requests, to collect data from this project
-## ⚠️ (WIP) Create a Gitlab API Token <a id="gitlab-api-token"></a>
+## ⚠️ (WIP) Create a GitLab API Token <a id="gitlab-api-token"></a>
-1. When logged into `Gitlab` visit
`https://gitlab.com/-/profile/personal_access_tokens`
+1. When logged into `GitLab` visit
`https://gitlab.com/-/profile/personal_access_tokens`
2. Give the token any name, no expiration date and all scopes (excluding write
access)

diff --git a/docs/03-Plugins/jenkins.md b/docs/03-Plugins/jenkins.md
index 6e67e95..26e72a6 100644
--- a/docs/03-Plugins/jenkins.md
+++ b/docs/03-Plugins/jenkins.md
@@ -8,7 +8,7 @@ description: >
## Summary
-This plugin collects Jenkins data through [Remote Access
API](https://www.jenkins.io/doc/book/using/remote-access-api/). It then
computes and visualizes various devops metrics from the Jenkins data.
+This plugin collects Jenkins data through [Remote Access
API](https://www.jenkins.io/doc/book/using/remote-access-api/). It then
computes and visualizes various DevOps metrics from the Jenkins data.

@@ -25,10 +25,10 @@ In order to fully use this plugin, you will need to set
various configurations v
### By `config-ui`
-The connection aspect of the configuration screen requires the following key
fields to connect to the Jenkins API. As Jenkins is a single-source data
provider at the moment, the connection name is read-only as there is only one
instance to manage. As we continue our development roadmap we may enable
multi-source connections for Jenkins in the future.
+The connection section of the configuration screen requires the following key
fields to connect to the Jenkins API.
- Connection Name [READONLY]
- - ⚠️ Defaults to "Jenkins" and may not be changed.
+ - ⚠️ Defaults to "Jenkins" and may not be changed. As Jenkins is a
_single-source data provider_ at the moment, the connection name is read-only
as there is only one instance to manage. As we advance on our development
roadmap we may enable multi-source connections for Jenkins in the future.
- Endpoint URL (REST URL, starts with `https://` or `http://`i, ends with `/`)
- This should be a valid REST API Endpoint eg. `https://ci.jenkins.io/`
- Username (E-mail)
@@ -42,8 +42,7 @@ Click Save Connection to update connection settings.
## Collect Data From Jenkins
-In order to collect data from Jenkins, you have to compose a JSON looks like
following one, and send it via `Triggers` page on `config-ui`:
-
+To collect data, select `Advanced Mode` on the `Create Pipeline Run` page and
paste a JSON config like the following:
```json
[
diff --git a/docs/03-Plugins/jira.md b/docs/03-Plugins/jira.md
index 0ffb1b0..8ac28d6 100644
--- a/docs/03-Plugins/jira.md
+++ b/docs/03-Plugins/jira.md
@@ -35,8 +35,8 @@ For each connection, you will need to set up following items
first:

- Connection Name: This allow you to distinguish different connections.
-- Endpoint URL: The JIRA instance api endpoint, for JIRA Cloud Service, it
would be: `https://<mydomain>.atlassian.net/rest`. devlake officially supports
JIRA Cloud Service on atlassian.net, may or may not work for JIRA Server
Instance.
-- Basic Auth Token: First, generate a **JIRA API TOKEN** for your JIRA account
on JIRA console (see [Generating API token](#generating-api-token)), then, in
`config-ui` click the KEY icon on the right side of the input to generate a
full `HTTP BASIC AUTH` token for you.
+- Endpoint URL: The JIRA instance API endpoint, for JIRA Cloud Service:
`https://<mydomain>.atlassian.net/rest`. DevLake officially supports JIRA Cloud
Service on atlassian.net, but may or may not work for a JIRA Server instance.
+- Basic Auth Token: First, generate a **JIRA API TOKEN** for your JIRA account
on the JIRA console (see [Generating API token](#generating-api-token)), then,
in `config-ui` click the KEY icon on the right side of the input to generate a
full `HTTP BASIC AUTH` token for you.
- Proxy Url: Just use when you want collect through VPN.
### More custom configuration
@@ -58,23 +58,24 @@ If you want to add more custom config, you can click
"settings" to change these
Devlake supports 3 standard types, all metrics are computed based on these
types:
- - `Bug`: Problems found during `test` phase, before they can reach the
production environment.
- - `Incident`: Problems went through `test` phash, got deployed into
production environment.
+ - `Bug`: Problems found during the `test` phase, before they can reach the
production environment.
+ - `Incident`: Problems that went through the `test` phase and got deployed
into the production environment.
- `Requirement`: Normally, it would be `Story` on your instance if you
adopted SCRUM.
-You can may map arbitrary **YOUR OWN ISSUE TYPE** to a single **STANDARD ISSUE
TYPE**, normally, one would map `Story` to `Requirement`, but you could map
both `Story` and `Task` to `Requirement` if that was your case. Those
unspecified type would be copied as standard type directly for your
convenience, so you don't need to map your `Bug` to standard `Bug`.
+You can map arbitrary **YOUR OWN ISSUE TYPE** to a single **STANDARD ISSUE
TYPE**. Normally, one would map `Story` to `Requirement`, but you could map
both `Story` and `Task` to `Requirement` if that was your case. Unspecified
types are copied directly for your convenience, so you don't need to map your
`Bug` to standard `Bug`.
Type mapping is critical for some metrics, like **Requirement Count**, make
sure to map your custom type correctly.
### Find Out Custom Field
-Please follow this guide: [How to find Jira the custom field ID in
Jira?](https://github.com/apache/incubator-devlake/wiki/How-to-find-the-custom-field-ID-in-Jira)
+Please follow this guide: [How to find the custom field ID in
Jira?](https://github.com/apache/incubator-devlake/wiki/How-to-find-the-custom-field-ID-in-Jira)
## Collect Data From JIRA
-In order to collect data from JIRA, you have to compose a JSON looks like
following one, and send it via `Triggers` page on `config-ui`.
-<font color="#ED6A45">Warning: Data collection only supports single-task
execution, and the results of concurrent multi-task execution may not meet
expectations.</font>
+To collect data, select `Advanced Mode` on the `Create Pipeline Run` page and
paste a JSON config like the following:
+
+> <font color="#ED6A45">Warning: Data collection only supports single-task
execution, and the results of concurrent multi-task execution may not meet
expectations.</font>
```
[
@@ -92,8 +93,8 @@ In order to collect data from JIRA, you have to compose a
JSON looks like follow
```
- `connectionId`: The `ID` field from **JIRA Integration** page.
-- `boardId`: JIRA board id, see "Find Board Id" for detail.
-- `since`: optional, download data since specified date/time only.
+- `boardId`: JIRA board id, see "Find Board Id" for details.
+- `since`: optional, download data since a specified date only.
### Find Board Id
@@ -115,7 +116,7 @@ Your board id is used in all REST requests to Apache
DevLake. You do not need to
### Data Connections
-1. Get all data connection
+1. Get all data connections
```GET /plugins/jira/connections
[
diff --git a/docs/03-Plugins/refdiff.md b/docs/03-Plugins/refdiff.md
index cdc1c06..b947a21 100644
--- a/docs/03-Plugins/refdiff.md
+++ b/docs/03-Plugins/refdiff.md
@@ -9,7 +9,7 @@ description: >
## Summary
-For development workload analysis, we often need to know how many commits have
been created between 2 releases. This plugin offers the ability to calculate
the commits of difference between 2 Ref(branch/tag), and the result will be
stored back into database for further analysis.
+For development workload analysis, we often need to know how many commits have
been created between 2 releases. This plugin calculates which commits differ
between 2 refs (branches/tags), and the result will be stored back into the
database for further analysis.
## Important Note
@@ -102,18 +102,16 @@ make
make install
```
-Troubleshooting (MacOS)
+#### Troubleshooting (MacOS)
-Q: I got an error saying: `pkg-config: exec: "pkg-config": executable file not
found in $PATH`
+> Q: I got an error saying: `pkg-config: exec: "pkg-config": executable file
not found in $PATH`
-A:
-
-1. Make sure you have pkg-config installed:
-
- `brew install pkg-config`
-
-2. Make sure your pkg config path covers the installation:
-
- `export
PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib:/usr/local/lib/pkgconfig`
+> A:
+> 1. Make sure you have pkg-config installed:
+>
+> `brew install pkg-config`
+>
+> 2. Make sure your pkg config path covers the installation:
+> `export
PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib:/usr/local/lib/pkgconfig`
<br/><br/><br/>
diff --git a/docs/03-Plugins/tapd.md b/docs/03-Plugins/tapd.md
index 24b124d..fc93539 100644
--- a/docs/03-Plugins/tapd.md
+++ b/docs/03-Plugins/tapd.md
@@ -1,10 +1,10 @@
-# Feishu
+# TAPD
## Summary
-This plugin collects tapd data.
+This plugin collects TAPD data.
-This plugin is in developing so that cannot modify settings in config-ui.
+This plugin is in development so you can't modify settings in config-ui.
## Configuration
diff --git a/docs/04-UserManuals/GRAFANA.md b/docs/04-UserManuals/GRAFANA.md
index 5837c14..a54c781 100644
--- a/docs/04-UserManuals/GRAFANA.md
+++ b/docs/04-UserManuals/GRAFANA.md
@@ -9,7 +9,7 @@ description: >
<img
src="https://user-images.githubusercontent.com/3789273/128533901-3107e9bf-c3e3-4320-ba47-879fe2b0ea4d.png"
width="450px" />
-When first visiting grafana, you will be provided with a sample dashboard with
some basic charts setup from the database
+When first visiting Grafana, you will be provided with a sample dashboard with
some basic charts set up from the database.
## Contents
@@ -87,7 +87,7 @@ In the top right of the window are buttons for:
## Dashboard Settings<a id="dashboard-settings"></a>
-When viewing a dashboard click on the settings icon to view dashboard
settings. In here there is 2 pages important sections to use:
+When viewing a dashboard, click on the settings icon to view dashboard
settings. Here are 2 important sections to use:

diff --git a/docs/04-UserManuals/MIGRATIONS.md
b/docs/04-UserManuals/MIGRATIONS.md
index 4d3f1e2..edab4ca 100644
--- a/docs/04-UserManuals/MIGRATIONS.md
+++ b/docs/04-UserManuals/MIGRATIONS.md
@@ -7,14 +7,14 @@ description: >
# Migrations (Database)
## Summary
-Starting in v0.10.0, DevLake provides a lightweight migration tool for
executing migration scripts.
-Both framework itself and plugins define their migration scripts in their own
migration folder.
+Starting in v0.10.0, DevLake provides a lightweight migration tool for
executing migration scripts.
+Both framework itself and plugins define their migration scripts in their own
migration folder.
The migration scripts are written with gorm in Golang to support different SQL
dialects.
## Migration script
-Migration script describes how to do database migration.
-They implement the `Script` interface.
+Migration scripts describe how to do database migrations.
+They implement the `Script` interface.
When DevLake starts, scripts register themselves to the framework by invoking
the `Register` function
```go
@@ -27,10 +27,10 @@ type Script interface {
## Table `migration_history`
-The table tracks migration scripts execution and schemas changes.
+The table tracks migration script execution and schema changes.
From which, DevLake could figure out the current state of database schemas.
## How it Works
-1. check `migration_history` table, calculate all the migration scripts need
to be executed.
-2. sort scripts by Version in ascending order.
-3. execute scripts.
-4. save results in the `migration_history` table.
+1. Check the `migration_history` table and calculate all the migration
scripts that need to be executed.
+2. Sort scripts by Version in ascending order.
+3. Execute scripts.
+4. Save results in the `migration_history` table.
diff --git a/docs/04-UserManuals/NOTIFICATION.md
b/docs/04-UserManuals/NOTIFICATION.md
index 8d7b855..d5ebd2b 100644
--- a/docs/04-UserManuals/NOTIFICATION.md
+++ b/docs/04-UserManuals/NOTIFICATION.md
@@ -7,7 +7,7 @@ description: >
# Notification
## Request
-example request
+Example request
```
POST
/lake/notify?nouce=3-FDXxIootApWxEVtz&sign=424c2f6159bd9e9828924a53f9911059433dc14328a031e91f9802f062b495d5
@@ -15,17 +15,17 @@ POST
/lake/notify?nouce=3-FDXxIootApWxEVtz&sign=424c2f6159bd9e9828924a53f9911059
```
## Configuration
-If you want to use the notification feature, you should add two configuration
key to `.env` file.
+If you want to use the notification feature, you should add two configuration
keys to the `.env` file.
```shell
# .env
-# endpoint is the notification request url, eg: http://example.com/lake/notify
+# notification request url, e.g.: http://example.com/lake/notify
NOTIFICATION_ENDPOINT=
-# screte is used to calculate signature
+# secret is used to calculate signature
NOTIFICATION_SECRET=
```
## Signature
-You should check the signature before accepting the notification request. We
use sha256 algorithm to calculate the checksum.
+You should check the signature before accepting the notification request. We
use the sha256 algorithm to calculate the checksum.
```go
// calculate checksum
sum := sha256.Sum256([]byte(requestBody + NOTIFICATION_SECRET + nouce))
diff --git a/docs/04-UserManuals/create-pipeline-in-advanced-mode.md
b/docs/04-UserManuals/create-pipeline-in-advanced-mode.md
index 1f64054..d3ab3c1 100644
--- a/docs/04-UserManuals/create-pipeline-in-advanced-mode.md
+++ b/docs/04-UserManuals/create-pipeline-in-advanced-mode.md
@@ -7,13 +7,13 @@ description: >
## Why advanced mode?
-Advanced mode allows users to create any pipeline by writing JSON. This is
most useful for users who'd like to:
+Advanced mode allows users to create any pipeline by writing JSON. This is
useful for users who want to:
1. Collect multiple GitHub/GitLab repos or Jira projects within a single
pipeline
2. Have fine-grained control over what entities to collect or what subtasks to
run for each plugin
3. Orchestrate a complex pipeline that consists of multiple stages of plugins.
-Advaned mode gives the most flexiblity to users by exposing the JSON API
+Advanced mode gives the most flexibility to users by exposing the JSON API.
## How to use advanced mode to create pipelines?
@@ -31,13 +31,13 @@ Advaned mode gives the most flexiblity to users by exposing
the JSON API
## Examples
-1. Collect multiple GitLab repos sequentially.
+1. Collect multiple GitLab repos sequentially.
->When there're multiple collection tasks against a single data source, we
recommend running these tasks sequentially since the collection speed is mostly
limited by the API rate limit of the data source.
+>When there are multiple collection tasks against a single data source, we
recommend running these tasks sequentially since the collection speed is mostly
limited by the API rate limit of the data source.
>Running multiple tasks against the same data source is unlikely to speed up
>the process and may overwhelm the data source.
-Below is an example for collecting 2 GitLab repos sequentially. It has 2
stages, each contains a GitLab task.
+Below is an example of collecting 2 GitLab repos sequentially. It has 2
stages, each containing a GitLab task.
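To make that concrete, such a two-stage config might be sketched as follows (the connection and project IDs are placeholders):

```json
[
  [
    { "plugin": "gitlab", "options": { "connectionId": 1, "projectId": 100 } }
  ],
  [
    { "plugin": "gitlab", "options": { "connectionId": 1, "projectId": 200 } }
  ]
]
```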
```
diff --git a/docs/04-UserManuals/github-user-guide-v0.10.0.md
b/docs/04-UserManuals/github-user-guide-v0.10.0.md
index a9dedce..774e426 100644
--- a/docs/04-UserManuals/github-user-guide-v0.10.0.md
+++ b/docs/04-UserManuals/github-user-guide-v0.10.0.md
@@ -6,11 +6,11 @@ description: >
## Summary
-GitHub has a rate limit of 2,000 API calls per hour for their REST API.
+GitHub has a rate limit of 5,000 API calls per hour for their REST API.
As a result, it may take hours to collect commits data from GitHub API for a
repo that has 10,000+ commits.
To accelerate the process, DevLake introduces GitExtractor, a new plugin that
collects git data by cloning the git repo instead of by calling GitHub APIs.
-Starting from v0.10.0, DevLake will collect GitHub data in 2 separate plugins:
+Starting from v0.10.0, DevLake will collect GitHub data in 2 separate plugins:
- GitHub plugin (via GitHub API): collect repos, issues, pull requests
- GitExtractor (via cloning repos): collect commits, refs
@@ -32,45 +32,43 @@ There're 3 steps.
### Step 1 - Configure GitHub connection
-1. Visit `config-ui` at `http://localhost:4000`, click the GitHub icon
+1. Visit `config-ui` at `http://localhost:4000` and click the GitHub icon
2. Click the default connection 'Github' in the list

-
+
3. Configure connection by providing your GitHub API endpoint URL and your
personal access token(s).

- > Endpoint URL: Leave this unchanged if you're using github.com. Otherwise
replace it with your own GitHub instance's REST API endpoint URL. This URL
should end with '/'.
- >
- > Auth Token(s): Fill in your personal access tokens(s). For how to
generate personal access tokens, please see GitHub's [official
documentation](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token).
- > You can provide multiple tokens to speed up the data collection process,
simply concatenating tokens with commas.
- >
- > GitHub Proxy URL: This is optional. Enter a valid proxy server address
on your Network, e.g. http://your-proxy-server.com:1080
-
+- Endpoint URL: Leave this unchanged if you're using github.com. Otherwise
replace it with your own GitHub instance's REST API endpoint URL. This URL
should end with '/'.
+- Auth Token(s): Fill in your personal access token(s). For how to generate
personal access tokens, please see GitHub's [official
documentation](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token).
You can provide multiple tokens to speed up the data collection process;
simply concatenate tokens with commas.
+- GitHub Proxy URL: This is optional. Enter a valid proxy server address on
your Network, e.g. http://your-proxy-server.com:1080
+
4. Click 'Test Connection' and see it's working, then click 'Save Connection'.
5. [Optional] Help DevLake understand your GitHub data by customizing data
enrichment rules shown below.

-
+
1. Pull Request Enrichment Options
-
+
1. `Type`: PRs with label that matches given Regular Expression, their
properties `type` will be set to the value of first sub match. For example,
with Type being set to `type/(.*)$`, a PR with label `type/bug`, its `type`
would be set to `bug`, with label `type/doc`, it would be `doc`.
2. `Component`: Same as above, but for `component` property.
-
+
2. Issue Enrichment Options
-
+
1. `Severity`: Same as above, but for `issue.severity` of course.
-
+
2. `Component`: Same as above.
-
+
3. `Priority`: Same as above.
-
- 4. **Requirement** : Issues with label that matches given Regular
Expression, their properties `type` will be set to `REQUIREMENT`. Unlike
`PR.type`, submatch does nothing, because for Issue Management Analysis,
people tend to focus on 3 kinds of type (Requiremnt/Bug/Incident), however, the
concrete naming varies from repo to repo, time to time, so we decided to
standardize them to help analyst making general purpose metric.
-
+
+ 4. **Requirement**: Issues with a label that matches the given Regular
Expression will have their `type` property set to `REQUIREMENT`. Unlike
`PR.type`, submatch does nothing, because for Issue Management Analysis people
tend to focus on 3 kinds of types (Requirement/Bug/Incident). However, the
concrete naming varies from repo to repo and over time, so we decided to
standardize them to help analysts make general-purpose metrics.
+
5. **Bug**: Same as above, with `type` setting to `BUG`
-
+
6. **Incident**: Same as above, with `type` setting to `INCIDENT`
-
+
6. Click 'Save Settings'
### Step 2 - Create a pipeline to collect GitHub data
diff --git a/docs/04-UserManuals/recurring-pipeline.md
b/docs/04-UserManuals/recurring-pipeline.md
index 29f3698..bfa85dd 100644
--- a/docs/04-UserManuals/recurring-pipeline.md
+++ b/docs/04-UserManuals/recurring-pipeline.md
@@ -6,24 +6,24 @@ description: >
## How to create recurring pipelines?
-Once you've verified a pipeline works well, mostly likely you'll want to run
that pipeline periodically to keep data fresh, and DevLake's pipeline blueprint
feature have got you covered.
+Once you've verified that a pipeline works, most likely you'll want to run
that pipeline periodically to keep data fresh, and DevLake's pipeline blueprint
feature has you covered.
-1. Click 'Create Pipeline Run' and
+1. Click 'Create Pipeline Run' and
- Toggle the plugins you'd like to run, here we use GitHub and GitExtractor
plugin as an example
- Toggle on Automate Pipeline

2. Click 'Add Blueprint'. Fill in the form and 'Save Blueprint'.
-
- - **NOTE**: That the schedule syntax is standard unix cron syntax,
[Crontab.guru](https://crontab.guru/) could be a useful reference
- - **IMPORANT**: The scheduler is running under `UTC` timezone. If you
prefer data collecting happens at 3am NewYork(UTC-04:00) every day, use
**Custom Shedule** and set it to `0 7 * * *`
-
+
+ - **NOTE**: The schedule syntax is standard unix cron syntax;
[Crontab.guru](https://crontab.guru/) is a useful reference
+ - **IMPORTANT**: The scheduler runs in the `UTC` timezone. If you want
data collection to happen at 3 AM New York time (UTC-04:00) every day, use
**Custom Schedule** and set it to `0 7 * * *`
+

-
+
3. Click 'Save Blueprint'.
-
+
4. Click 'Pipeline Blueprints', you can view and edit the new blueprint in the
blueprint list.
-
+

\ No newline at end of file
diff --git a/docs/05-DataModels/01-DevLakeDomainLayerSchema.md
b/docs/05-DataModels/01-DevLakeDomainLayerSchema.md
index 092dcc8..3402e4b 100644
--- a/docs/05-DataModels/01-DevLakeDomainLayerSchema.md
+++ b/docs/05-DataModels/01-DevLakeDomainLayerSchema.md
@@ -10,14 +10,14 @@ description: >
## Summary
-This document describes the entities and their relationships in DevLake's
domain layer schema.
+This document describes the entities in DevLake's domain layer schema and
their relationships.
Data in the domain layer is transformed from the data in the tool layer. The
tool layer schema is based on the data from specific tools such as Jira,
GitHub, Gitlab, Jenkins, etc. The domain layer schema can be regarded as an
abstraction of tool-layer schemas.
Domain layer schema itself includes 2 logical layers: a `DWD` layer and a
`DWM` layer. The DWD layer stores the detailed data points, while the DWM is
the slight aggregation and operation of DWD to store more organized details or
middle-level metrics.
-## Use Scenario
+## Use Cases
1. Users can make customized Grafana dashboards based on the domain layer
schema.
2. Contributors can understand more about DevLake's data model.
diff --git a/docs/07-Glossary.md b/docs/07-Glossary.md
index 79b01f4..1789a8f 100644
--- a/docs/07-Glossary.md
+++ b/docs/07-Glossary.md
@@ -17,51 +17,51 @@ description: >
The following terms are arranged in the order of their appearance in the
actual user workflow.
### Blueprints
-**A blueprint is the plan that covers all the work to get your raw data ready
for query and metric computaion in the dashboards.** Creating a blueprint
consists of four steps:
+**A blueprint is the plan that covers all the work to get your raw data ready
for query and metric computation in the dashboards.** Creating a blueprint
consists of four steps:
1. **Adding [Data Connections](07-Glossary.md#data-connections)**: For each
[data source](07-Glossary.md#data-sources), one or more data connections can be
added to a single blueprint, depending on the data you want to sync to DevLake.
-2. **Setting the [Data Scope](07-Glossary.md#data-scope)**: For each data
connection, you need to configure the scope of data, such as GitHub projects,
Jira boards, and their corresponding [data
entities](07-Glossary.md#data-entities).
-3. **Adding [Transformation Rules](07-Glossary.md#transformation-rules)
(optional)**: You can optionally apply transformation for the data scope you
have just selected, in order to view more advanced metrics.
+2. **Setting the [Data Scope](07-Glossary.md#data-scope)**: For each data
connection, you need to configure the scope of data, such as GitHub projects,
Jira boards, and their corresponding [data
entities](07-Glossary.md#data-entities).
+3. **Adding [Transformation Rules](07-Glossary.md#transformation-rules)
(optional)**: You can optionally apply transformation for the data scope you
have just selected, in order to view more advanced metrics.
4. **Setting the Sync Frequency**: You can specify the sync frequency for your
blueprint to achieve recurring data syncs and transformation. Alternatively,
you can set the frequency to manual if you wish to run the tasks in the
blueprint manually.
The relationship among Blueprint, Data Connections, Data Scope and
Transformation Rules is explained as follows:

-- Each blueprint can have multiple data connections.
+- Each blueprint can have multiple data connections.
- Each data connection can have multiple sets of data scope.
- Each set of data scope only consists of one GitHub/GitLab project or Jira
board, along with their corresponding data entities.
- Each set of data scope can only have one set of transformation rules.
### Data Sources
-**A data source is a specific DevOps tool from which you wish to sync your
data, such as GitHub, GitLab, Jira and Jenkins.**
+**A data source is a specific DevOps tool from which you wish to sync your
data, such as GitHub, GitLab, Jira and Jenkins.**
DevLake normally uses one [data plugin](07-Glossary.md#data-plugins) to pull
data for a single data source. However, in some cases, DevLake uses multiple
data plugins for one data source for the purpose of improved sync speed, among
many other advantages. For instance, when you pull data from GitHub or GitLab,
aside from the GitHub or GitLab plugin, Git Extractor is also used to pull data
from the repositories. In this case, DevLake still refers GitHub or GitLab as a
single data source.
### Data Connections
-**A data connection is a specific instance of a data source that stores
information such as `endpoint` and `auth`.** A single data source can have one
or more data connections (e.g. two Jira instances). Currently, DevLake supports
one data connection for GitHub, GitLab and Jenkins, and multiple connections
for Jira.
+**A data connection is a specific instance of a data source that stores
information such as `endpoint` and `auth`.** A single data source can have one
or more data connections (e.g. two Jira instances). Currently, DevLake supports
one data connection for GitHub, GitLab and Jenkins, and multiple connections
for Jira.
-You can set up a new data connection either during the first step of creating
a blueprint, or in the Connections page that can be accessed from the
navigation bar. Because one single data connection can be resued in multiple
blueprints, you can update the information of a particular data connection in
Connections, to ensure all its associated blueprints will run properly. For
example, you may want to update your GitHub token in a data connection if it
goes expired.
+You can set up a new data connection either during the first step of creating
a blueprint, or in the Connections page that can be accessed from the
navigation bar. Because a single data connection can be reused in multiple
blueprints, you can update the information of a particular data connection in
Connections to ensure all its associated blueprints will run properly. For
example, you may want to update your GitHub token in a data connection if it
expires.
### Data Scope
**In a blueprint, each data connection can have multiple sets of data scope
configurations, including GitHub or GitLab projects, Jira boards and their
corresponding[data entities](07-Glossary.md#data-entities).** The fields for
data scope configuration vary according to different data sources.
-Each set of data scope refers to one GitHub or GitLab project, or one Jira
board and the data entities you would like to sync for them, for the
convinience of applying transformation in the next step. For instance, if you
wish to sync 5 GitHub projects, you will have 5 sets of data scope for GitHub.
+Each set of data scope refers to one GitHub or GitLab project, or one Jira
board and the data entities you would like to sync for them, for the
convenience of applying transformation in the next step. For instance, if you
wish to sync 5 GitHub projects, you will have 5 sets of data scope for GitHub.
To learn more about the default data scope of all data sources and data
plugins, please refer to [Data Support](./05-DataModels/02-DataSupport.md).
### Data Entities
-**Data entities refer to the data fields from one of the five data domains:
Issue Tracking, Source Code Management, Code Review, CI/CD and Cross-Domain.**
+**Data entities refer to the data fields from one of the five data domains:
Issue Tracking, Source Code Management, Code Review, CI/CD and Cross-Domain.**
For instance, if you wish to pull Source Code Management data from GitHub and
Issue Tracking data from Jira, you can check the corresponding data entities
during setting the data scope of these two data connections.
To learn more details, please refer to [Domain Layer
Schema](./05-DataModels/01-DevLakeDomainLayerSchema.md).
### Transformation Rules
-**Transformation rules are a collection of methods that allow you to customize
how DevLake normalizes raw data for query and metric computation.** Each set of
data scope is strictly acompanied with one set of transformation rules.
However, for your convenience, transformation rules can also be duplicated
across different sets of data scope.
+**Transformation rules are a collection of methods that allow you to customize
how DevLake normalizes raw data for query and metric computation.** Each set of
data scope is strictly accompanied with one set of transformation rules.
However, for your convenience, transformation rules can also be duplicated
across different sets of data scope.
-DevLake uses these normalized values in the transformtion to design more
advanced dashboards, such as the Weekly Bug Retro dashboard. Although
configuring transformation rules is not mandatory, if you leave the rules blank
or have not configured correctly, only the basic dashboards (e.g. GitHub Basic
Metrics) will be displayed as expected, while the advanced dashboards will not.
+DevLake uses these normalized values in the transformation to design more
advanced dashboards, such as the Weekly Bug Retro dashboard. Although
configuring transformation rules is not mandatory, if you leave the rules blank
or do not configure them correctly, only the basic dashboards (e.g. GitHub
Basic Metrics) will be displayed as expected, while the advanced dashboards
will not.
### Historical Runs
-**A historical run of a blueprint is an actual excecution of the data
collection and transformation [tasks](07-Glossary.md#tasks) defined in the
blueprint at its creation.** A list of historical runs of a blueprint is the
entire running history of that blueprint, whether excecuted automatically or
manually. Historical runs can be triggered in three ways:
+**A historical run of a blueprint is an actual execution of the data
collection and transformation [tasks](07-Glossary.md#tasks) defined in the
blueprint at its creation.** A list of historical runs of a blueprint is the
entire running history of that blueprint, whether executed automatically or
manually. Historical runs can be triggered in three ways:
- By the blueprint automatically according to its schedule in the Regular Mode
of the Configuration UI
- By running the JSON in the Advanced Mode of the Configuration UI
- By calling the API `/pipelines` endpoint manually
@@ -85,9 +85,9 @@ For detailed information about the relationship between data
sources and data pl
### Pipelines
-**A pipeline is an orchestration of [tasks](07-Glossary.md#tasks) of data
`collection`, `extraction`, `conversion` and `enrichment`, defined in the
DevLake API.** A pipeline is composed of one or multiple
[stages](07-Glossary.md#stages) that are executed in a sequential order. Any
error occured during the execution of any stage, task or substask will cause
the immediate fail of the pipeline.
+**A pipeline is an orchestration of [tasks](07-Glossary.md#tasks) of data
`collection`, `extraction`, `conversion` and `enrichment`, defined in the
DevLake API.** A pipeline is composed of one or multiple
[stages](07-Glossary.md#stages) that are executed in a sequential order. Any
error occurring during the execution of any stage, task or subtask will cause
the pipeline to fail immediately.
-The composition of a pipeline is exaplined as follows:
+The composition of a pipeline is explained as follows:

Notice: **You can manually orchestrate the pipeline in Configuration UI
Advanced Mode and the DevLake API; whereas in Configuration UI regular mode, an
optimized pipeline orchestration will be automatically generated for you.**
@@ -95,7 +95,7 @@ Notice: **You can manually orchestrate the pipeline in
Configuration UI Advanced
**A stage is a collection of tasks performed by data plugins.** Stages are
executed in a sequential order in a pipeline.
### Tasks
-**A task is a collection of [subtasks](07-Glossary.md#subtasks) that perform
any of the `collection`, `extraction`, `conversion` and `enrichment` jobs of a
particular data plugin.** Tasks are executed in a parralel order in any stages.
+**A task is a collection of [subtasks](07-Glossary.md#subtasks) that perform
any of the `collection`, `extraction`, `conversion` and `enrichment` jobs of a
particular data plugin.** Tasks are executed in parallel within any stage.
### Subtasks
**A subtask is the minimal work unit in a pipeline that performs in any of the
four roles: `Collectors`, `Extractors`, `Converters` and `Enrichers`.**
Subtasks are executed in sequential orders.