uranusjr commented on code in PR #38687:
URL: https://github.com/apache/airflow/pull/38687#discussion_r1548779735


##########
docs/apache-airflow/authoring-and-scheduling/datasets.rst:
##########
@@ -51,38 +51,38 @@ In addition to scheduling DAGs based upon time, they can 
also be scheduled based
 What is a "dataset"?
 --------------------
 
-An Airflow dataset is a stand-in for a logical grouping of data. Datasets may 
be updated by upstream "producer" tasks, and dataset updates contribute to 
scheduling downstream "consumer" DAGs.
+An Airflow dataset is a logical grouping of data. Upstream producer tasks can 
update datasets, and dataset updates contribute to scheduling downstream 
consumer DAGs.
 
-A dataset is defined by a Uniform Resource Identifier (URI):
+A Uniform Resource Identifier (URI) defines a dataset:
 
 .. code-block:: python
 
     from airflow.datasets import Dataset
 
     example_dataset = Dataset("s3://dataset-bucket/example.csv")
 
-Airflow makes no assumptions about the content or location of the data 
represented by the URI. It is treated as a string, so any use of regular 
expressions (eg ``input_\d+.csv``) or file glob patterns (eg 
``input_2022*.csv``) as an attempt to create multiple datasets from one 
declaration will not work.
+Airflow makes no assumptions about the content or location of the data 
represented by the URI, and treats the URI as a plain string. This means that 
regular expressions, such as ``input_\d+.csv``, and file glob patterns, such 
as ``input_2022*.csv``, are not expanded; using them as an attempt to create 
multiple datasets from one declaration will not work.
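A minimal sketch of this behavior (no Airflow required, since dataset URIs are matched as plain strings): a glob-like URI never matches a concrete file URI, because the ``*`` is not expanded.

```python
# Dataset URIs are compared as literal strings; the "*" below is not
# treated as a wildcard, so these identify two different datasets.
glob_like = "s3://dataset-bucket/input_2022*.csv"
concrete = "s3://dataset-bucket/input_20220101.csv"

print(glob_like == concrete)  # False: the glob is just another string
print(glob_like == "s3://dataset-bucket/input_2022*.csv")  # True: exact match
```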
 
-A dataset should be created with a valid URI. Airflow core and providers 
define various URI schemes that you can use, such as ``file`` (core), 
``postgres`` (by the Postgres provider), and ``s3`` (by the Amazon provider). 
Third-party providers and plugins may also provide their own schemes. These 
pre-defined schemes have individual semantics that are expected to be followed.
+You must create datasets with a valid URI. Airflow core and providers define 
various URI schemes that you can use, such as ``file`` (core), ``postgres`` (by 
the Postgres provider), and ``s3`` (by the Amazon provider). Third-party 
providers and plugins might also provide their own schemes. These pre-defined 
schemes have individual semantics that are expected to be followed.
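To illustrate what scheme-aware handling looks like, here is a hypothetical helper (not part of Airflow; ``KNOWN_SCHEMES`` and ``check_scheme`` are made up for this sketch) that extracts a URI's scheme with the standard library and rejects unknown ones, similar in spirit to the per-scheme validation providers perform.

```python
from urllib.parse import urlsplit

# Hypothetical scheme allow-list for illustration only; real scheme
# semantics come from Airflow core and the individual providers.
KNOWN_SCHEMES = {"file", "postgres", "s3"}


def check_scheme(uri: str) -> str:
    """Return the URI's scheme, raising if it is not a known one."""
    scheme = urlsplit(uri).scheme
    if scheme and scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unknown dataset URI scheme: {scheme!r}")
    return scheme


print(check_scheme("s3://dataset-bucket/example.csv"))  # s3
```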
 
What is a valid URI?
--------------------
 
-Technically, the URI must conform to the valid character set in RFC 3986. If 
you don't know what this means, that's basically ASCII alphanumeric characters, 
plus ``%``,  ``-``, ``_``, ``.``, and ``~``. To identify a resource that cannot 
be represented by URI-safe characters, encode the resource name with 
`percent-encoding <https://en.wikipedia.org/wiki/Percent-encoding>`_.
+Technically, the URI must conform to the valid character set in RFC 3986, 
which is basically ASCII alphanumeric characters, plus ``%``, ``-``, ``_``, 
``.``, and ``~``. To identify a resource that cannot be represented by URI-safe 
characters, encode the resource name with `percent-encoding 
<https://en.wikipedia.org/wiki/Percent-encoding>`_.
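As a quick sketch, the standard library's ``urllib.parse.quote`` can percent-encode a resource name before it is embedded in a dataset URI (the bucket and file names here are invented for the example):

```python
from urllib.parse import quote

# A resource name containing spaces, which are not URI-safe characters.
resource = "daily report 2024.csv"

# safe="" percent-encodes everything outside the unreserved set
# (ASCII alphanumerics plus "-", "_", ".", and "~").
encoded = quote(resource, safe="")
print(encoded)  # daily%20report%202024.csv

uri = f"s3://dataset-bucket/{encoded}"
print(uri)  # s3://dataset-bucket/daily%20report%202024.csv
```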

Review Comment:
   We should probably add a link to the Wikipedia entry on URI somewhere too. 
https://en.wikipedia.org/wiki/Uniform_Resource_Identifier



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
