This is an automated email from the ASF dual-hosted git repository.
eamonford pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/incubator-sdap-ingester.git
The following commit(s) were added to refs/heads/dev by this push:
new 575f25f SDAP-297: Update Collections Config docs to match latest schema (#26)
575f25f is described below
commit 575f25f4563fcf8b04cb1f98e326486ac864d601
Author: Eamon Ford <[email protected]>
AuthorDate: Mon Dec 7 14:22:15 2020 -0800
SDAP-297: Update Collections Config docs to match latest schema (#26)
---
collection_manager/README.md | 81 ++++++++++++++++++++++++++++++++------------
1 file changed, 60 insertions(+), 21 deletions(-)
diff --git a/collection_manager/README.md b/collection_manager/README.md
index 84df468..90e72fa 100644
--- a/collection_manager/README.md
+++ b/collection_manager/README.md
@@ -26,7 +26,7 @@ From `incubator-sdap-ingester`, run:
A path to a collections configuration file must be passed in to the Collection Manager
at startup via the `--collections-path` parameter. Below is an example of what the
-collections configuration file should look like:
+collections configuration file could look like:
```yaml
# collections.yaml
@@ -34,35 +34,74 @@ collections configuration file should look like:
collections:
# The identifier for the dataset as it will appear in NEXUS.
- - id: TELLUS_GRACE_MASCON_CRI_GRID_RL05_V2_LAND
+ - id: "CSR-RL06-Mascons_LAND"
- # The local path to watch for NetCDF granule files to be associated with this dataset.
- # Supports glob-style patterns.
- path: /opt/data/grace/*land*.nc
-
- # The name of the NetCDF variable to read when ingesting granules into NEXUS for this dataset.
- variable: lwe_thickness
+ # The path to watch for NetCDF granule files to be associated with this dataset.
+ # This can also be an S3 path prefix, for example "s3://my-bucket/path/to/granules/"
+ path: "/data/CSR-RL06-Mascons-land/"
# An integer priority level to use when publishing messages to RabbitMQ for historical data.
- # Higher number = higher priority.
- priority: 1
+ # Higher number = higher priority. Scale is 1-10.
+ priority: 1
# An integer priority level to use when publishing messages to RabbitMQ for forward-processing data.
- # Higher number = higher priority.
+ # Higher number = higher priority. Scale is 1-10.
forward-processing-priority: 5
- - id: TELLUS_GRACE_MASCON_CRI_GRID_RL05_V2_OCEAN
- path: /opt/data/grace/*ocean*.nc
- variable: lwe_thickness
- priority: 2
- forward-processing-priority: 6
+ # The type of projection to use when processing granules in this collection.
+ # Accepted values are Grid, ECCO, TimeSeries, or Swath.
+ projection: Grid
+
+ dimensionNames:
+ # The name of the primary variable
+ variable: lwe_thickness
+
+ # The name of the latitude variable
+ latitude: lat
+
+ # The name of the longitude variable
+ longitude: lon
+
+ # The name of the depth variable (only include if depth variable exists)
+ depth: Z
+
+ # The name of the time variable (only include if time variable exists)
+ time: Time
+
+ # This section is an index of each dimension on which the primary variable is dependent, mapped to its desired slice size.
+ slices:
+ Z: 1
+ Time: 1
+ lat: 60
+ lon: 60
+
+ - id: ocean-bottom-pressure
+ path: /data/OBP/
+ priority: 6
+ forward-processing-priority: 7
+ projection: ECCO
+ dimensionNames:
+ latitude: YC
+ longitude: XC
+ time: time
+ # "tile" is required when using the ECCO projection. This refers to the name of the dimension containing the ECCO tile index.
+ tile: tile
+ variable: OBP
+ slices:
+ time: 1
+ tile: 1
+ i: 30
+ j: 30
+```
+
+Note that the dimensions listed under `slices` will not necessarily match the values of the properties under `dimensionNames`. This is because sometimes
+the actual dimensions are referenced by index variables.
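
To make the `slices` mapping concrete, here is a small Python sketch (not part of the Collection Manager) that estimates how many slices one granule would be cut into for the first collection in the example. The dimension lengths in `dim_lengths` are made-up values for illustration, and counting a partial slice at the edge of a dimension as a full slice is an assumption; the actual ingester behavior may differ.

```python
import math


def count_slices(dim_lengths, slices):
    """Estimate slices per granule: product over dimensions of ceil(length / slice size)."""
    total = 1
    for dim, size in slices.items():
        total *= math.ceil(dim_lengths[dim] / size)
    return total


# Slice sizes copied from the first collection in the example config.
slices = {"Z": 1, "Time": 1, "lat": 60, "lon": 60}

# Hypothetical dimension lengths for a single granule (assumed, not from the docs).
dim_lengths = {"Z": 1, "Time": 1, "lat": 180, "lon": 360}

print(count_slices(dim_lengths, slices))
```

Under these assumed lengths, smaller slice sizes for `lat` and `lon` would produce more, smaller slices per granule, while larger sizes would produce fewer.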
- - id: AVHRR_OI-NCEI-L4-GLOB-v2.0
- path: /opt/data/avhrr/*.nc
- variable: analysed_sst
- priority: 1
+> **Tip:** An easy way to determine which variables go under `dimensionNames` and which ones go under `slices` is that the variables
+> on which the primary variable is dependent should be listed under `slices`, and the variables on which _those_ variables are dependent
+> (which could be themselves, as in the case of the first collection in the above example) should be the values of the properties under
+> `dimensionNames`. The exception to this is that `dimensionNames.variable` should always be the name of the primary variable.
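
The rules stated in the example's comments can be collected into a quick sanity check. The sketch below is a hypothetical helper, not the Collection Manager's own validation: `check_collection` and `ACCEPTED_PROJECTIONS` are names invented here, and it only encodes what the comments above say (priorities on a 1-10 scale, the four accepted projections, `tile` required for ECCO, and `dimensionNames.variable` always present). It assumes the collection entry has already been loaded into a plain `dict`.

```python
# Accepted values as listed in the example config's comments.
ACCEPTED_PROJECTIONS = {"Grid", "ECCO", "TimeSeries", "Swath"}


def check_collection(entry):
    """Return a list of problems found in one collection entry (hypothetical helper)."""
    problems = []

    # Priorities are only checked when present; whether they are mandatory is not stated.
    for key in ("priority", "forward-processing-priority"):
        if key in entry:
            value = entry[key]
            if not isinstance(value, int) or not 1 <= value <= 10:
                problems.append(f"{key} must be an integer from 1 to 10")

    projection = entry.get("projection")
    if projection not in ACCEPTED_PROJECTIONS:
        problems.append("projection must be Grid, ECCO, TimeSeries, or Swath")

    names = entry.get("dimensionNames", {})
    if "variable" not in names:
        problems.append("dimensionNames.variable is required")
    if projection == "ECCO" and "tile" not in names:
        problems.append("dimensionNames.tile is required for the ECCO projection")

    return problems


# The second collection from the example, written as a plain dict.
obp = {
    "id": "ocean-bottom-pressure",
    "path": "/data/OBP/",
    "priority": 6,
    "forward-processing-priority": 7,
    "projection": "ECCO",
    "dimensionNames": {
        "latitude": "YC",
        "longitude": "XC",
        "time": "time",
        "tile": "tile",
        "variable": "OBP",
    },
    "slices": {"time": 1, "tile": 1, "i": 30, "j": 30},
}

print(check_collection(obp))
```

Returning a list of messages rather than raising on the first problem makes it easier to report every issue in a collections file at once.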
-```
## Running the tests
From `incubator-sdap-ingester/`, run: