[incubator-sdap-ingester] branch dev updated: SDAP-297: Update Collections Config docs to match latest schema (#26)

eamonford Mon, 07 Dec 2020 14:22:32 -0800

This is an automated email from the ASF dual-hosted git repository.

eamonford pushed a commit to branch dev
in repository https://gitbox.apache.org/repos/asf/incubator-sdap-ingester.git



The following commit(s) were added to refs/heads/dev by this push:
     new 575f25f  SDAP-297: Update Collections Config docs to match latest schema (#26)
575f25f is described below

commit 575f25f4563fcf8b04cb1f98e326486ac864d601
Author: Eamon Ford <[email protected]>
AuthorDate: Mon Dec 7 14:22:15 2020 -0800

    SDAP-297: Update Collections Config docs to match latest schema (#26)
---
 collection_manager/README.md | 81 ++++++++++++++++++++++++++++++++------------
 1 file changed, 60 insertions(+), 21 deletions(-)

diff --git a/collection_manager/README.md b/collection_manager/README.md
index 84df468..90e72fa 100644
--- a/collection_manager/README.md
+++ b/collection_manager/README.md
@@ -26,7 +26,7 @@ From `incubator-sdap-ingester`, run:
 
 A path to a collections configuration file must be passed in to the Collection Manager
 at startup via the `--collections-path` parameter. Below is an example of what the
-collections configuration file should look like:
+collections configuration file could look like:
 
 ```yaml
 # collections.yaml
@@ -34,35 +34,74 @@ collections configuration file should look like:
 collections:
 
     # The identifier for the dataset as it will appear in NEXUS.
-  - id: TELLUS_GRACE_MASCON_CRI_GRID_RL05_V2_LAND 
+  - id: "CSR-RL06-Mascons_LAND"
 
-    # The local path to watch for NetCDF granule files to be associated with this dataset.
-    # Supports glob-style patterns.
-    path: /opt/data/grace/*land*.nc 
-
-    # The name of the NetCDF variable to read when ingesting granules into NEXUS for this dataset.
-    variable: lwe_thickness 
+    # The path to watch for NetCDF granule files to be associated with this dataset.
+    # This can also be an S3 path prefix, for example "s3://my-bucket/path/to/granules/"
+    path: "/data/CSR-RL06-Mascons-land/" 
 
     # An integer priority level to use when publishing messages to RabbitMQ for historical data.
-    # Higher number = higher priority.
-    priority: 1 
+    # Higher number = higher priority. Scale is 1-10.
+    priority: 1
 
     # An integer priority level to use when publishing messages to RabbitMQ for forward-processing data.
-    # Higher number = higher priority.
+    # Higher number = higher priority. Scale is 1-10.
     forward-processing-priority: 5 
 
-  - id: TELLUS_GRACE_MASCON_CRI_GRID_RL05_V2_OCEAN
-    path: /opt/data/grace/*ocean*.nc
-    variable: lwe_thickness
-    priority: 2
-    forward-processing-priority: 6
+    # The type of projection to use when processing granules in this collection.
+    # Accepted values are Grid, ECCO, TimeSeries, or Swath.
+    projection: Grid
+
+    dimensionNames:
+      # The name of the primary variable
+      variable: lwe_thickness
+
+      # The name of the latitude variable
+      latitude: lat
+
+      # The name of the longitude variable
+      longitude: lon
+
+      # The name of the depth variable (only include if depth variable exists)
+      depth: Z 
+      
+      # The name of the time variable (only include if time variable exists)
+      time: Time
+
+    # This section is an index of each dimension on which the primary variable is dependent, mapped to their desired slice sizes.
+    slices:
+      Z: 1 
+      Time: 1
+      lat: 60
+      lon: 60
+
+  - id: ocean-bottom-pressure
+    path: /data/OBP/
+    priority: 6
+    forward-processing-priority: 7
+    projection: ECCO
+    dimensionNames:
+      latitude: YC
+      longitude: XC
+      time: time
+      # "tile" is required when using the ECCO projection. This refers to the name of the dimension containing the ECCO tile index.
+      tile: tile
+      variable: OBP
+    slices:
+      time: 1
+      tile: 1
+      i: 30
+      j: 30
+```
+
+Note that the dimensions listed under `slices` will not necessarily match the values of the properties under `dimensionNames`. This is because sometimes
+the actual dimensions are referenced by index variables. 
 
-  - id: AVHRR_OI-NCEI-L4-GLOB-v2.0
-    path: /opt/data/avhrr/*.nc
-    variable: analysed_sst
-    priority: 1
+> **Tip:** An easy way to determine which variables go under `dimensionNames` and which ones go under `slices` is that the variables 
+> on which the primary variable is dependent should be listed under `slices`, and the variables on which _those_ variables are dependent 
+> (which could be themselves, as in the case of the first collection in the above example) should be the values of the properties under 
+> `dimensionNames`. The exception to this is that `dimensionNames.variable` should always be the name of the primary variable.
 
-```
 ## Running the tests
 From `incubator-sdap-ingester/`, run:

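For illustration only, the constraints stated in the comments of the example configuration in this diff (priorities on a 1-10 scale, the four accepted `projection` values, `tile` required for ECCO, and `dimensionNames.variable` always present) could be checked with a short script. This is a minimal sketch that assumes the configuration has already been parsed into Python dictionaries. `validate_collection` and `ACCEPTED_PROJECTIONS` are hypothetical names, not part of the Collection Manager, and the check that slice sizes are positive integers is an assumption rather than a documented rule.

```python
# Hypothetical helper: the rules below are read off the comments in the
# example collections.yaml, not taken from the Collection Manager's code.
ACCEPTED_PROJECTIONS = {"Grid", "ECCO", "TimeSeries", "Swath"}
PRIORITY_KEYS = ("priority", "forward-processing-priority")


def _is_int(value):
    """True for real integers; bool is excluded even though it subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool)


def validate_collection(collection):
    """Return a list of human-readable problems found in one collection entry."""
    problems = []

    # Priorities are documented as integers on a 1-10 scale.
    for key in PRIORITY_KEYS:
        value = collection.get(key)
        if value is not None and not (_is_int(value) and 1 <= value <= 10):
            problems.append(f"{key} must be an integer from 1 to 10, got {value!r}")

    # Only four projection values are documented as accepted.
    projection = collection.get("projection")
    if projection not in ACCEPTED_PROJECTIONS:
        problems.append(f"projection {projection!r} is not one of {sorted(ACCEPTED_PROJECTIONS)}")

    # dimensionNames.variable should always name the primary variable.
    dimension_names = collection.get("dimensionNames", {})
    if "variable" not in dimension_names:
        problems.append("dimensionNames.variable is missing")

    # "tile" is required when using the ECCO projection.
    if projection == "ECCO" and "tile" not in dimension_names:
        problems.append("dimensionNames.tile is required for the ECCO projection")

    # Assumption: every slice size should be a positive integer.
    for name, size in collection.get("slices", {}).items():
        if not (_is_int(size) and size > 0):
            problems.append(f"slices.{name} must be a positive integer, got {size!r}")

    return problems


# The second collection from the example configuration, as a dictionary.
obp = {
    "id": "ocean-bottom-pressure",
    "path": "/data/OBP/",
    "priority": 6,
    "forward-processing-priority": 7,
    "projection": "ECCO",
    "dimensionNames": {
        "latitude": "YC",
        "longitude": "XC",
        "time": "time",
        "tile": "tile",
        "variable": "OBP",
    },
    "slices": {"time": 1, "tile": 1, "i": 30, "j": 30},
}

print(validate_collection(obp))  # prints []
```

Returning a list of problems instead of raising on the first one is a design choice for this sketch: it lets every issue in a collection entry be reported in one pass.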