[ https://issues.apache.org/jira/browse/SDAP-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Riley Kuttruff updated SDAP-472:
--------------------------------
    Status: In Progress  (was: To Do)

> General Zarr support for gridded datasets
> -----------------------------------------
>
>                 Key: SDAP-472
>                 URL: https://issues.apache.org/jira/browse/SDAP-472
>             Project: Apache Science Data Analytics Platform
>          Issue Type: New Feature
>          Components: analysis, collection-ingester
>            Reporter: Riley Kuttruff
>            Assignee: Riley Kuttruff
>            Priority: Major
>
> The end goal is for SDAP to be able to onboard existing Zarr datasets with
> minimal to no interaction with the data (i.e., no scanning the data for
> metadata generation). Gridded formats allow for this, with only the need to
> record some (additional) dataset-level metadata. Swath data will require a
> different and much more labor-intensive approach, so we should focus on
> gridded data for now, as it will likely be more commonly used by our users.
>
> Collections should be specifiable in the collection config YAML.
> For now, we should support Zarr stored in an S3 bucket or on the local
> filesystem; however, we should leave the door open for other storage options
> (explicitly set in the collection config or determined by URL), essentially
> Zarr plugins we can add in the future:
>
> {code:yaml}
> collections:
>   - id: zarr_example_ds_s3 # Zarr array in S3; need to give creds
>     store-type: zarr
>     path: s3://sdap-zarr-bucket/zarr_example_ds
>     priority: 5
>     forward-processing-priority: 5
>     projection: Grid
>     dimensionNames:
>       latitude: lat
>       longitude: lon
>       time: time
>       variable: analysed_sst
>     slices:
>       lat: 100
>       lon: 100
>       time: 1
>     aws:
>       accessKeyID: <id>
>       secretAccessKey: <id>
>       public: false
>   - id: zarr_example_ds_local # Zarr array in local fs
>     store-type: zarr
>     path: file:///data/zarr_example_ds_local
>     priority: 5
>     forward-processing-priority: 5
>     projection: Grid
>     dimensionNames:
>       latitude: lat
>       longitude: lon
>       time: time
>       variable: analysed_sst
>     slices:
>       lat: 100
>       lon: 100
>       time: 1
>   - id: AVHRR_OI_L4_GHRSST_NCEI # Standard ingest to tiles in Cassandra
>     store-type: nexusproto
>     path: /data/granules/*.nc
>     priority: 10
>     forward-processing-priority: 10
>     projection: Grid
>     dimensionNames:
>       latitude: lat
>       longitude: lon
>       time: time
>       variable: analysed_sst
>     slices:
>       lat: 100
>       lon: 100
>       time: 1
> {code}
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
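
The issue description mentions choosing a storage option either explicitly in the collection config or from the URL. A minimal Python sketch of how that selection might work is given here. It is not SDAP code: the `ZARR_BACKENDS` registry, the `resolve_zarr_backend` function, and the optional `backend` config key are all hypothetical names introduced only for illustration. It relies on nothing beyond the standard library's `urllib.parse.urlparse`.

```python
from urllib.parse import urlparse

# Hypothetical registry mapping a URL scheme to a backend name.
# New "zarr plugins" would be added here; none of this exists in SDAP.
ZARR_BACKENDS = {
    "s3": "s3",
    "file": "local",
    "": "local",  # bare paths such as /data/zarr_example_ds_local
}


def resolve_zarr_backend(collection: dict) -> str:
    """Pick a storage backend name for one collection config entry.

    An explicit 'backend' key (an assumed, hypothetical setting) takes
    precedence; otherwise the URL scheme of 'path' decides.
    """
    if collection.get("store-type") != "zarr":
        raise ValueError(f"{collection.get('id')}: not a zarr collection")

    explicit = collection.get("backend")
    if explicit is not None:
        return explicit

    scheme = urlparse(collection["path"]).scheme
    if scheme not in ZARR_BACKENDS:
        raise ValueError(
            f"{collection.get('id')}: no zarr backend for scheme {scheme!r}"
        )
    return ZARR_BACKENDS[scheme]
```

Applied to the two Zarr entries in the example config, the `s3://` path would resolve to the `s3` backend and the `file://` path to the `local` backend, while the `nexusproto` entry would be rejected because it is not a Zarr collection.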