[jira] [Updated] (SDAP-472) General Zarr support for gridded datasets

Riley Kuttruff (Jira) Wed, 12 Jul 2023 16:47:53 -0700


     [ https://issues.apache.org/jira/browse/SDAP-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Riley Kuttruff updated SDAP-472:
--------------------------------
    Status: In Progress  (was: To Do)

> General Zarr support for gridded datasets
> -----------------------------------------
>
>                 Key: SDAP-472
>                 URL: https://issues.apache.org/jira/browse/SDAP-472
>             Project: Apache Science Data Analytics Platform
>          Issue Type: New Feature
>          Components: analysis, collection-ingester
>            Reporter: Riley Kuttruff
>            Assignee: Riley Kuttruff
>            Priority: Major
>
> The end goal is for SDAP to be able to onboard existing Zarr datasets with 
> minimal to no interaction with the data (i.e., no scanning the data for 
> metadata generation). Gridded formats allow for this, since only some 
> (additional) dataset-level metadata needs to be recorded. Swath data will 
> require a different and much more labor-intensive approach, so we should 
> focus on gridded data, as it will likely be more commonly used by our users. 
>  
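On a regular grid each coordinate is a 1-D array. Dataset-level metadata such as the spatial and temporal extent can therefore come from those small arrays alone, without reading the data variable. The sketch below illustrates this. It assumes the coordinate values have already been read from the Zarr store (for example with xarray). `DatasetExtent` and `extent_from_coords` are hypothetical names, not existing SDAP APIs.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DatasetExtent:
    """Hypothetical dataset-level metadata record for a gridded collection."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float
    min_time: float
    max_time: float


def extent_from_coords(lat: Sequence[float], lon: Sequence[float],
                       time: Sequence[float]) -> DatasetExtent:
    """Derive the grid extent from the 1-D coordinate arrays only.

    Every value of the data variable sits at some (time, lat, lon)
    combination of these coordinates, so the data variable itself is
    never read. min/max also handle descending coordinates.
    """
    for name, axis in (("lat", lat), ("lon", lon), ("time", time)):
        if len(axis) == 0:
            raise ValueError(f"coordinate {name!r} is empty")
    return DatasetExtent(
        min_lat=min(lat), max_lat=max(lat),
        min_lon=min(lon), max_lon=max(lon),
        min_time=min(time), max_time=max(time),
    )
```

This yields the extent of the grid, not the extent of valid (non-fill) values. Computing the latter would still require reading the data.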
> Collections should be specifiable in the collection config (CC) yaml. 
> For now, we should implement support for Zarr stores in an S3 bucket and on 
> the local filesystem; however, we should leave the door open for other 
> storage options (explicitly set in the CC or determined by the URL), 
> essentially Zarr plugins we can add in the future: 
>  
> {code:yaml}
> collections:   
>   - id: zarr_example_ds_s3   # Zarr array in S3; need to give creds
>     store-type: zarr
>     path: s3://sdap-zarr-bucket/zarr_example_ds
>     priority: 5
>     forward-processing-priority: 5
>     projection: Grid
>     dimensionNames:       
>       latitude: lat
>       longitude: lon
>       time: time
>       variable: analysed_sst
>     slices:       
>       lat: 100
>       lon: 100
>       time: 1
>     aws:       
>       accessKeyID: <id>
>       secretAccessKey: <secret>
>       public: false
>   - id: zarr_example_ds_local # Zarr array in local fs
>     store-type: zarr 
>     path: file:///data/zarr_example_ds_local
>     priority: 5
>     forward-processing-priority: 5
>     projection: Grid
>     dimensionNames: 
>       latitude: lat
>       longitude: lon
>       time: time
>       variable: analysed_sst
>     slices:  
>       lat: 100
>       lon: 100
>       time: 1
>   - id: AVHRR_OI_L4_GHRSST_NCEI # Standard ingest to tiles in Cassandra
>     store-type: nexusproto 
>     path: /data/granules/*.nc
>     priority: 10
>     forward-processing-priority: 10
>     projection: Grid
>     dimensionNames: 
>       latitude: lat
>       longitude: lon
>       time: time
>       variable: analysed_sst
>     slices:      
>       lat: 100
>       lon: 100
>       time: 1
> {code}
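The `slices` entries in the config above give a tile size per dimension. The sketch below enumerates tile index ranges. It assumes a Zarr collection would be cut into tiles the same way as the standard nexusproto ingest. `tile_slices` is a hypothetical helper, and the dimension sizes in the test are illustrative.

```python
import itertools
from typing import Dict, Iterator, List, Tuple

# Dimension name -> (start, stop) index range, stop exclusive.
TileRange = Dict[str, Tuple[int, int]]


def tile_slices(dim_sizes: Dict[str, int], slices: Dict[str, int]) -> Iterator[TileRange]:
    """Yield the index ranges of every tile of a gridded dataset.

    Each dimension listed in `slices` is cut into consecutive chunks of the
    configured size; the last chunk along a dimension is smaller when the
    dimension size is not a multiple of the slice size.
    """
    names = list(slices)
    per_dim: List[List[Tuple[int, int]]] = []
    for name in names:
        size, step = dim_sizes[name], slices[name]
        if step <= 0:
            raise ValueError(f"slice size for {name!r} must be positive")
        per_dim.append([(start, min(start + step, size)) for start in range(0, size, step)])
    # The Cartesian product of per-dimension chunks gives every tile.
    for combo in itertools.product(*per_dim):
        yield dict(zip(names, combo))
```

Take a 720 x 1440 grid with 2 time steps and `lat: 100, lon: 100, time: 1`. That gives 8 x 15 x 2 = 240 tiles, and the last row and column of tiles are narrower than 100.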
>  
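The storage options described above could be implemented as "Zarr plugins" selected either explicitly in the CC or from the URL scheme of `path`. The sketch below shows one possible registry for that dispatch. The `storage` key and all function names are hypothetical and not part of the proposed CC. The openers return description strings instead of real store objects, so the example runs without xarray, zarr, or s3fs.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

# Hypothetical plugin registry: storage scheme -> function that opens the store.
Opener = Callable[[dict], str]
_OPENERS: Dict[str, Opener] = {}


def register_opener(scheme: str) -> Callable[[Opener], Opener]:
    """Decorator that registers an opener for one storage scheme."""
    def decorator(func: Opener) -> Opener:
        _OPENERS[scheme] = func
        return func
    return decorator


@register_opener("s3")
def open_s3(collection: dict) -> str:
    # A real opener would pass the aws credentials to an S3 filesystem client.
    aws = collection.get("aws", {})
    access = "anonymous" if aws.get("public", False) else "credentialed"
    return f"s3 ({access}): {collection['path']}"


@register_opener("file")
def open_local(collection: dict) -> str:
    return f"local: {urlparse(collection['path']).path}"


def open_zarr_collection(collection: dict) -> str:
    """Pick a storage plugin for a zarr collection and open it.

    The hypothetical `storage` key sets the backend explicitly; otherwise it
    is inferred from the URL scheme of `path`, and bare paths count as local.
    """
    if collection.get("store-type") != "zarr":
        raise ValueError(f"{collection.get('id')!r} is not a zarr collection")
    scheme = collection.get("storage") or urlparse(collection["path"]).scheme or "file"
    opener = _OPENERS.get(scheme)
    if opener is None:
        raise ValueError(f"no Zarr storage plugin registered for {scheme!r}")
    return opener(collection)
```

With this design, adding a new backend later means registering one more opener; the dispatch logic does not change.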



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
