[ https://issues.apache.org/jira/browse/SDAP-317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373294#comment-17373294 ]

Antoine Queric commented on SDAP-317:
-------------------------------------

Dear [~tloubrieu], [~skperez], [~thuang], [~nchung] (others?)

I think this is worth discussing before we decide whether or not we should
try to implement such a feature.

 

Our aim is to be able to concatenate multiple time steps into one tile (also
geo-spatially sliced) from daily netCDF files (each containing a single TIME
step). We are interested in trying this because we think it may improve the
performance of long time-series queries on the nexus webapp (yet to be
proven); a rough sketch of what we have in mind follows the header excerpt
below.

Below is the directory structure of the dataset we are working with:

```
2015
├── 001
│   └── 20150101-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc
├── 002
│   └── 20150102-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc
├── 003
│   └── 20150103-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc
├── 004
│   └── 20150104-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc
├── 005
│   └── 20150105-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc
├── 006
│   └── 20150106-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc
├── 007
│   └── 20150107-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc
├── 008
│   └── 20150108-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc
├── 009
│   └── 20150109-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc
```

Header excerpt showing the dimension sizes:

```
netcdf \20150101-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0 {
dimensions:
 lat = 1600 ;
 lon = 3600 ;
 time = 1 ;
```
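
To make this concrete, here is a rough sketch (plain xarray, not existing SDAP
code) of the kind of concatenation we have in mind: stack the daily files
listed above along their existing time dimension, then cut the resulting cube
into tiles that span several time steps as well as a lat/lon box. The glob
pattern, tile sizes and the choice of xarray are only assumptions for
illustration.

```
# Sketch only: build a (time, lat, lon) cube from the daily single-time-step
# files, then slice it into tiles covering several time steps.
import glob

import xarray as xr

# Each daily file holds one time step with dims (time=1, lat=1600, lon=3600).
paths = sorted(glob.glob("2015/*/*-IFR-L4_GHRSST-SSTfnd-ODYSSEA-GLOB_010-v2.0-fv1.0.nc"))

# Concatenate along the existing 'time' dimension (one step per file).
cube = xr.open_mfdataset(paths, combine="nested", concat_dim="time")

# Example tiling: 10 time steps x 100 lat x 100 lon per tile.
t_len, lat_len, lon_len = 10, 100, 100
tiles = []
for t0 in range(0, cube.sizes["time"], t_len):
    for y0 in range(0, cube.sizes["lat"], lat_len):
        for x0 in range(0, cube.sizes["lon"], lon_len):
            tiles.append(
                cube.isel(
                    time=slice(t0, t0 + t_len),
                    lat=slice(y0, y0 + lat_len),
                    lon=slice(x0, x0 + lon_len),
                )
            )
```

Each element of `tiles` is the kind of multi-time-step block we would like to
see stored as a single tile.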

First of all, is this something we really want to add to the SDAP ingester?
I'm not sure that re-processing the netCDF files ourselves to re-chunk them
along the time dimension is something we will want to do in every case, so my
first guess is that we would want granule_ingester to do that for us.
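
For comparison, the "re-process the files ourselves" option would look roughly
like the sketch below: merge the daily files offline into one multi-time-step
netCDF, chunked along time, before handing it to the ingester. Again this is
only an illustration with xarray; the output name and chunk sizes are
placeholders, not values we have tuned.

```
# Sketch of the offline alternative: write a single multi-time-step netCDF,
# chunked along the time dimension, ahead of ingestion.
import glob

import xarray as xr

paths = sorted(glob.glob("2015/*/*.nc"))
cube = xr.open_mfdataset(paths, combine="nested", concat_dim="time")

# Placeholder chunking: up to 10 time steps x 100 lat x 100 lon per chunk.
encoding = {
    var: {"chunksizes": (min(10, cube.sizes["time"]), 100, 100)}
    for var in cube.data_vars
    if cube[var].dims == ("time", "lat", "lon")
}
cube.to_netcdf("2015_rechunked.nc", encoding=encoding)
```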

 

If we wanted this, how should we handle the collection_manager so that it
properly queues a specific number of contiguous files for a single
granule_ingester instance to take care of?
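
Just to illustrate what I mean by "a specific number of contiguous files",
something along these lines on the collection_manager side would be enough.
This is a hypothetical helper, not existing SDAP code, and the batch size of
10 is arbitrary.

```
# Hypothetical helper (not existing SDAP code): group a time-sorted list of
# granule paths into fixed-size contiguous batches, so each batch could be
# queued as a single message for one granule_ingester instance.
from typing import Iterator, List


def contiguous_batches(paths: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of `batch_size` paths (last batch may be shorter)."""
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


# Example: queue 10 daily files per ingestion message.
# for batch in contiguous_batches(sorted(all_granule_paths), 10):
#     publish_ingestion_message(batch)  # hypothetical publish step
```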

 

Maybe the simplest way to handle this would be to bypass the
collection_manager (for this specific need) and launch a granule_ingester
instance directly with a list of files and a configuration to process?

 

Maybe I'm missing something; we only started evaluating the new SDAP ingester
module this week (and are already committing code for Elasticsearch support),
so perhaps what we want to do is already possible and I missed it :)

 

Best regards,

Antoine

> Open multiple netcdf files in order to generate granules with multiple time 
> steps
> ---------------------------------------------------------------------------------
>
>                 Key: SDAP-317
>                 URL: https://issues.apache.org/jira/browse/SDAP-317
>             Project: Apache Science Data Analytics Platform
>          Issue Type: Improvement
>          Components: granule-ingester
>            Reporter: Antoine Queric
>            Priority: Major
>
> When netcdf files include only a single time step, it may be interesting to
> open multiple files & generate a data cube which contains:
>  * longitude slice
>  * latitude slice
>  * time slice
> We will develop & test such a feature in order to compare performance when
> querying long timeseries.


