[ https://issues.apache.org/jira/browse/SDAP-440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Riley Kuttruff updated SDAP-440:
--------------------------------
    Description:
In our current design, when tiles are loaded the data is formatted to be shaped like gridded data (L3/L4). This is obviously fine for L4/L3 tiles. The problem is with L2 (swath) tiles. The swath -> grid-like transform requires transforming an m x n data array for the L2 tile to an (m * n) x (m * n) array, with the original data values occupying the diagonal of the array and the rest of the array locations unused. It goes without saying that this is EXTREMELY inefficient for memory, and L2 tile sizes >15x15 can very easily consume memory by the gigabyte.

Proposed solution: Instead of handling loaded tiles in gridded format, handle them in swath format. This would remove the issues from having to transform L2 tiles, but would still require expanding the latitude and longitude (and time?) arrays to match shape with the data array. This would require SIGNIFICANTLY less extra memory to do (I even believe numpy can do it with constant extra memory rather than the expected O(n)).
The problem with this is that we would need to individually adapt each of SDAP's algorithms to work with swath-formatted data rather than grid-formatted data. The scale of this required change has caused us to hold off on this implementation.

Plan: I plan to mitigate that issue by adapting the NexusTileService (NTS) to be (temporarily) configurable to allow choice in how returned tile data is formatted (default will be gridded). We can then roll out the changes for the various algorithms and switch over their NTS to serve swath data. Upon completion of the rollout, we can (optionally) remove the configuration option from the NTS or switch its default to swath.

> Switch handling of tile data to L2 format
> -----------------------------------------
>
>                 Key: SDAP-440
>                 URL: https://issues.apache.org/jira/browse/SDAP-440
>             Project: Apache Science Data Analytics Platform
>          Issue Type: Task
>            Reporter: Riley Kuttruff
>            Priority: Major
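
To make the memory argument in the description concrete, here is a rough sketch (not SDAP code) of two things: the element-count arithmetic for the (m * n) x (m * n) grid-like array versus the original m x n swath array, and one way numpy could expand coordinate arrays to the data shape without copying. Everything here is an assumption for illustration: the function names are hypothetical, values are assumed to be 8-byte floats, and the latitude/longitude inputs are assumed to be 1-D arrays of length m and n as they would be for a gridded tile.

```python
import numpy as np


def gridded_nbytes(m: int, n: int, itemsize: int = 8) -> int:
    """Bytes for the (m*n) x (m*n) grid-like array built from an m x n swath tile."""
    return (m * n) ** 2 * itemsize


def swath_nbytes(m: int, n: int, itemsize: int = 8) -> int:
    """Bytes for the original m x n swath data array."""
    return m * n * itemsize


def expand_coords(lat: np.ndarray, lon: np.ndarray):
    """Expand 1-D lat (length m) and lon (length n) to shape (m, n).

    np.broadcast_to returns read-only views with a zero stride along the
    broadcast axis, so no new coordinate buffers are allocated.
    """
    m, n = lat.shape[0], lon.shape[0]
    lat2d = np.broadcast_to(lat[:, np.newaxis], (m, n))
    lon2d = np.broadcast_to(lon[np.newaxis, :], (m, n))
    return lat2d, lon2d


if __name__ == "__main__":
    # Compare the two layouts for a few hypothetical square tile sizes.
    for side in (15, 30, 100):
        print(side, gridded_nbytes(side, side), swath_nbytes(side, side))

    # Hypothetical coordinate axes for a 4 x 5 tile.
    lat = np.linspace(-10.0, 10.0, 4)
    lon = np.linspace(100.0, 120.0, 5)
    lat2d, lon2d = expand_coords(lat, lon)
    print(lat2d.shape, lat2d.strides, np.shares_memory(lat2d, lat))
```

The grid-like layout grows with the square of the tile's element count, while the swath layout grows linearly. The broadcast views are read-only, so any algorithm that needs to write into the expanded coordinate arrays would have to copy them first, at which point the extra memory becomes proportional to the tile size again.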