RJ Nowling created SPARK-4727:
---------------------------------
Summary: Add "dimensional" RDDs (time series, spatial)
Key: SPARK-4727
URL: https://issues.apache.org/jira/browse/SPARK-4727
Project: Spark
Issue Type: Brainstorming
Components: Spark Core
Affects Versions: 1.1.0
Reporter: RJ Nowling
Certain types of data (time series, spatial) can benefit from specialized
RDDs. I'd like to open a discussion about this.
For example, time series data should be ordered by time and would benefit from
operations like:
* Subsampling (taking every n data points)
* Signal processing (correlations, FFTs, filtering)
* Windowing functions
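
To make two of the operations above concrete, here is a minimal sketch in
plain Python (not Spark, and not a proposed RDD API). The function names
subsample and sliding_mean, and the (timestamp, value) tuple layout, are
assumptions made for illustration only. Both functions sort by timestamp
first, which is the ordering guarantee a time series RDD would presumably
provide for free.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout for illustration: (timestamp, value).
Point = Tuple[float, float]


def subsample(series: Iterable[Point], n: int) -> List[Point]:
    """Return every n-th point of the series, in time order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ordered = sorted(series, key=lambda p: p[0])
    return ordered[::n]


def sliding_mean(series: Iterable[Point], width: int) -> List[Point]:
    """Mean over a sliding window of `width` consecutive points.

    Each output point carries the timestamp of the last point in its window.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    ordered = sorted(series, key=lambda p: p[0])
    out = []
    for end in range(width, len(ordered) + 1):
        window = ordered[end - width:end]
        mean = sum(value for _, value in window) / width
        out.append((window[-1][0], mean))
    return out
```

In a distributed setting the sort would be paid once, at partitioning time,
and a window that spans a partition boundary would need points from the
neighboring partition; the sketch ignores both concerns.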
Spatial data benefits from ordering and partitioning along a 2D or 3D grid.
For example, path finding algorithms can be optimized by only comparing points
within a set distance, which can be computed more efficiently by partitioning
data into a grid.
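
The grid idea can be sketched as follows, again in plain Python rather than
Spark. The names grid_key and pairs_within are hypothetical, and the choice
of a cell size equal to the search radius is an assumption of this sketch.
With that cell size, two points within the radius always fall in the same
cell or in adjacent cells, so each point is only compared against the
contents of at most 9 cells (27 in 3D) instead of the whole data set.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point2D = Tuple[float, float]
Cell = Tuple[int, int]


def grid_key(point: Point2D, cell_size: float) -> Cell:
    """Map a point to the integer coordinates of its grid cell."""
    return (math.floor(point[0] / cell_size), math.floor(point[1] / cell_size))


def pairs_within(points: Sequence[Point2D], radius: float) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, of points at most `radius` apart."""
    if radius <= 0:
        raise ValueError("radius must be positive")

    # Bucket point indices by grid cell; cell size equals the radius.
    grid: Dict[Cell, List[int]] = defaultdict(list)
    for idx, p in enumerate(points):
        grid[grid_key(p, radius)].append(idx)

    result = []
    for (cx, cy), members in grid.items():
        # Only the cell itself and its 8 neighbors can hold close points.
        for dx, dy in product((-1, 0, 1), repeat=2):
            # .get avoids creating empty cells while iterating the grid.
            for j in grid.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j and math.dist(points[i], points[j]) <= radius:
                        result.append((i, j))
    return sorted(result)
```

In a partitioned setting, a cell (or a block of cells) would map to a
partition, and only points near a cell border would need to be shared with
neighboring partitions.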
Although the operations on time series and spatial data may be different, the
two share a common trait: the data has ordered dimensions. Their
implementations may therefore overlap.