RangePartitioner? At least for a join, you can implement your own partitioner to utilize the sorted data. Just my 2 cents.

Date: Wed, 11 Mar 2015 17:38:04 -0400
Subject: can spark take advantage of ordered data?
From: jcove...@gmail.com
To: User@spark.apache.org
Hello all,

I am wondering if Spark already has support for optimizations on sorted data, and/or if such support could be added (I am comfortable dropping to a lower level if necessary to implement this, but I'm not sure if it is possible at all).

Context: we have a number of data sets which are essentially already sorted on a key. With our current systems, we can take advantage of this to do a lot of analysis very efficiently: merges and joins, for example, as well as folds on a secondary key, and so on.

I was wondering if Spark would be a fit for implementing these sorts of optimizations? Obviously it is sort of a niche case, but would this be achievable? Any pointers on where I should look?
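For what it's worth, the optimization being described (a linear-time merge join over two inputs already sorted on the join key) can be sketched in a few lines. This is plain Python purely for illustration, not a Spark API; the function name and assumption of unique keys are mine:

```python
def merge_join(left, right):
    """Yield (key, left_value, right_value) for keys present in both inputs.

    Both inputs must be lists of (key, value) pairs sorted by key, with
    unique keys. Because the inputs are pre-sorted, a single forward pass
    suffices: O(n + m) time, no hashing and no re-sort.
    """
    i, j = 0, 0
    while i < len(left) and j < len(right):
        lk, lv = left[i]
        rk, rv = right[j]
        if lk == rk:
            yield (lk, lv, rv)
            i += 1
            j += 1
        elif lk < rk:
            i += 1  # left key has no match; advance left
        else:
            j += 1  # right key has no match; advance right

a = [(1, "a"), (3, "b"), (5, "c")]
b = [(1, "x"), (2, "y"), (5, "z")]
print(list(merge_join(a, b)))  # [(1, 'a', 'x'), (5, 'c', 'z')]
```

In Spark terms, the earlier suggestion amounts to giving both RDDs the same custom Partitioner (or a RangePartitioner) that preserves the existing key ranges, so a join can run partition-by-partition without a shuffle; the snippet above only illustrates the per-partition merge step, and a real join would also need to handle duplicate keys.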