I’m not sure where to post this, since it’s a bit of a philosophical question about the design and vision for Spark.
If we look at SparkSQL and performance, where does secondary indexing fit in?

The reason this is a bit awkward is that if you view Spark as querying RDDs, which are temporary, indexing doesn’t make sense until you consider your use case and how long ‘temporary’ actually is. If your RDD result set is based on querying tables, you could end up with an inverted table serving as an index, and then indexing could make sense.

Does it make sense to discuss this on the user or dev mailing list? Has anyone given this any thought in the past?

Thx

-Mike