Re: LIMIT with offset in SQL queries
What I typically do is use a row_number subquery and filter based on that. It works out pretty well and reduces the iteration. I think an offset solution based directly on windowing would be useful.

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi

On Fri, Jul 4, 2014 at 2:00 AM, Michael Armbrust mich...@databricks.com wrote:
> Doing an offset is actually pretty expensive in a distributed query engine, so in many cases it probably makes sense to just collect and then perform the offset as you are doing now.
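A minimal sketch of the row_number workaround Mayur describes, as a query submitted through Spark SQL's Scala API. Note the hedges: the table `t`, its ordering column `id`, and the `sqlContext` binding are all hypothetical, and (as noted downthread) window functions such as ROW_NUMBER() were not yet supported in Spark SQL at the time, so this assumes an engine or dialect where they are available:

```scala
// Sketch only: assumes a hypothetical table `t` with an ordering column `id`,
// and an engine that supports ROW_NUMBER() OVER (...) -- not yet available
// in Spark SQL at the time of this thread (see SPARK-1442).
val page = sqlContext.sql("""
  SELECT *
  FROM (
    SELECT t.*, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM t
  ) numbered
  WHERE rn > 5 AND rn <= 15   -- emulates MySQL's LIMIT 5, 10
""")
```

The inner subquery numbers every row according to an explicit ordering, and the outer filter keeps rows 6 through 15, which is exactly the 10-rows-starting-after-row-5 semantics of `LIMIT 5, 10`.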
Re: LIMIT with offset in SQL queries
Though I'll note that window functions are not yet supported in Spark SQL: https://issues.apache.org/jira/browse/SPARK-1442

On Fri, Jul 4, 2014 at 6:59 AM, Mayur Rustagi mayur.rust...@gmail.com wrote:
> What I typically do is use a row_number subquery and filter based on that. It works out pretty well and reduces the iteration.
Re: LIMIT with offset in SQL queries
Doing an offset is actually pretty expensive in a distributed query engine, so in many cases it probably makes sense to just collect and then perform the offset as you are doing now, unless the offset is very large. Another limitation here is that HiveQL does not support OFFSET. That said, if you want to open a JIRA we would consider implementing it.

On Wed, Jul 2, 2014 at 1:37 PM, durin m...@simon-schaefer.net wrote:
> Hi, in many SQL DBMSs such as MySQL, you can set an offset for the LIMIT clause, so that LIMIT 5, 10 will return 10 rows, starting from row 5.
LIMIT with offset in SQL queries
Hi,

in many SQL DBMSs such as MySQL, you can set an offset for the LIMIT clause, so that LIMIT 5, 10 will return 10 rows, starting from row 5. As far as I can see, this is not possible in Spark SQL. The best way I have found to imitate that (using Scala) is to convert the RDD into an Array via collect() and then use a for-loop to return certain elements from that Array.

Is there a better solution regarding performance, and are there plans to implement an offset for LIMIT?

Kind regards,
Simon

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/LIMIT-with-offset-in-SQL-queries-tp8673.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
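The collect-then-loop workaround described above can be written a little more directly with `slice` instead of a for-loop. A minimal sketch, assuming a hypothetical `results` RDD produced by a Spark SQL query whose rows are already in a meaningful order (i.e. the query used ORDER BY):

```scala
// Emulate MySQL's LIMIT 5, 10: skip the first 5 rows, return the next 10.
// NOTE: collect() materializes the entire result set on the driver, so this
// is only viable when the results fit in driver memory.
val offset = 5
val limit  = 10

val all  = results.collect()               // Array of all result rows
val page = all.slice(offset, offset + limit)
```

Two caveats: row order in an RDD is only meaningful if the query imposed one, so an explicit sort should precede any offset; and, as Michael points out upthread, for large offsets this driver-side approach stops being reasonable.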