Re: LIMIT with offset in SQL queries

2014-07-04 Thread Mayur Rustagi
What I typically do is use a row_number subquery and filter based on that.
It works out pretty well and reduces the iteration. I think an offset solution
based directly on windowing would be useful.
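A minimal sketch of the row-number idea, done outside SQL and using plain Scala collections in place of a Spark table (an assumption, so the example runs without a cluster). Each row is paired with a 0-based position and only positions in [offset, offset + count) are kept. The helper name `pageByRowNumber` is hypothetical. On an RDD, `zipWithIndex` produces the same (row, index) pairs, with a Long index.

```scala
object RowNumberPaging {
  // Hypothetical helper: number the rows from 0 (as zipWithIndex does) and
  // keep only row numbers in [offset, offset + count), mimicking LIMIT offset, count.
  def pageByRowNumber[T](rows: Seq[T], offset: Int, count: Int): Seq[T] =
    rows.zipWithIndex
      .filter { case (_, rowNum) => rowNum >= offset && rowNum < offset + count }
      .map { case (row, _) => row }

  def main(args: Array[String]): Unit = {
    // Local stand-in for an already-ordered table.
    val rows = (1 to 20).map(i => s"row$i")
    // Skip 5 rows and keep the next 10: row6 through row15.
    println(pageByRowNumber(rows, offset = 5, count = 10).mkString(", "))
  }
}
```

Row numbers are only meaningful once the rows have a defined order, so the equivalent SQL subquery would need an ORDER BY inside the window.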

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi




Re: LIMIT with offset in SQL queries

2014-07-04 Thread Michael Armbrust
Though I'll note that window functions are not yet supported in Spark SQL.
https://issues.apache.org/jira/browse/SPARK-1442




Re: LIMIT with offset in SQL queries

2014-07-03 Thread Michael Armbrust
Doing an offset is actually pretty expensive in a distributed query engine,
so in many cases it probably makes sense to just collect and then perform
the offset as you are doing now, unless the offset is very large.
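One way to bound what "collect and then offset" pulls to the driver is to fetch only the first offset + count rows and drop the prefix locally. A minimal sketch under that assumption, with plain Scala collections standing in for an RDD (on an RDD, `take(n)` returns the first n elements as an Array). The helper name `page` is hypothetical.

```scala
object TakeThenDrop {
  // Hypothetical helper: fetch only the first offset + count rows,
  // then discard the first `offset` of them locally.
  def page[T](rows: Iterable[T], offset: Int, count: Int): Seq[T] =
    rows.take(offset + count).toSeq.drop(offset)

  def main(args: Array[String]): Unit = {
    val rows = 1 to 1000
    val result = page(rows, offset = 5, count = 10)
    println(result.mkString(", ")) // 6 through 15
  }
}
```

This also shows why a very large offset stays costly: all offset + count rows still have to reach the driver before the first `offset` are thrown away.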

Another limitation here is that HiveQL does not support OFFSET. That said,
if you want to open a JIRA, we would consider implementing it.




LIMIT with offset in SQL queries

2014-07-02 Thread durin
Hi,

In many SQL DBMSs such as MySQL, you can set an offset for the LIMIT clause,
so that /LIMIT 5, 10/ skips the first 5 rows and returns the next 10.

As far as I can see, this is not possible in Spark SQL.
The best way I have found to emulate this (in Scala) is to convert the RDD
into an Array via collect() and then use a for-loop to pick out the
requested elements from that Array.
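For the collect-then-loop approach described above, Scala's `slice` can do the offset and the limit in one call instead of a hand-written loop. A minimal sketch, with a local Array standing in for the result of `collect()` (an assumption so the example runs without Spark). The helper name `limitWithOffset` is hypothetical.

```scala
object CollectThenSlice {
  // Hypothetical helper mirroring MySQL's "LIMIT offset, count":
  // skip `offset` rows and return at most `count` rows.
  def limitWithOffset[T](collected: Array[T], offset: Int, count: Int): Seq[T] =
    collected.toSeq.slice(offset, offset + count)

  def main(args: Array[String]): Unit = {
    // Stand-in for rdd.collect(): every row is already on the driver here.
    val collected = Array.tabulate(20)(i => i + 1)
    println(limitWithOffset(collected, 5, 10).mkString(", "))
  }
}
```

`slice` clamps out-of-range bounds, so an offset past the end simply yields an empty result. The whole dataset is still collected first, which is the cost the question is trying to avoid.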

Is there a better solution regarding performance and are there plans to
implement an offset for LIMIT?


Kind regards,
Simon



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/LIMIT-with-offset-in-SQL-queries-tp8673.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.