Dear Liang,

Thanks for your valuable feedback.
There was a mistake in my previous post, which I have corrected. As you mentioned, with `GlobalLimit` we only take the required number of rows from the input iterator, which actually pulls the data from local and remote blocks. However, if the limit value is very high (e.g. >= 10,000,000), a shuffle exchange happens between `GlobalLimit` and `LocalLimit` to move the data from all partitions into a single partition, and since the limit value is so large the performance bottleneck still exists. (A small snippet to reproduce this plan shape is in the P.S. below.)

In my next post I will publish a test report with sample data, and also try to work out a solution for this problem. Please let me know if you have any clarifications or suggestions regarding this issue.

Regards,
Sujith
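P.S. For anyone who wants to see the plan shape I am describing, below is a minimal Scala sketch of my own (assuming a Spark 2.x local session; the row count, partition count, and limit value are illustrative, not taken from my test report). Note that a limit sitting at the very top of the plan may be planned as `CollectLimit` instead, so the sketch adds a trailing filter to force the `LocalLimit` / `Exchange SinglePartition` / `GlobalLimit` split:

```scala
// Minimal sketch (assumes Spark 2.x; sizes are illustrative) to show the
// LocalLimit -> Exchange SinglePartition -> GlobalLimit plan shape.
import org.apache.spark.sql.SparkSession

object LimitPlanDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LimitPlanDemo")
      .master("local[4]")
      .getOrCreate()

    // Many input partitions, so LocalLimit runs per partition before the shuffle.
    val df = spark.range(0L, 100000000L, 1L, numPartitions = 200).toDF("id")

    // The trailing filter keeps the limit from being planned as CollectLimit,
    // so explain() should show roughly:
    //   Filter <- GlobalLimit <- Exchange SinglePartition <- LocalLimit <- Range
    // Because each partition here holds fewer rows than the limit, LocalLimit
    // passes everything through and the Exchange pulls all ~100M rows into a
    // single partition -- the bottleneck described above.
    df.limit(10000000).filter("id >= 0").explain()

    spark.stop()
  }
}
```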
