Hi all! I'm having a strange issue with PySpark 1.6.1. I have a DataFrame,

    df = sqlContext.read.parquet('/path/to/data')

whose df.take(10) is really slow, apparently scanning the whole dataset to take the first ten rows. df.first() is fast, as is df.rdd.take(10). I found https://issues.apache.org/jira/browse/SPARK-10731, which should have fixed this in 1.6.0, but apparently it hasn't.

What am I doing wrong here, and how can I fix this?

Cheers,
immerrr
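For reference, a minimal sketch of what I'm seeing (the path is a placeholder, and the context setup assumes a plain Spark 1.6.x install):

```python
# Sketch reproducing the slow take(10) on a parquet-backed DataFrame.
# '/path/to/data' is a placeholder; appName is arbitrary.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="take-repro")
sqlContext = SQLContext(sc)

df = sqlContext.read.parquet('/path/to/data')

df.first()       # fast: returns a single Row
df.rdd.take(10)  # fast: RDD-level take scans partitions incrementally
df.take(10)      # slow in 1.6.1: appears to scan the whole dataset

# Since df.rdd.take(10) is fast, falling back to it is the obvious
# workaround for now, at the cost of getting RDD rows instead of a
# DataFrame-level operation.
rows = df.rdd.take(10)

sc.stop()
```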