In [70]: spark.read.parquet(*subdirs[:31]).schema.jsonValue() ==
   ...:  spark.read.parquet(*subdirs[1:32]).schema.jsonValue()
Out[70]: True
Any idea why that might be happening?
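One way to narrow this down (a minimal sketch, assuming `subdirs` is the same list of subdirectory paths used in the snippet above) is to compare every subdirectory's schema against a single reference instead of comparing overlapping windows:

    # Compare each subdirectory's schema to the first one and print any
    # path whose schema differs (sketch; `subdirs` assumed already defined).
    base = spark.read.parquet(subdirs[0]).schema.jsonValue()
    for d in subdirs[1:]:
        if spark.read.parquet(d).schema.jsonValue() != base:
            print(d)

If nothing prints, the schemas really are uniform and the problem lies elsewhere (e.g. in summary/metadata files at the root rather than in the data files themselves).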
On Tue, Aug 9, 2016 at 12:12 PM, immerrr again <imme...@gmail.com> wrote:
> Some follow-up information:
>
> - datase
260], [IN,22404143], [US,98585175])
scala> counts.slice(0, 10)
res14: Array[org.apache.spark.sql.Row] = Array([UM,1], [JB,1], [JK,1],
[WP,1], [JT,1], [SX,9], [BL,52], [BQ,70], [BV,115], [MF,115])
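For reference, a PySpark equivalent of how counts like those above could be produced (a sketch only; 'code' is a hypothetical column name, since the message describing the dataset is truncated above):

    # Group by a (hypothetical) country-code column, count rows per code,
    # and collect sorted ascending so the smallest groups come first.
    counts = df.groupBy('code').count().orderBy('count').collect()
    counts[:10]   # analogous to counts.slice(0, 10) in the scala output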
On Tue, Aug 9, 2016 at 11:10 AM, immerrr again <imme...@gmail.com> wrote:
> Hi everyone
Hi everyone
I tried upgrading Spark-1.6.2 to Spark-2.0.0 but ran into an issue
reading the existing data. Here's how the traceback looks in
spark-shell:
scala> spark.read.parquet("/path/to/data")
org.apache.spark.sql.AnalysisException: Unable to infer schema for
ParquetFormat at /path/to/data.
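If the individual subdirectories still read fine on 2.0.0, one workaround worth trying (a sketch, not a confirmed fix; the subdirectory path is hypothetical) is to borrow the schema from a readable subdirectory and pass it explicitly, so that Spark never has to infer it from the root path:

    # Read the schema from one subdirectory that loads successfully, then
    # apply it explicitly when reading the whole tree (sketch;
    # '/path/to/data/subdir0' is a hypothetical path).
    schema = spark.read.parquet('/path/to/data/subdir0').schema
    df = spark.read.schema(schema).parquet('/path/to/data')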
Hi all!
I'm having a strange issue with pyspark 1.6.1. I have a dataframe,
df = sqlContext.read.parquet('/path/to/data')
whose "df.take(10)" is really slow, apparently scanning the whole
dataset to take the first ten rows. "df.first()" works fast, as does
"df.rdd.take(10)".
I have found
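For anyone hitting the same slowdown, the observations above suggest a workaround (a sketch based only on what is reported here, not on the truncated "I have found" part): go through the RDD API, which reportedly returns quickly, and rebuild a small DataFrame if one is needed:

    # df.rdd.take(10) is reported fast above; rebuild a DataFrame from the
    # collected rows if DataFrame semantics are still needed.
    rows = df.rdd.take(10)
    df10 = sqlContext.createDataFrame(rows, df.schema)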