Back to the user list so everyone can see the result of the discussion...
Ah. It all makes sense now. The issue is that when I created the parquet
files, I included an unnecessary directory name (data.parquet) below the
partition directories. It’s just a leftover from when I started with
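For anyone else hitting this, a sketch of the two layouts, using the /data/table1 example from elsewhere in the thread (file names are illustrative). Partition discovery expects the data files directly under the key=value directories:

/data/table1/key=1/part-r-00001.parquet
/data/table1/key=2/part-r-00001.parquet

With the extra directory level the layout becomes

/data/table1/key=1/data.parquet/part-r-00001.parquet
/data/table1/key=2/data.parquet/part-r-00001.parquet

and that extra non-key=value level is what broke discovery in this case.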
I was unable to get this feature to work in 1.3.0. I tried building off master
and it still wasn't working for me. So I dug into the code, and I'm not sure
how parsePartition() was ever working. The while loop that walks up the
parent directories in the path always terminates after a
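For context, the convention partition discovery relies on is that each directory segment between the base path and the data files has the form key=value. A rough standalone sketch of that parsing (not Spark's actual parsePartition, just the idea it relies on):

// Illustrative only: extract key=value segments from a partition path.
def parsePartitionSpec(path: String): Map[String, String] =
  path.split("/")
    .filter(_.contains("="))
    .map { segment =>
      val Array(key, value) = segment.split("=", 2)
      key -> value
    }
    .toMap

// parsePartitionSpec("/data/table1/key=1/part-r-00001.parquet") => Map(key -> 1)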
Hi Spark Users,
I'm testing the new parquet partition discovery feature in 1.3.
I have 2 sub folders, each with 800 rows:
/data/table1/key=1
/data/table1/key=2
In spark-shell, I run this command:
val t = sqlContext.createExternalTable("table1", "hdfs:///data/table1", "parquet")
t.count
It shows 1600.
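(Assuming discovery picks up the key column from the directory names, it should behave like a regular column; with the table above one would expect

t.filter("key = 1").count

to return 800, i.e. one partition's worth of rows.)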
You need to refresh the external table manually after updating the data
source outside Spark SQL:
- via the Scala API: sqlContext.refreshTable("table1")
- via SQL: REFRESH TABLE table1;
Cheng
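A minimal spark-shell sketch of that workflow, where /data/table1/key=3 stands in for a new partition written by some job outside Spark SQL:

t.count                               // count based on the data seen so far
// ...an external job writes new files under hdfs:///data/table1/key=3...
sqlContext.refreshTable("table1")     // refresh via the Scala API
sqlContext.table("table1").count      // should now include the new partition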