Re: parquet partition discovery

2015-04-08 Thread Michael Armbrust
Back to the user list so everyone can see the result of the discussion... Ah. It all makes sense now. The issue is that when I created the parquet files, I included an unnecessary directory name (data.parquet) below the partition directories. It’s just a leftover from when I started with

Re: parquet partition discovery

2015-04-08 Thread Cheng Lian
On 4/9/15 3:09 AM, Michael Armbrust wrote: Back to the user list so everyone can see the result of the discussion... Ah. It all makes sense now. The issue is that when I created the parquet files, I included an unnecessary directory name (data.parquet) below the partition

parquet partition discovery

2015-04-07 Thread Christopher Petro
I was unable to get this feature to work in 1.3.0. I tried building off master and it still wasn't working for me. So I dug into the code, and I'm not sure how parsePartition() was ever working. The while loop which walks up the parent directories in the path always terminates after a
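A minimal sketch of what parsePartition() is expected to do, assuming partition directories follow the column=value convention (an illustration, not Spark's actual code): walk up from the file's parent directory, collecting column=value pairs, and stop at the first directory that doesn't match.

```scala
import java.nio.file.Paths

// Illustrative sketch only; names and logic are assumptions, not Spark's code.
object ParsePartitionSketch {
  // Walk up from the file's parent directory, collecting column=value
  // segments until one that doesn't match the convention.
  def parsePartition(path: String): Seq[(String, String)] = {
    var dir = Paths.get(path).getParent
    var pairs = List.empty[(String, String)]
    while (dir != null && dir.getFileName != null &&
           dir.getFileName.toString.count(_ == '=') == 1) {
      val Array(k, v) = dir.getFileName.toString.split("=", 2)
      pairs = (k -> v) :: pairs
      dir = dir.getParent
    }
    pairs
  }

  def main(args: Array[String]): Unit = {
    println(parsePartition("/data/table1/key=1/part-00000.parquet"))
    // List((key,1))
  }
}
```

Note that with the extra data.parquet directory mentioned earlier in the thread (/data/table1/key=1/data.parquet/part-00000.parquet), this walk stops immediately at data.parquet and finds no partition columns, which is consistent with discovery failing on that layout.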

Issue of sqlContext.createExternalTable with parquet partition discovery after changing folder structure

2015-04-04 Thread Rex Xiong
Hi Spark Users, I'm testing the 1.3 new feature of parquet partition discovery. I have 2 subfolders, each with 800 rows: /data/table1/key=1 /data/table1/key=2 In spark-shell, run this command: val t = sqlContext.createExternalTable("table1", "hdfs:///data/table1", "parquet") t.count It shows 1600

Re: Issue of sqlContext.createExternalTable with parquet partition discovery after changing folder structure

2015-04-04 Thread Cheng Lian
You need to refresh the external table manually after updating the data source outside Spark SQL: - via Scala API: sqlContext.refreshTable("table1") - via SQL: REFRESH TABLE table1; Cheng On 4/4/15 5:24 PM, Rex Xiong wrote: Hi Spark Users, I'm testing the 1.3 new feature of parquet partition