Re: SparkSQL 'describe table' tries to look at all records

2015-07-13 Thread Yana Kadiyska
Have you seen https://issues.apache.org/jira/browse/SPARK-6910 ? I opened https://issues.apache.org/jira/browse/SPARK-6984 which I think is related to this as well. There are a bunch of issues attached to it but basically yes, Spark interactions with a large metastore are bad...very bad if your

Re: SparkSQL 'describe table' tries to look at all records

2015-07-12 Thread Jerrick Hoang
Sorry all for not being clear. I'm using Spark 1.4 and the table is a Hive table, and the table is partitioned. On Sun, Jul 12, 2015 at 6:36 PM, Yin Huai yh...@databricks.com wrote: Jerrick, Let me ask a few clarification questions. What is the version of Spark? Is the table a Hive table?

Re: SparkSQL 'describe table' tries to look at all records

2015-07-12 Thread Ted Yu
Which Spark release do you use? Cheers On Sun, Jul 12, 2015 at 5:03 PM, Jerrick Hoang jerrickho...@gmail.com wrote: Hi all, I'm new to Spark and this question may be trivial or has already been answered, but when I do a 'describe table' from SparkSQL CLI it seems to try looking at all

Re: SparkSQL 'describe table' tries to look at all records

2015-07-12 Thread ayan guha
Describe computes statistics, so it will try to query the table. The one you are looking for is df.printSchema() On Mon, Jul 13, 2015 at 10:03 AM, Jerrick Hoang jerrickho...@gmail.com wrote: Hi all, I'm new to Spark and this question may be trivial or has already been answered, but when I do
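
A minimal sketch of the df.printSchema() suggestion above, written in Python against a SQLContext or HiveContext. The helper name print_table_schema and the table name "events" are hypothetical and used only for illustration. Note that DataFrame.describe() is the call that computes summary statistics; printSchema() only prints column names and types. On a partitioned Hive table, looking the table up may still fetch partition metadata from the metastore, so this is not guaranteed to be fast.

```python
def print_table_schema(sql_context, table_name):
    """Print a table's schema and return its (column, type) pairs.

    sql_context is assumed to be an existing SQLContext or HiveContext.
    """
    # table() returns a DataFrame for the named table; no action is run on it here.
    df = sql_context.table(table_name)
    # printSchema() prints the column tree from the DataFrame's schema.
    df.printSchema()
    # dtypes is a list of (column name, type string) tuples.
    return list(df.dtypes)


# Usage, assuming a HiveContext named sqlContext and a hypothetical table "events":
# print_table_schema(sqlContext, "events")
```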

Re: SparkSQL 'describe table' tries to look at all records

2015-07-12 Thread Yin Huai
Jerrick, Let me ask a few clarification questions. What is the version of Spark? Is the table a Hive table? What is the format of the table? Is the table partitioned? Thanks, Yin On Sun, Jul 12, 2015 at 6:01 PM, ayan guha guha.a...@gmail.com wrote: Describe computes statistics, so it will

SparkSQL 'describe table' tries to look at all records

2015-07-12 Thread Jerrick Hoang
Hi all, I'm new to Spark and this question may be trivial or has already been answered, but when I do a 'describe table' from SparkSQL CLI it seems to try looking at all records in the table (which takes a really long time for a big table) instead of just giving me the metadata of the table. Would