Re: [SPAM] Customized Aggregation Query on Spark SQL

2015-04-30 Thread Wenlei Xie
(): print i Result: Row(name=u'A', age=30, other=u'A30') Row(name=u'B', age=15, other=u'B15') Row(name=u'C', age=20, other=u'C200') On Sat, Apr 25, 2015 at 2:48 PM, Wenlei Xie wenlei@gmail.com wrote: Sure. A simple example of data would be (there might be many other columns) Name
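The aggregation discussed in this thread (group by Name, keep the row with the max Age) can be sketched in plain Python to make the intended semantics concrete. The sample rows below are made up to match the shapes shown in the result above; this is only a sketch of the logic, not the Spark SQL implementation.

```python
# Plain-Python sketch of the aggregation semantics: group rows by
# Name and keep the single row with the max Age in each group.
rows = [
    ("A", 30, "A30"),
    ("A", 12, "A12"),
    ("B", 15, "B15"),
    ("C", 20, "C200"),
]

best = {}
for name, age, other in rows:
    # ages are distinct per name, so a strict comparison suffices
    if name not in best or age > best[name][1]:
        best[name] = (name, age, other)

result = sorted(best.values())
print(result)
```

In Spark SQL the same result is typically reached with a groupBy on Name plus a max on Age, joined back to recover the other columns, or with a window function in later versions.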

Automatic Cache in SparkSQL

2015-04-27 Thread Wenlei Xie
Hi, I am trying to answer a simple query with SparkSQL over a Parquet file. When executing the query several times, the first run takes about 2s while later runs take 0.1s. By looking at the log file it seems the later runs don't load the data from disk. However, I didn't enable
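The first-run-slow, later-runs-fast pattern described above is easy to measure. The sketch below uses a stand-in function in place of the actual Parquet query; the `lru_cache` here is only an analogy for whatever caching layer (OS page cache, Spark in-memory columnar cache) absorbs the later runs, not a claim about Spark internals.

```python
import time
from functools import lru_cache

@lru_cache(maxsize=None)
def run_query():
    # stand-in for the Parquet scan: an expensive first run,
    # then results are served from cache
    time.sleep(0.05)
    return 42

def timed(f):
    """Return (result, wall-clock seconds) for one call of f."""
    t0 = time.perf_counter()
    result = f()
    return result, time.perf_counter() - t0

_, first = timed(run_query)   # pays the "load from disk" cost
_, later = timed(run_query)   # served from cache
print(f"first={first:.3f}s later={later:.3f}s")
```

Timing each run this way (rather than eyeballing log timestamps) makes it easier to tell whether the speedup comes from skipping the disk load.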

Re: Super slow caching in 1.3?

2015-04-27 Thread Wenlei Xie
-- Wenlei Xie (谢文磊)
Ph.D. Candidate, Department of Computer Science
456 Gates Hall, Cornell University, Ithaca, NY 14853, USA
Email: wenlei@gmail.com

Understand the running time of SparkSQL queries

2015-04-26 Thread Wenlei Xie
Hi, I am wondering how we should understand the running time of SparkSQL queries? For example, the physical query plan and the running time of each stage? Is there any guide talking about this? Thank you! Best, Wenlei

Re: Creating a Row in SparkSQL 1.2 from ArrayList

2015-04-24 Thread Wenlei Xie
Use Object[] in Java just works :). On Fri, Apr 24, 2015 at 4:56 PM, Wenlei Xie wenlei@gmail.com wrote: Hi, I am wondering if there is any way to create a Row in SparkSQL 1.2 in Java by using a List? It looks like ArrayList<Object> something; Row.create(something) will create a row
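The fix works because Java's `Row.create(Object... values)` is a varargs method: an `ArrayList` passed directly is treated as one single value, while an `Object[]` is spread into the individual columns. The same distinction exists in Python's `*args`, sketched below as an analogy (the `create_row` function is a made-up stand-in, not the Spark API).

```python
def create_row(*values):
    # analogue of a varargs factory like Row.create(Object... values)
    return tuple(values)

items = ["A", 30, "A30"]

single_col = create_row(items)    # whole list becomes ONE column
spread = create_row(*items)       # unpacked, like passing an Object[]

print(single_col)  # (['A', 30, 'A30'],)
print(spread)      # ('A', 30, 'A30')
```

Calling `someList.toArray()` in Java plays the same role as the `*` unpacking here.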

Re: Number of input partitions in SparkContext.sequenceFile

2015-04-24 Thread Wenlei Xie
On Sat, Apr 18, 2015 at 6:20 PM, Wenlei Xie wenlei@gmail.com wrote: Hi, I am wondering about the mechanism that determines the number of partitions created by SparkContext.sequenceFile? For example, although my file has only 4 splits, Spark would create 16 partitions for it. Is it determined

Customized Aggregation Query on Spark SQL

2015-04-24 Thread Wenlei Xie
Hi, I would like to answer the following customized aggregation query on Spark SQL:
1. Group the table by the value of Name.
2. For each group, choose the tuple with the max value of Age (the ages are distinct for every name).
I am wondering what's the best way to do it on Spark SQL? Should I use

Creating a Row in SparkSQL 1.2 from ArrayList

2015-04-24 Thread Wenlei Xie
Hi, I am wondering if there is any way to create a Row in SparkSQL 1.2 in Java by using a List? It looks like ArrayList<Object> something; Row.create(something) will create a row with a single column (and the single column contains the array) Best, Wenlei

Number of input partitions in SparkContext.sequenceFile

2015-04-18 Thread Wenlei Xie
Hi, I am wondering about the mechanism that determines the number of partitions created by SparkContext.sequenceFile? For example, although my file has only 4 splits, Spark would create 16 partitions for it. Is it determined by the file size? Is there any way to control it? (Looks like I can only
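Spark delegates this to Hadoop's `FileInputFormat`, whose classic split sizing is `splitSize = max(minSize, min(goalSize, blockSize))` with `goalSize = totalSize / minPartitions`. The sketch below reproduces that arithmetic in plain Python; it is a simplification (real SequenceFile splits are also rounded to sync-point boundaries, which is ignored here), but it shows how a 4-block file can end up as 16 partitions.

```python
def num_splits(total_size, block_size, min_partitions, min_size=1):
    """Sketch of Hadoop FileInputFormat split sizing (simplified:
    ignores SequenceFile sync-point rounding)."""
    goal_size = total_size // max(1, min_partitions)
    split_size = max(min_size, min(goal_size, block_size))
    # ceiling division: splits needed to cover the whole file
    return (total_size + split_size - 1) // split_size

MB = 2**20
# a 512 MB file with 128 MB blocks has 4 natural splits...
print(num_splits(512 * MB, 128 * MB, min_partitions=2))   # 4
# ...but asking for 16 partitions shrinks goal_size to 32 MB
print(num_splits(512 * MB, 128 * MB, min_partitions=16))  # 16
```

This is why passing a larger `minPartitions` (called `minSplits` in some API versions) to `sequenceFile` increases the partition count: it lowers `goal_size`, so each split covers less of the file.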

CPU Usage for Spark Local Mode

2015-04-04 Thread Wenlei Xie
Hi, I am currently testing my application with Spark in local mode, and I set the master to local[4]. One thing I notice is that when a groupBy/reduceBy operation is involved, the CPU usage can sometimes be around 600% to 800%. I am wondering if this is expected? (As only 4 worker threads