You are probably looking to do .select(explode($"to"), ...) first, which
will produce a new row for each value in the input array.
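A minimal Scala sketch of that pattern (Spark 1.4+), assuming a DataFrame df with an array column "to" and an illustrative "id" column (the names are mine, not from the original thread):

import org.apache.spark.sql.functions.explode
import sqlContext.implicits._

// One output row per element of the "to" array; other selected columns are repeated.
val exploded = df.select($"id", explode($"to").as("to"))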
On Fri, Jun 19, 2015 at 12:02 AM, Suraj Shetiya surajshet...@gmail.com
wrote:
Hi,
I wanted to obtain a grouped-by frame from a DataFrame.
A snippet of the column
Thanks for reporting. Filed as:
https://issues.apache.org/jira/browse/SPARK-8470
On Thu, Jun 18, 2015 at 5:35 PM, Adam Lewandowski
adam.lewandow...@gmail.com wrote:
Since upgrading to Spark 1.4, I'm getting a
scala.reflect.internal.MissingRequirementError when creating a DataFrame
from an
com.rr.data.visits.VisitSequencerRunner
./mvt-master-SNAPSHOT-jar-with-dependencies.jar
---
Our jar contains both com.rr.data.visits.orc.OrcReadWrite (which you can
see in the stack trace) and the unfound com.rr.data.Visit.
I'll open a Jira ticket
On Thu, Jun 18, 2015 at 3:26 PM Michael Armbrust
I would also love to see a more recent version of Spark SQL. There have
been a lot of performance improvements between 1.2 and 1.4 :)
On Thu, Jun 18, 2015 at 3:18 PM, Steve Nunez snu...@hortonworks.com wrote:
Interesting. What were the Hive settings? Specifically, it would be
useful to know
How are you adding com.rr.data.Visit to Spark? With --jars? It is
possible we are using the wrong classloader. Could you open a JIRA?
On Thu, Jun 18, 2015 at 2:56 PM, Chad Urso McDaniel cha...@gmail.com
wrote:
We are seeing class exceptions when converting to a DataFrame.
Anyone out there
this would be a great addition to Spark, and ideally it belongs in Spark
core, not SQL.
I agree with the fact that this would be a great addition, but we would
likely want a specialized SQL implementation for performance reasons.
I would suggest looking at
https://github.com/datastax/spark-cassandra-connector
On Tue, Jun 16, 2015 at 4:01 AM, Hafiz Mujadid hafizmujadi...@gmail.com
wrote:
hi all!
is there a way to connect Cassandra with JdbcRDD?
Sounds like SPARK-5456 https://issues.apache.org/jira/browse/SPARK-5456.
Which is fixed in Spark 1.4.
On Sun, Jun 14, 2015 at 11:57 AM, Sathish Kumaran Vairavelu
vsathishkuma...@gmail.com wrote:
Hello Everyone,
I pulled 2 different tables from the JDBC source and then joined them
using the
Can you please file a JIRA?
On Sun, Jun 14, 2015 at 2:20 PM, Peter Haumer phau...@us.ibm.com wrote:
Hello.
I have an ETL app that appends to a JDBC table new results found at each
run. In 1.3.1 I did this:
testResultsDF.insertIntoJDBC(CONNECTION_URL + ";user=" + USER +
";password=" +
Yes, it's all just RDDs under the covers. DataFrames/SQL is just a more
concise way to express your parallel programs.
On Sat, Jun 13, 2015 at 5:25 PM, Rex X dnsr...@gmail.com wrote:
Thanks, Don! Does SQL implementation of spark do parallel processing on
records by default?
-Rex
On Sat,
2. Does 1.3.2 or 1.4 have any enhancements that can help? I tried to use
1.3.1 but SPARK-6967 prohibits me from doing so. Now that 1.4 is
available, would any of the JOIN enhancements help this situation?
I would try Spark 1.4 after running SET
spark.sql.planner.sortMergeJoin=true.
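A sketch of enabling that flag from a Scala program before running the join (the flag name is taken from the reply above; the DataFrames and column are illustrative):

sqlContext.setConf("spark.sql.planner.sortMergeJoin", "true")
// The planner picks up the setting when the join is planned.
val joined = bigA.join(bigB, bigA("key") === bigB("key"))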
Yes, DataFrames are for much more than SQL and I would recommend using them
wherever possible. It is much easier for us to do optimizations when we
have more information about the schema of your data, and as such, most of
our ongoing optimization effort will focus on making DataFrames faster.
This sounds like a problem that was fixed in Spark 1.3.1.
https://issues.apache.org/jira/browse/SPARK-6351
On Mon, Jun 1, 2015 at 5:44 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
This thread
Each time you run a Spark SQL query we will create new RDDs that load the
data and thus you should see the newest results. There is one caveat:
formats that use the native Data Source API (parquet, ORC (in Spark 1.4),
JSON (in Spark 1.5)) cache file metadata to speed up interactive querying.
To
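If new files show up under a table stored in one of these formats, one way to force the cached file metadata to be re-read is a sketch like the following, assuming a HiveContext and an illustrative table name:

// Drops the cached file metadata for the table so the next query re-lists the files.
hiveContext.refreshTable("events")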
DataFrames have a lot more information about the data, so there is a whole
class of optimizations that are possible there that we cannot do in RDDs.
This is why we are focusing a lot of effort on this part of the project.
In Spark 1.4 you can accomplish what you want using the new window function
customerDF.groupBy("state").agg(max($"discount").alias("newName"))
(or .as(...), both functions can take a String or a Symbol)
On Tue, May 19, 2015 at 2:11 PM, Cesar Flores ces...@gmail.com wrote:
I would like to ask if there is a way of specifying the column name of a
data frame aggregation. For
Perhaps you are looking for GROUP BY and collect_set, which would allow you
to stay in SQL. I'll add that in Spark 1.4 you can get access to items of
a row by name.
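A hedged sketch of the GROUP BY / collect_set suggestion: collect_set is a Hive UDAF, so this assumes a HiveContext and an illustrative registered table "visits" with columns "user" and "page":

// One row per user, with the distinct pages collected into an array column.
val grouped = hiveContext.sql(
  "SELECT user, collect_set(page) AS pages FROM visits GROUP BY user")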
On Fri, May 15, 2015 at 10:48 AM, Edward Sargisson ejsa...@gmail.com
wrote:
Hi all,
This might be a question to be answered or
There are several ways to solve this ambiguity:
1. Use the DataFrames to get the attribute so it's already resolved and
not just a string we need to map to a DataFrame:
df.join(df2, df("_1") === df2("_1"))
2. Use aliases:
df.as('a).join(df2.as('b), $"a._1" === $"b._1")
3. Rename the columns as you
The list of unsupported Hive features should mention that it implicitly
includes features added after Hive 13. You cannot yet compile with Hive
13, though we are investigating this for 1.5.
On Thu, May 14, 2015 at 6:40 AM, Denny Lee denny.g@gmail.com wrote:
Delete from table is available
You can configure Spark SQL's Hive interaction by placing a hive-site.xml
file in the conf/ directory.
On Thu, May 14, 2015 at 10:24 AM, jamborta jambo...@gmail.com wrote:
Hi all,
is it possible to set hive.metastore.warehouse.dir, that is internally
create by spark, to be stored externally
date for Spark version 1.4?
Regards,
Ishwardeep
*From:* Michael Armbrust [mailto:mich...@databricks.com]
*Sent:* Wednesday, May 13, 2015 10:54 PM
*To:* ayan guha
*Cc:* Ishwardeep Singh; user
*Subject:* Re: [Spark SQL 1.3.1] data frame saveAsTable returns exception
I think
that the column reference is valid? Thx.
Dean
On Wednesday, May 13, 2015, Michael Armbrust mich...@databricks.com
wrote:
I would not say that either method is preferred (neither is
old/deprecated). One advantage to the second is that you are referencing a
column from a specific dataframe
Sorry for missing that in the upgrade guide. As part of unifying the Java
and Scala interfaces we got rid of the Java-specific Row. You are correct
in assuming that you want to use Row in org.apache.spark.sql from both
Scala and Java now.
On Wed, May 13, 2015 at 2:48 AM, Emerson Castañeda
val trainRDD = rawTrainData.map( rawRow => Row( rawRow.split(",").map(_.toInt) ) )
The above is creating a Row with a single column that contains a sequence.
You need to extract the sequence using varargs:
val trainRDD = rawTrainData.map( rawRow => Row( rawRow.split(",").map(_.toInt): _* ))
You
Since there is an array here you are probably looking for HiveQL's LATERAL
VIEW explode
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
.
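A sketch of what that HiveQL looks like, assuming a HiveContext and an illustrative table "logs" whose "tags" column is an array&lt;string&gt;:

// LATERAL VIEW explode turns each array element into its own output row.
val tags = hiveContext.sql(
  """SELECT id, tag
    |FROM logs
    |LATERAL VIEW explode(tags) t AS tag""".stripMargin)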
On Mon, May 11, 2015 at 7:12 AM, ayan guha guha.a...@gmail.com wrote:
Typically you would use . notation to access, same way you
BTW, I use spark 1.3.1, and already set
spark.sql.parquet.useDataSourceApi to false.
Schema merging is only supported when this flag is set to true (setting it
to false uses old code that will be removed once the new code is proven).
Temporary tables are not displayed by SHOW TABLES until Spark 1.3.
On Mon, May 11, 2015 at 12:54 PM, Judy Nash judyn...@exchange.microsoft.com
wrote:
Hi,
How can I get a list of temporary tables via Thrift?
Have used thrift’s startWithContext and registered a temp table, but not
*From:* Michael Armbrust [mailto:mich...@databricks.com]
*Sent:* Saturday, May 09, 2015 11:32 AM
*To:* Oleg Shirokikh
*Cc:* user
*Subject:* Re: Spark SQL: STDDEV working in Spark Shell but not in a
standalone app
Are you perhaps using a HiveContext in the shell but a SQLContext in your
app? I don't
it failed to merge
incompatible schemas. I think here it means that the int schema cannot
be merged with the long one.
Does it mean that the schema merging doesn't support the same field with
different types?
-Wei
On Mon, May 11, 2015 at 3:10 PM, Michael Armbrust mich...@databricks.com
That code path is entirely delegated to Hive. Does Hive support this? You
might try instead using sparkContext.addJar.
On Sat, May 9, 2015 at 12:32 PM, Ravindra ravindra.baj...@gmail.com wrote:
Hi All,
I am trying to create custom udfs with hiveContext as given below -
scala
That's a feature flag for a new code path for reading Parquet files. It's
only there in case bugs are found in the new path and will be removed once
we are sure the new path is solid.
On Fri, May 8, 2015 at 8:04 AM, Peter Rudenko petro.rude...@gmail.com
wrote:
Hm, thanks.
Do you know what this
What version of Spark are you using? It appears that at least in master we
are doing the conversion correctly, but it's possible older versions of
applySchema do not. If you can reproduce the same bug in master, can you
open a JIRA?
On Fri, May 8, 2015 at 1:36 AM, Haopu Wang hw...@qilinsoft.com
What are you trying to accomplish? Internally Spark SQL will add Exchange
operators to make sure that data is partitioned correctly for joins and
aggregations. If you are going to do other RDD operations on the result of
dataframe operations and you need to manually control the partitioning,
to hear you guys already working to fix this on future
releases.
Thanks,
Carlos
On Fri, May 8, 2015 at 2:43 PM, Michael Armbrust mich...@databricks.com
wrote:
This is an unfortunate limitation of the datasource api which does not
support multiple databases. For parquet in particular (if you
Sorry for the confusion. SQLContext doesn't have a persistent metastore so
it's not possible to save data as a table. If anyone wants to contribute,
I'd welcome a new query planner strategy for SQLContext that gave a better
error message.
On Thu, May 7, 2015 at 8:41 AM, Judy Nash
Spark SQL using the Data Source API can also do this with much less code
https://twitter.com/michaelarmbrust/status/579346328636891136.
https://github.com/databricks/spark-avro
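A sketch of the Data Source API route with spark-avro (the format string comes from that project; the paths are illustrative):

// Read an Avro file into a DataFrame and write it back out, all through the Data Source API.
val df = sqlContext.load("/data/input.avro", "com.databricks.spark.avro")
df.save("/data/output.avro", "com.databricks.spark.avro")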
On Thu, May 7, 2015 at 8:41 AM, Jonathan Coveney jcove...@gmail.com wrote:
A helpful example of how to convert:
I would suggest also looking at: https://github.com/databricks/spark-avro
On Wed, May 6, 2015 at 10:48 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
Hello,
This is how I read Avro data.
import org.apache.avro.generic.GenericData
import org.apache.avro.generic.GenericRecord
import
I don't think that works:
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration
On Tue, May 5, 2015 at 6:25 PM, nitinkak001 nitinkak...@gmail.com wrote:
I am running hive queries from HiveContext, for which we need a
hive-site.xml.
Is it possible to replace it with
Hi Iulian,
The relevant code is in ScalaReflection
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala,
and it would be awesome if you could suggest how to fix this more
generally. Specifically, this code is also broken when
You need to add a select clause to at least one DataFrame to give them the
same schema before you can union them (much like in SQL).
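A minimal sketch of lining up the schemas before the union, assuming df1 is missing a column "c" that df2 has (column names are illustrative):

import org.apache.spark.sql.functions.lit
import sqlContext.implicits._

// Give df1 a null "c" column so both sides have the same schema, then union.
val left  = df1.select($"a", $"b", lit(null).cast("string").as("c"))
val right = df2.select($"a", $"b", $"c")
val unioned = left.unionAll(right)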
On Tue, May 5, 2015 at 3:24 AM, Wilhelm niznik.pa...@gmail.com wrote:
Hey there,
1.) I'm loading 2 avro files with that have slightly different schema
df1 =
Option only works when you are going from case classes. Just put null into
the Row when you want the value to be null.
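A tiny sketch of that, with an illustrative row layout:

import org.apache.spark.sql.Row

// The middle field is simply null; no Option wrapper is needed.
val row = Row("some-id", null, 42)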
On Tue, May 5, 2015 at 9:00 AM, Masf masfwo...@gmail.com wrote:
Hi.
I have a spark application where I store the results into table (with
HiveContext). Some of these
This should work from java too:
http://spark.apache.org/docs/1.3.1/api/java/index.html#org.apache.spark.sql.functions$
On Tue, May 5, 2015 at 4:15 AM, Jan-Paul Bultmann janpaulbultm...@me.com
wrote:
Hey,
What is the recommended way to create literal columns in java?
Scala has the `lit`
We support both LATERAL VIEWs (a query language feature that lets you turn
a single row into many rows, for example with an explode) and virtual views
(a table that is really just a query that is run on demand).
On Mon, May 4, 2015 at 7:12 PM, luohui20...@sina.com wrote:
guys,
just to
If you do a join with at least one equality relationship between the two
tables, Spark SQL will automatically hash partition the data and perform
the join.
If you are looking to prepartition the data, that information is not yet
propagated from the in memory cached representation so won't help
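A sketch of the first point, with illustrative DataFrame and column names: an equality join like this is enough for Spark SQL to hash partition both sides on the join key.

val joined = orders.join(customers, orders("customerId") === customers("id"))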
The JDBC interface for Spark SQL does not support pushing down limits today.
On Mon, May 4, 2015 at 8:06 AM, Robin East robin.e...@xense.co.uk wrote:
and a further question - have you tried running this query in psql? what’s
the performance like there?
On 4 May 2015, at 16:04, Robin East
If your data is evenly distributed (i.e. no skewed datapoints in your join
keys), it can also help to increase spark.sql.shuffle.partitions (default
is 200).
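For example (a sketch; the value is whatever suits your cluster):

sqlContext.setConf("spark.sql.shuffle.partitions", "400")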
On Mon, May 4, 2015 at 8:03 AM, Richard Marscher rmarsc...@localytics.com
wrote:
In regards to the large GC pauses, assuming you allocated
You are looking for LATERAL VIEW explode
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode
in HiveQL.
On Mon, May 4, 2015 at 7:49 AM, Giovanni Paolo Gibilisco gibb...@gmail.com
wrote:
Hi, I'm trying to parse log files generated by Spark using
Unfortunately, I think the SQLParser is not threadsafe. I would recommend
using HiveQL.
On Thu, Apr 30, 2015 at 4:07 AM, Wangfei (X) wangf...@huawei.com wrote:
actually this is a SQL parse exception, are you sure your SQL is right?
Sent from my iPhone
On Apr 30, 2015, at 18:50, Haopu Wang
This looks like a bug. Mind opening a JIRA?
On Thu, Apr 30, 2015 at 3:49 PM, Justin Yip yipjus...@prediction.io wrote:
After some trial and error, using DataType solves the problem:
df.withColumn("millis", $"eventTime".cast(
org.apache.spark.sql.types.LongType) * 1000)
Justin
On Thu, Apr
No, sorry this is not supported. Support for more than one database is
lacking in several areas (though mostly works for hive tables). I'd like
to fix this in Spark 1.5.
On Tue, Apr 28, 2015 at 1:54 AM, James Aley james.a...@swiftkey.com wrote:
Hey all,
I'm trying to create tables from
Sorry for the confusion. We should be more clear about the semantics in
the documentation. (PRs welcome :) )
.saveAsTable does not create a Hive table, but instead creates a Spark Data
Source table. Here the metadata is persisted into Hive, but Hive cannot
read the tables (as this API support
using Avro please?
Many thanks again!
Renato M.
2015-04-21 20:45 GMT+02:00 Michael Armbrust mich...@databricks.com:
Here is an example using rows directly:
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#programmatically-specifying-the-schema
Avro or parquet input would
for something that
provides a better performance than what we are seeing now. Would you
recommend using Avro presentation then?
Thanks again!
Renato M.
2015-04-21 1:18 GMT+02:00 Michael Armbrust mich...@databricks.com:
There is a cost to converting from JavaBeans to Rows and this code path
This is https://issues.apache.org/jira/browse/SPARK-6231
Unfortunately this is pretty hard to fix as its hard for us to
differentiate these without aliases. However you can add an alias as
follows:
from pyspark.sql.functions import *
df.alias("a").join(df.alias("b"), col("a.col1") == col("b.col1"))
On
:37:39 INFO util.SchemaRDDUtils$: BLOCK
BTW
It worked on 1.2.1...
On Thu, Apr 2, 2015 at 11:47 AM, Hao Ren inv...@gmail.com wrote:
Hi,
Jira created: https://issues.apache.org/jira/browse/SPARK-6675
Thank you.
On Wed, Apr 1, 2015 at 7:50 PM, Michael Armbrust mich...@databricks.com
There is a cost to converting from JavaBeans to Rows and this code path has
not been optimized. That is likely what you are seeing.
On Mon, Apr 20, 2015 at 3:55 PM, ayan guha guha.a...@gmail.com wrote:
SparkSQL optimizes better by column pruning and predicate pushdown,
primarily. Here you are
You are probably using an encoding that we don't support. I think this PR
may be adding that support: https://github.com/apache/spark/pull/5422
On Sat, Apr 18, 2015 at 5:40 PM, Abhishek R. Singh
abhis...@tetrationanalytics.com wrote:
I have created a bunch of protobuf based parquet files that
Filed: https://issues.apache.org/jira/browse/SPARK-6967
Shouldn't they be null?
Statistics are only used to eliminate partitions that can't possibly hold
matching values. So while you are right this might result in a false
positive, that will not result in a wrong answer.
the performance of each of the
above options is
-Original Message-
From: Christian Perez [mailto:christ...@svds.com]
Sent: Thursday, April 16, 2015 6:09 PM
To: Michael Armbrust
Cc: user
Subject: Re: Super slow caching in 1.3?
Hi Michael,
Good question! We checked 1.2 and found
Schema merging is not the feature you are looking for. It is designed for when
you are adding new records (that are not associated with old records),
which may or may not have new or missing columns.
In your case it looks like you have two datasets that you want to load
separately and join on a key.
Spark SQL (which also can give you an RDD for use with the standard Spark
RDD API) has support for json, parquet, and hive tables
http://spark.apache.org/docs/latest/sql-programming-guide.html#data-sources.
There is also a library for Avro https://github.com/databricks/spark-avro.
On Tue, Apr 14,
Can you open a JIRA?
On Tue, Apr 14, 2015 at 1:56 AM, Clint McNeil cl...@impactradius.com
wrote:
Hi guys
I have parquet data written by Impala:
Server version: impalad version 2.1.2-cdh5 RELEASE (build
36aad29cee85794ecc5225093c30b1e06ffb68d3)
When using Spark SQL 1.3.0
There is an example here:
http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases
On Mon, Apr 13, 2015 at 6:07 PM, doovs...@sina.com wrote:
Hi all,
Who know how to access postgresql on Spark SQL? Do I need add the
postgresql dependency in build.sbt and set
RDDs are immutable. Running .repartition does not change the RDD, but
instead returns *a new RDD *with more partitions.
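A quick sketch of the distinction:

val repartitioned = rdd.repartition(100)
println(rdd.partitions.length)            // still the original partition count
println(repartitioned.partitions.length)  // 100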
On Tue, Apr 14, 2015 at 3:59 AM, Masf masfwo...@gmail.com wrote:
Hi.
It doesn't work.
val file = SqlContext.parquetfile("hdfs://node1/user/hive/warehouse/file.parquet")
More info on why toDF is required:
http://spark.apache.org/docs/latest/sql-programming-guide.html#upgrading-from-spark-sql-10-12-to-13
On Tue, Apr 14, 2015 at 6:55 AM, pishen tsai pishe...@gmail.com wrote:
I've changed it to
import sqlContext.implicits._
but it still doesn't work. (I've
The problem is likely that the underlying avro library is reusing objects
for speed. You probably need to explicitly copy the values out of the
reused record before the collect.
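A hedged sketch of copying the values out, assuming the RDD was read with AvroKeyInputFormat so each element is an (AvroKey[GenericRecord], NullWritable) pair whose underlying record may be reused:

// Extract an immutable copy of the field you need before collect().
val names = avroRdd
  .map { case (key, _) => key.datum().get("name").toString }
  .collect()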
On Sat, Apr 11, 2015 at 9:23 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
The read seem to be successfully as the
Here is the stack trace. The first part shows the log when the session is
started in Tableau. It is using the "init sql" option on the data
connection to create the TEMPORARY table myNodeTable.
Ah, I see. Thanks for providing the error. The problem here is that
temporary tables do not exist in
That is a good question. Names with `.` in them are in particular broken
by SPARK-5632 https://issues.apache.org/jira/browse/SPARK-5632, which I'd
like to fix.
There is a more general question of whether strings that are passed to
DataFrames should be treated as quoted identifiers (i.e. `as
Thanks for the report. We improved the speed here in 1.3.1 so would be
interesting to know if this helps. You should also try disabling schema
merging if you do not need that feature (i.e. all of your files are the
same schema).
sqlContext.load(path, "parquet", Map("mergeSchema" -> "false"))
On Wed,
(ParallelGC) prio=10 tid=0x7f149402c000 nid=0xe74
runnable
VM Periodic Task Thread prio=10 tid=0x7f14940c2800 nid=0xe7c waiting
on condition
JNI global references: 230
Tell me if anything else is needed.
Thank you.
Hao.
On Tue, Apr 7, 2015 at 8:00 PM, Michael Armbrust mich
:* Michael Armbrust; user
*Subject:* Re: Advice using Spark SQL and Thrift JDBC Server
To use the HiveThriftServer2.startWithContext, I thought one would use the
following artifact in the build:
"org.apache.spark" %% "spark-hive-thriftserver" % "1.3.0"
But I am unable to resolve
Back to the user list so everyone can see the result of the discussion...
Ah. It all makes sense now. The issue is that when I created the parquet
files, I included an unnecessary directory name (data.parquet) below the
partition directories. It’s just a leftover from when I started with
The joins here are totally different implementations, but it is worrisome
that you are seeing the SQL join hanging. Can you provide more information
about the hang? jstack of the driver and a worker that is processing a
task would be very useful.
On Tue, Apr 7, 2015 at 8:33 AM, Hao Ren
1) What exactly is the relationship between the thrift server and Hive?
I'm guessing Spark is just making use of the Hive metastore to access table
definitions, and maybe some other things, is that the case?
Underneath the covers, the Spark SQL thrift server is executing queries
using a
Have you looked at spark-avro?
https://github.com/databricks/spark-avro
On Tue, Apr 7, 2015 at 3:57 AM, Yamini yamini.m...@gmail.com wrote:
Using spark(1.2) streaming to read avro schema based topics flowing in
kafka
and then using spark sql context to register data as temp table. Avro maven
presumably could also be avoided fairly trivially by
periodically restarting the server with a new context internally. That
certainly beats manual curation of Hive table definitions, if it will work?
Thanks again,
James.
On 7 April 2015 at 19:30, Michael Armbrust mich...@databricks.com
In HiveQL, you should be able to express this as:
SELECT ... FROM table GROUP BY m['SomeKey']
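Spelled out as a sketch against a HiveContext, with an illustrative registered table "t" whose column "m" is the map:

val counts = hiveContext.sql(
  "SELECT m['SomeKey'] AS someKey, count(*) AS cnt FROM t GROUP BY m['SomeKey']")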
On Sat, Apr 4, 2015 at 5:25 PM, Justin Yip yipjus...@prediction.io wrote:
Hello,
I have a case class like this:
case class A(
m: Map[Long, Long],
...
)
and constructed a DataFrame from
I'll add that I don't think there is a convenient way to do this in the
Column API ATM, but would welcome a JIRA for adding it :)
On Mon, Apr 6, 2015 at 1:45 PM, Michael Armbrust mich...@databricks.com
wrote:
In HiveQL, you should be able to express this as:
SELECT ... FROM table GROUP BY m
Hey Todd,
In migrating to 1.3.x I see that the spark.sql.hive.convertMetastoreParquet
is no longer public, so the above no longer works.
This was probably just a typo, but to be clear,
spark.sql.hive.convertMetastoreParquet is still a supported option and
should work. You are correct that
You could certainly build a connector, but it seems like you would want
support for pushing down aggregations to get the benefits of Druid. There
are only experimental interfaces for doing so today, but it sounds like a
pretty cool project.
On Mon, Apr 6, 2015 at 2:23 PM, Paolo Platter
the subsequent queries are different.
On Mon, Apr 6, 2015 at 2:41 PM, Michael Armbrust mich...@databricks.com
wrote:
It is generated and cached on each of the executors.
On Mon, Apr 6, 2015 at 2:32 PM, Akshat Aranya aara...@gmail.com wrote:
Hi,
I'm curious as to how Spark does code
It is generated and cached on each of the executors.
On Mon, Apr 6, 2015 at 2:32 PM, Akshat Aranya aara...@gmail.com wrote:
Hi,
I'm curious as to how Spark does code generation for SQL queries.
Following through the code, I saw that an expression is parsed and
compiled into a class using
Do you think you are seeing a regression from 1.2? Also, are you caching
nested data or flat rows? The in-memory caching is not really designed for
nested data and so performs pretty slowly here (it's just falling back to
Kryo and even then there are some locking issues).
If so, would it be
Do you have a full stack trace?
On Thu, Apr 2, 2015 at 11:45 AM, ogoh oke...@gmail.com wrote:
Hello,
My ETL uses sparksql to generate parquet files which are served through
Thriftserver using hive ql.
It especially defines a schema programmatically since the schema can be
only
known at
(DefaultExecutorFactory.java:64)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
On Thu, Apr 2, 2015 at 2:51 PM, Michael Armbrust
I'll add we just backported this so it'll be included in 1.2.2 also.
On Wed, Apr 1, 2015 at 4:14 PM, Michael Armbrust mich...@databricks.com
wrote:
This is fixed in Spark 1.3.
https://issues.apache.org/jira/browse/SPARK-5195
On Wed, Apr 1, 2015 at 4:05 PM, Judy Nash judyn
Looks like a typo, try:
df.select(df("name"), df("age") + 1)
Or:
df.select("name", "age")
PRs to fix docs are always appreciated :)
On Apr 2, 2015 7:44 PM, java8964 java8...@hotmail.com wrote:
The import command was already run.
Forgot to mention, the rest of the examples related to df all
This is actually a problem with our use of Scala's reflection library.
Unfortunately you need to load Spark SQL using the primordial classloader,
otherwise you run into this problem. If anyone from the scala side can
hint how we can tell scala.reflect which classloader to use when creating
the
This is fixed in Spark 1.3.
https://issues.apache.org/jira/browse/SPARK-5195
On Wed, Apr 1, 2015 at 4:05 PM, Judy Nash judyn...@exchange.microsoft.com
wrote:
Hi all,
Noticed a bug in my current version of Spark 1.2.1.
After a table is cached with “cache table table” command, query will
Can you open a JIRA please?
On Wed, Apr 1, 2015 at 9:38 AM, Hao Ren inv...@gmail.com wrote:
Hi,
I find HiveContext.setConf does not work correctly. Here are some code
snippets showing the problem:
snippet 1:
What do you mean by permanently? If you start up the JDBC server and say
CACHE TABLE it will stay cached as long as the server is running. CACHE
TABLE is idempotent, so you could even just have that command in your BI
tools setup queries.
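A sketch of issuing that setup query over JDBC against the running Thrift server (the URL and table name are illustrative):

import java.sql.DriverManager

Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
// CACHE TABLE is idempotent, so running it on every connection is harmless.
conn.createStatement().execute("CACHE TABLE my_table")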
On Wed, Apr 1, 2015 at 11:02 AM, Venkat, Ankam
Can you try with Spark 1.3? Much of this code path has been rewritten /
improved in this version.
On Wed, Apr 1, 2015 at 7:53 AM, Masf masfwo...@gmail.com wrote:
Hi.
In Spark SQL 1.2.0, with HiveContext, I'm executing the following
statement:
CREATE TABLE testTable STORED AS PARQUET AS
Can you open a JIRA for this please?
On Wed, Apr 1, 2015 at 6:14 AM, Ted Yu yuzhih...@gmail.com wrote:
+1 on escaping column names.
On Apr 1, 2015, at 5:50 AM, fergjo00 johngfergu...@gmail.com wrote:
Question:
---
Is there a way to have JDBC DataFrames use quoted/escaped
.
Is there any workaround to achieve the same with 1.2.1?
Thanks,
Jitesh
On Wed, Apr 1, 2015 at 12:25 AM, Michael Armbrust mich...@databricks.com
wrote:
In Spark 1.3 I would expect this to happen automatically when the parquet
table is small (< 10mb, configurable
When few waves (1 or 2) are used in a job, LoadApp could finish after a
few failures and retries.
But when more waves (3) are involved in a job, the job would terminate
abnormally.
Can you clarify what you mean by waves? Are you inserting from multiple
programs concurrently?
You can do something like:
df.collect().map {
case Row(name: String, age1: Int, age2: Int) => ...
}
On Tue, Mar 31, 2015 at 4:05 PM, roni roni.epi...@gmail.com wrote:
I have 2 paraquet files with format e.g name , age, town
I read them and then join them to get all the names which are in
In Spark 1.3 I would expect this to happen automatically when the parquet
table is small (< 10mb, configurable with
spark.sql.autoBroadcastJoinThreshold).
If you are running 1.3 and not seeing this, can you show the code you are
using to create the table?
On Tue, Mar 31, 2015 at 3:25 AM, jitesh129
I'm hoping to cut an RC this week. We are just waiting for a few other
critical fixes.
On Mon, Mar 30, 2015 at 12:54 PM, Kelly, Jonathan jonat...@amazon.com
wrote:
Are you referring to SPARK-6330
https://issues.apache.org/jira/browse/SPARK-6330?
If you are able to build Spark from source
You'll need to use the longer form for aggregation:
tb2.groupBy("city", "state").agg(avg("price").as("newName")).show
depending on the language you'll need to import:
scala: import org.apache.spark.sql.functions._
python: from pyspark.sql.functions import *
On Mon, Mar 30, 2015 at 5:49 PM, Neal Yin
In this case I'd probably just store it as a String. Our casting rules
(which come from Hive) are such that when you use a string as a number or
boolean it will be cast to the desired type.
Thanks for the PR btw :)
On Fri, Mar 27, 2015 at 2:31 PM, Eran Medan ehrann.meh...@gmail.com wrote: