Sometimes the underlying Hive code will also print exceptions during
successful execution (for example CREATE TABLE IF NOT EXISTS). If there is
actually a problem, Spark SQL should throw an exception.
What is the command you are running and what is the error you are seeing?
On Sat, Sep 6, 2014
It depends on the RDD in question exactly where the work will be done. I
believe that if you do a repartition(1) instead of a coalesce it will force
a shuffle, so the work will be done in a distributed fashion and then a single
node will read that shuffled data and write it out.
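For illustration, a minimal sketch of the two calls, assuming a hypothetical RDD named data and made-up output paths:
    // coalesce(1): no shuffle, so the upstream work collapses onto a single task
    data.coalesce(1).saveAsTextFile("hdfs:///tmp/out-coalesce")
    // repartition(1): forces a shuffle, so the upstream work stays distributed
    // and one task then reads the shuffled data and writes the single output
    data.repartition(1).saveAsTextFile("hdfs:///tmp/out-repartition")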
If you want to write to a
Are you using SQLContext or HiveContext? The default sql dialect in
HiveContext (HiveQL) is a little more complete and might be a better place
to start.
On Wed, Sep 3, 2014 at 2:12 AM, Samay smilingsa...@gmail.com wrote:
Hi,
I am trying to run query 3 from the TPC-H benchmark using
Check out LATERAL VIEW explode:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
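Roughly, in a HiveContext it looks like this (a sketch; the events table and its tags array column are hypothetical):
    // explode(tags) turns each element of the array into its own row
    hiveContext.hql("""
      SELECT id, tag
      FROM events
      LATERAL VIEW explode(tags) tagTable AS tag
    """).collect().foreach(println)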
On Tue, Sep 2, 2014 at 1:26 PM, gtinside gtins...@gmail.com wrote:
Hi ,
I am using jsonRDD in spark sql and having trouble iterating through array
inside the json object. Please refer
Yes you can. HiveContext's functionality is a strict superset of
SQLContext.
On Tue, Sep 2, 2014 at 6:35 PM, gtinside gtins...@gmail.com wrote:
Thanks. I am not using hive context. I am loading data from Cassandra and
then converting it into json and then querying it through SQL context.
I don't believe that Shark works with Spark 1.0. Have you considered
trying Spark SQL?
On Mon, Sep 1, 2014 at 8:21 AM, arthur.hk.c...@gmail.com
arthur.hk.c...@gmail.com wrote:
Hi,
I have installed Spark 1.0.2 and Shark 0.9.2 on Hadoop 2.4.1 (by compiling
from source).
spark: 1.0.2
:24 AM, Michael Armbrust mich...@databricks.com
wrote:
You don't need the Seq, as 'in' is a variadic function.
personTable.where('name in ("foo", "bar"))
On Thu, Aug 28, 2014 at 3:09 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Hi all,
What is the expression that I should use with spark sql
What version are you using?
On Fri, Aug 29, 2014 at 2:22 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Still not working for me. I got a compilation error : *value in is not a
member of Symbol.* Any ideas ?
On Fri, Aug 29, 2014 at 9:46 AM, Michael Armbrust mich...@databricks.com
wrote
Spark SQL is based on Hive 12. They must have changed the maximum key size
between 12 and 13.
On Fri, Aug 29, 2014 at 4:38 AM, arthur.hk.c...@gmail.com
arthur.hk.c...@gmail.com wrote:
Hi,
Tried the same thing in HIVE directly without issue:
HIVE:
hive> create table test_datatype2
This feature was not part of that version. It will be in 1.1.
On Fri, Aug 29, 2014 at 12:33 PM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
1.0.2
On Friday, August 29, 2014, Michael Armbrust mich...@databricks.com
wrote:
What version are you using?
On Fri, Aug 29, 2014 at 2:22 AM
The comma is just the way the default toString works for Row objects.
Since SchemaRDDs are also RDDs, you can do arbitrary transformations on
the Row objects that are returned.
For example, if you'd rather the delimiter was '|':
sql("SELECT * FROM src").map(_.mkString("|")).collect()
On Thu, Aug
I'll note the parquet jars are included by default in 1.1
On Wed, Aug 27, 2014 at 11:53 AM, lyc yanchen@huawei.com wrote:
Thanks a lot. Finally, I can create a parquet table using your command
--driver-class-path.
I am using hadoop 2.3. Now, I will try to load data into the tables.
I would expect that to work. What exactly is the error?
On Wed, Aug 27, 2014 at 6:02 AM, Matt Chu m...@kabam.com wrote:
(apologies for sending this twice, first via nabble; didn't realize it
wouldn't get forwarded)
Hey, I know it's not officially released yet, but I'm trying to understand
You need to have the datanucleus jars on your classpath. It is not okay to
merge them into an uber jar.
On Wed, Aug 27, 2014 at 1:44 AM, centerqi hu cente...@gmail.com wrote:
Hi all
When I run a simple SQL, encountered the following error.
hive:0.12(metastore in mysql)
hadoop 2.4.1
Arrays in the JVM are also mutable. However, you should not be relying on
the exact type here. The only promise is that you will get back something
of type Seq[_].
On Wed, Aug 27, 2014 at 4:27 PM, Du Li l...@yahoo-inc.com wrote:
Hi, Michael.
I used HiveContext to create a table with a
?
From: Michael Armbrust mich...@databricks.com
Date: Wednesday, August 27, 2014 at 5:21 PM
To: Du Li l...@yahoo-inc.com
Cc: user@spark.apache.org user@spark.apache.org
Subject: Re: SparkSQL returns ArrayBuffer for fields of type Array
Arrays in the JVM are also mutable. However, you should
wrote:
Ok, I'll try. I happen to do that a lot to other tools.
So I am guessing you are saying if I wanted to do it now, I'd start
against https://github.com/apache/spark/tree/branch-1.1 and PR against it?
On Thu, Aug 21, 2014 at 12:28 AM, Michael Armbrust mich...@databricks.com
wrote:
I do
Just like with normal Spark Jobs, that command returns an RDD that contains
the lineage for computing the answer but does not actually compute the
answer. You'll need to run collect() on the RDD in order to get the result.
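A small sketch of that, assuming a HiveContext named hiveContext and an existing table:
    val result = hiveContext.hql("SELECT key, count(*) FROM src GROUP BY key")  // only builds the lineage
    result.collect().foreach(println)                                           // actually runs the job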
On Mon, Aug 25, 2014 at 11:46 AM, S Malligarjunan
Which version of Spark SQL are you using? Several issues with custom hive
UDFs have been fixed in 1.1.
On Mon, Aug 25, 2014 at 9:57 AM, S Malligarjunan
smalligarju...@yahoo.com.invalid wrote:
Hello All,
I have added a jar from an S3 instance into the classpath. I have tried the following
options
1.
In our case, the ROW has about 80 columns which exceeds the case class
limit.
Starting with Spark 1.1 you'll be able to also use the applySchema API
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L126
.
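A rough sketch of the applySchema API as described in the 1.1 programming guide (file path and field names here are made up):
    import org.apache.spark.sql._

    // build an RDD of Rows plus an explicit schema, then combine the two
    val rowRDD = sc.textFile("people.txt").map(_.split(",")).map(p => Row(p(0), p(1).trim.toInt))
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)))
    val people = sqlContext.applySchema(rowRDD, schema)
    people.registerTempTable("people")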
One useful thing to do when you run into unexpected slowness is to run
'jstack' a few times on the driver and executors and see if there is any
particular hotspot in the Spark SQL code.
Also, it seems like a better option here might be to use the new
applySchema API
Thanks for this very thorough write-up and for continuing to update it as
you progress! As I said in the other thread it would be great to do a
little profiling to see if we can get to the heart of the slowness with
nested case classes (very little optimization has been done in this code
path).
So I tried the above (why doesn't union or ++ have the same behavior
btw?)
I don't think there is a good reason for this. I'd open a JIRA.
and it works, but is slow because the original RDDs are not
cached and files must be read from disk.
I also discovered you can recover the
is approved for contribution,
obviously PR process will be followed.
On Mon, Aug 25, 2014 at 11:57 AM, Michael Armbrust mich...@databricks.com
wrote:
In general all PRs should be made against master. When necessary, we can
back port them to the 1.1 branch as well. However, since we
- dev list
+ user list
You should be able to query Spark SQL using JDBC, starting with the 1.1
release. There is some documentation in the repo
https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md#running-the-thrift-jdbc-server,
and we'll update the official docs once the
I believe this should work if you run srdd1.unionAll(srdd2). Both RDDs
must have the same schema.
On Wed, Aug 20, 2014 at 11:30 PM, Evan Chan velvia.git...@gmail.com wrote:
Is it possible to merge two cached Spark SQL tables into a single
table so it can be queried with one SQL statement?
ie,
I do not know of any existing way to do this. It should be possible using
the new public API for applying schema (will be available in 1.1) to an
RDD. Basically you'll need to convert the proto buff records into rows,
and also create a StructType that represents the schema. With these two
things
will be invoked from a middle tier webapp. I am thinking to use the
Hive JDBC driver.
Thanks,
Ken
*From:* Michael Armbrust [mailto:mich...@databricks.com]
*Sent:* Wednesday, August 20, 2014 9:38 AM
*To:* Tam, Ken K
*Cc:* user@spark.apache.org
*Subject:* Re: Is Spark SQL Thrift Server
Hi to all, sorry for not being fully on topic but I have 2 quick questions
about Parquet tables registered in Hive/Spark:
Using HiveQL to CREATE TABLE will add a table to the metastore / warehouse
exactly as it would in hive. Registering is a purely temporary operation
that lives with the
This is not supported at the moment. There are no concrete plans to
support it through the programmatic API, but it should work using SQL
as you suggested.
On Wed, Aug 13, 2014 at 8:22 AM, Silvio Fiorito
silvio.fior...@granturing.com wrote:
Using the SchemaRDD *insertInto*
I would expect this to work with Spark SQL (available in 1.0+) but there is
a JIRA open to confirm this works SPARK-2883
https://issues.apache.org/jira/browse/SPARK-2883.
On Mon, Aug 11, 2014 at 10:23 PM, vinay.kash...@socialinfra.net wrote:
Hi all,
Is it possible to use table with ORC
I do not believe this is true. If you are using a hive context you should
be able to register an RDD as a temporary table and then use INSERT INTO
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-InsertingdataintoHiveTablesfromqueriesto
add data to a hive
Hive pulls in a ton of dependencies that we were afraid would break
existing spark applications. For this reason all hive submodules are
optional.
On Tue, Aug 12, 2014 at 7:43 AM, John Omernik j...@omernik.com wrote:
Yin helped me with that, and I appreciate the onlist followup. A few
I imagine it's not the only instance of this kind of problem people
will ever encounter. Can you rebuild Spark with this particular
release of Hive?
Unfortunately the Hive APIs that we use change too much from release to
release to make this possible. There is a JIRA for compiling Spark SQL
Sounds like you need to use lateral view with explode
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView,
which is supported in Spark SQL's HiveContext.
On Sat, Aug 9, 2014 at 6:43 PM, Sathish Kumaran Vairavelu
vsathishkuma...@gmail.com wrote:
I have a simple JSON
This is maybe not exactly what you are asking for, but you might consider
looking at the queryExecution (a developer API that shows how the query is
analyzed / executed)
sql(...).queryExecution
On Wed, Aug 6, 2014 at 3:55 PM, Tom thubregt...@gmail.com wrote:
Hi,
I am trying to look at for
We are working on an overhaul of the docs before the 1.1 release. In the
mean time try: CACHE TABLE tableName.
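For example, a quick sketch from a Scala program (table name made up); the same statement can also be issued through a JDBC/CLI session against the thrift server:
    // caches the table in the in-memory columnar format
    sqlContext.sql("CACHE TABLE logs")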
On Tue, Aug 5, 2014 at 9:02 AM, John Omernik j...@omernik.com wrote:
I got things working on my cluster with the sparksql thrift server.
(Thank you Yin Huai at Databricks!)
That
For outer joins I'd recommend upgrading to master or waiting for a 1.1
release candidate (which should be out this week).
On Tue, Aug 5, 2014 at 7:38 AM, Dima Zhiyanov dimazhiya...@hotmail.com
wrote:
I am also experiencing this kryo buffer problem. My join is left outer with
under 40mb on the
Is this on 1.0.1? I'd suggest running this on master or the 1.1-RC which
should be coming out this week. Pyspark did not have good support for
nested data previously. If you still encounter issues using a more recent
version, please file a JIRA. Thanks!
On Tue, Aug 5, 2014 at 11:55 AM, Brad
Maybe; I’m not sure just yet. Basically, I’m looking for something
functionally equivalent to this:
sqlContext.jsonRDD(RDD[dict].map(lambda x: json.dumps(x)))
In other words, given an RDD of JSON-serializable Python dictionaries, I
want to be able to infer a schema that is guaranteed to
when it knows the exact
size of the data) There is a discussion here about trying to improve this:
https://issues.apache.org/jira/browse/SPARK-2650
On Sun, Aug 3, 2014 at 11:35 PM, Gurvinder Singh gurvinder.si...@uninett.no
wrote:
On 08/03/2014 02:33 AM, Michael Armbrust wrote:
I am
Yeah, there will likely be a community preview build soon for the 1.1
release. Benchmarking that will both give you better performance and help
QA the release.
Bonus points if you turn on codegen for Spark SQL (experimental feature)
when benchmarking and report bugs: SET spark.sql.codegen=true
for
caching/storing. So I am wondering how the memory is handled in the
cacheTable case. Does it reserve the memory for storage so that other parts run
out of their memory? I also tried to change the
spark.storage.memoryFraction but that did not help.
- Gurvinder
On 08/01/2014 08:42 AM, Michael Armbrust wrote
We are investigating various ways to integrate with Tachyon. I'll note
that you can already use saveAsParquetFile and
parquetFile(...).registerAsTable(tableName) (soon to be registerTempTable
in Spark 1.1) to store data into tachyon and query it with Spark SQL.
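A rough sketch of that pattern (the tachyon URL and table name are made up):
    // save a SchemaRDD as parquet on tachyon...
    schemaRDD.saveAsParquetFile("tachyon://master:19998/warehouse/events")
    // ...then, possibly from another application, load it back and query it
    val events = sqlContext.parquetFile("tachyon://master:19998/warehouse/events")
    events.registerAsTable("events")   // registerTempTable from Spark 1.1 on
    sqlContext.sql("SELECT count(*) FROM events").collect()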
On Fri, Aug 1, 2014 at 1:42 AM,
The number of partitions (which decides the number of tasks) is fixed after
any shuffle and can be configured using 'spark.sql.shuffle.partitions'
through SQLConf (i.e. sqlContext.set(...) or
SET spark.sql.shuffle.partitions=... in SQL). It is possible we will auto
select this based on statistics
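For example, as a sketch (200 is just the default value, not a recommendation):
    // change the number of post-shuffle partitions for subsequent queries
    sqlContext.sql("SET spark.sql.shuffle.partitions=200")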
So is the only issue that impala does not see changes until you refresh the
table? This sounds like a configuration that needs to be changed on the
impala side.
On Fri, Aug 1, 2014 at 7:20 AM, Patrick McGloin mcgloin.patr...@gmail.com
wrote:
Sorry, sent early, wasn't finished typing.
CREATE
The performance should be the same using the DSL or SQL strings.
On Thu, Jul 31, 2014 at 2:36 PM, Buntu Dev buntu...@gmail.com wrote:
I was not sure if registerAsTable() and then querying against that table has
additional performance impact and if the DSL eliminates that.
On Thu, Jul 31, 2014 at
cacheTable uses a special columnar caching technique that is optimized for
SchemaRDDs. It is something similar to MEMORY_ONLY_SER, but not quite. You
can specify the persistence level on the SchemaRDD itself and register that
as a temporary table, however it is likely you will not get as good
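A sketch of the two options being contrasted (path and table name are made up):
    import org.apache.spark.storage.StorageLevel
    val events = sqlContext.parquetFile("/data/events.parquet")
    events.registerAsTable("events")

    // option 1: columnar, SchemaRDD-aware caching
    sqlContext.cacheTable("events")

    // option 2: generic RDD persistence with an explicit storage level
    events.persist(StorageLevel.MEMORY_ONLY_SER)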
Very cool. Glad you found a solution that works.
On Wed, Jul 30, 2014 at 1:04 PM, Venkat Subramanian vsubr...@gmail.com
wrote:
For the time being, we decided to take a different route. We created a Rest
API layer in our app and allowed SQL query passing via the Rest. Internally
we pass that
The warehouse and the metastore directories are two different things. The
metastore holds the schema information about the tables and will by default
be a local directory. With javax.jdo.option.ConnectionURL you can
configure it to be something like mysql. The warehouse directory is the
default
/adam/rdd/RegionJoin.scala
I was thinking to provide an improved version of method partitionAndJoin
from the ADAM implementation above
On Sat, Jul 26, 2014 at 12:37 PM, Michael Armbrust mich...@databricks.com
wrote:
A very simple example of adding a new operator to Spark SQL:
https
Take a look at the programming guide for spark sql:
http://spark.apache.org/docs/latest/sql-programming-guide.html
On Wed, Jul 23, 2014 at 11:09 AM, buntu buntu...@gmail.com wrote:
I wanted to experiment with using Parquet data with SparkSQL. I got some
tab-delimited files and wanted to know
When SPARK-2211 is done, will spark sql automatically choose join
algorithms?
Is there some way to manually hint the optimizer?
Ideally we will select the best algorithm for you. We are also considering
ways to allow the user to hint.
Can you provide the code? Is Record a case class? And is it defined as a
top level object? Also have you done import sqlContext._?
On Sat, Jul 19, 2014 at 3:39 AM, junius junius.z...@gmail.com wrote:
Hello,
I write code to practice Spark SQL based on the latest Spark version.
But I get
There is no version of Shark that works with Spark 1.0.
More details about the path forward here:
http://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html
On Jul 18, 2014 4:53 AM, Megane1994 leumenilari...@yahoo.fr wrote:
Hello,
I want to run
Sorry for the non-obvious error message. It is not valid SQL to include
attributes in the select clause unless they are also in the group by clause
or are inside of an aggregate function.
On Jul 18, 2014 5:12 AM, Martin Gammelsæter martingammelsae...@gmail.com
wrote:
Hi again!
I am having
It's likely that since your UDF is a black box to hive's query optimizer
that it must choose a less efficient join algorithm that passes all
possible matches to your function for comparison. This will happen any
time your UDF touches attributes from both sides of the join.
In general you can
Can you tell us more about your environment? Specifically, are you also
running on Mesos?
On Jul 18, 2014 12:39 AM, Victor Sheng victorsheng...@gmail.com wrote:
when I run a query to a hadoop file.
mobile.registerAsTable("mobile")
val count = sqlContext.sql("select count(1) from mobile")
res5:
See the section on advanced dependency management:
http://spark.apache.org/docs/latest/submitting-applications.html
On Jul 17, 2014 10:53 PM, linkpatrickliu linkpatrick...@live.com wrote:
Seems like the mysql connector jar is not included in the classpath.
Where can I set the jar to the
You can do insert into. As with other SQL on HDFS systems there is no
updating of data.
On Jul 17, 2014 1:26 AM, Akhil Das ak...@sigmoidanalytics.com wrote:
Is this what you are looking for?
https://spark.apache.org/docs/1.0.0/api/java/org/apache/spark/sql/parquet/InsertIntoParquetTable.html
Unfortunately, this is a query where we just don't have an efficient
implementation yet. You might try switching the table order.
Here is the JIRA for doing something more efficient:
https://issues.apache.org/jira/browse/SPARK-2212
On Fri, Jul 18, 2014 at 7:05 AM, Pei-Lun Lee
$CLASSPATH $CONFIG_OPTS test.Test4 spark://master:7077
/usr/local/spark-1.0.1-bin-hadoop1
hdfs://master:54310/user/hduser/file1.csv
hdfs://master:54310/user/hduser/file2.csv*
~Sarath
On Wed, Jul 16, 2014 at 8:14 PM, Michael Armbrust
mich...@databricks.com wrote:
What if you just run
If you intern the string it will be more efficient, but still significantly
more expensive than the class based approach.
** VERY EXPERIMENTAL **
We are working with EPFL on a lightweight syntax for naming the results of
spark transformations in scala (and are going to make it interoperate with
We don't have support for partitioned parquet yet. There is a JIRA here:
https://issues.apache.org/jira/browse/SPARK-2406
On Thu, Jul 17, 2014 at 5:00 PM, Tathagata Das tathagata.das1...@gmail.com
wrote:
val kafkaStream = KafkaUtils.createStream(... ) // see the example in my
previous post
I think what you might be looking for is the ability to programmatically
specify the schema, which is coming in 1.1.
Here's the JIRA: SPARK-2179
https://issues.apache.org/jira/browse/SPARK-2179
On Wed, Jul 16, 2014 at 8:24 AM, pandees waran pande...@gmail.com wrote:
Hi,
I am newbie to spark
Yes, but if both tagCollection and selectedVideos have a column named id
then Spark SQL does not know which one you are referring to in the where
clause. Here's an example with aliases:
val x = testData2.as('x)
val y = testData2.as('y)
val join = x.join(y, Inner, Some(x.a.attr ===
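For reference, the same disambiguation expressed in plain SQL rather than the Scala DSL (a sketch with hypothetical table and column names):
    sqlContext.sql("""
      SELECT t.id, v.title
      FROM tagCollection t JOIN selectedVideos v ON t.id = v.id
    """)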
What if you just run something like:
*sc.textFile("hdfs://localhost:54310/user/hduser/file1.csv").count()*
On Wed, Jul 16, 2014 at 10:37 AM, Sarath Chandra
sarathchandra.jos...@algofusiontech.com wrote:
Yes Soumya, I did it.
First I tried with the example available in the documentation
the logical plan, it is executed in spark regardless of
dialect although the execution might be different for the same query.
Best Regards,
Jerry
On Tue, Jul 15, 2014 at 6:22 PM, Michael Armbrust mich...@databricks.com
wrote:
hql and sql are just two different dialects for interacting
Note that runnning a simple map+reduce job on the same hdfs files with the
same installation works fine:
Did you call collect() on the totalLength? Otherwise nothing has actually
executed.
Oh, I'm sorry... reduce is also an operation
On Wed, Jul 16, 2014 at 3:37 PM, Michael Armbrust mich...@databricks.com
wrote:
Note that runnning a simple map+reduce job on the same hdfs files with the
same installation works fine:
Did you call collect() on the totalLength? Otherwise
Hmm, it could be some weirdness with classloaders / Mesos / spark sql?
I'm curious if you would hit an error if there were no lambda functions
involved. Perhaps if you load the data using jsonFile or parquetFile.
Either way, I'd file a JIRA. Thanks!
On Jul 16, 2014 6:48 PM, Svend
You should try cleaning and then building. We have recently hit a bug in
the scala compiler that sometimes causes non-clean builds to fail.
On Wed, Jul 16, 2014 at 7:56 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Yeah, we try to have a regular 3 month release cycle; see
You might be hitting SPARK-1994
https://issues.apache.org/jira/browse/SPARK-1994, which is fixed in 1.0.1.
On Mon, Jul 14, 2014 at 11:16 PM, Nick Chammas nicholas.cham...@gmail.com
wrote:
I’m running this query against RDD[Tweet], where Tweet is a simple case
class with 4 fields.
In general this should be supported using [] to access array data and .
to access nested fields. Is there something you are trying that isn't
working?
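A small sketch of what that looks like in HiveQL (table and field names made up):
    // '.' reaches into a struct, '[n]' indexes into an array
    hiveContext.hql("SELECT address.city, phones[0] FROM people").collect()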
On Mon, Jul 14, 2014 at 11:25 PM, anyweil wei...@gmail.com wrote:
I mean querying nested data such as JSON, not a nested query,
Sorry for the trouble. There are two issues here:
- Parsing of repeated nested fields (i.e. something[0].field) is not supported in
the plain SQL parser. SPARK-2096
https://issues.apache.org/jira/browse/SPARK-2096
- Resolution is broken in the HiveQL parser. SPARK-2483
https://issues.apache.org/jira/browse/SPARK-2446?
2014-07-15 3:54 GMT+08:00 Michael Armbrust mich...@databricks.com:
This is not supported yet, but there is a PR open to fix it:
https://issues.apache.org/jira/browse/SPARK-2446
On Mon, Jul 14, 2014 at 4:17 AM, Pei-Lun Lee pl...@appier.com wrote
Make the Array a Seq.
On Tue, Jul 15, 2014 at 7:12 AM, Jaonary Rabarisoa jaon...@gmail.com
wrote:
Hi all,
How should I store a one-to-many relationship using spark sql and parquet
format? For example, I have the following case class:
case class Person(key: String, name: String, friends:
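Following the advice above ("Make the Array a Seq"), a sketch of how that case class might look; the friends field type is assumed for illustration:
    import sqlContext._   // brings in the implicit RDD -> SchemaRDD conversion

    case class Person(key: String, name: String, friends: Seq[String])

    val people = sc.parallelize(Seq(Person("1", "Alice", Seq("Bob", "Carol"))))
    people.saveAsParquetFile("people.parquet")   // the Seq is written as a repeated group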
Are you registering multiple RDDs of case classes as tables concurrently?
You are possibly hitting SPARK-2178
https://issues.apache.org/jira/browse/SPARK-2178 which is caused by
SI-6240 https://issues.scala-lang.org/browse/SI-6240.
On Tue, Jul 15, 2014 at 10:49 AM, Keith Simmons
, Jul 15, 2014 at 11:14 AM, Michael Armbrust
mich...@databricks.com
wrote:
Are you registering multiple RDDs of case classes as tables
concurrently?
You are possibly hitting SPARK-2178 which is caused by SI-6240.
On Tue, Jul 15, 2014 at 10:49 AM, Keith Simmons
keith.simm...@gmail.com
powerful SQL support
borrowed from Hive. Can you shed some light on this when you get a minute?
Thanks,
Jerry
On Tue, Jul 15, 2014 at 4:32 PM, Michael Armbrust mich...@databricks.com
wrote:
No, that is why I included the link to SPARK-2096
https://issues.apache.org/jira/browse/SPARK
You can find the parser here:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SqlParser.scala
In general the hive parser provided by HQL is much more complete at the
moment. Long term we will likely stop using parser combinators and either
This is not supported yet, but there is a PR open to fix it:
https://issues.apache.org/jira/browse/SPARK-2446
On Mon, Jul 14, 2014 at 4:17 AM, Pei-Lun Lee pl...@appier.com wrote:
Hi,
I am using spark-sql 1.0.1 to load parquet files generated from method
described in:
Yeah, sadly this dependency was introduced when someone consolidated the
logging infrastructure. However, the dependency should be very small and
thus easy to remove, and I would like catalyst to be usable outside of
Spark. A pull request to make this possible would be welcome.
Ideally, we'd
What sort of nested query are you talking about? Right now we only support
nested queries in the FROM clause. I'd like to add support for other cases
in the future.
On Sun, Jul 13, 2014 at 4:11 AM, anyweil wei...@gmail.com wrote:
Or is it supported? I know I could do it myself with
Handling of complex types is somewhat limited in SQL at the moment. It'll
be more complete if you use HiveQL.
That said, the problem here is you are calling .name on an array. You need
to pick an item from the array (using [..]) or use something like a lateral
view explode.
On Sat, Jul 12,
I just wanted to send out a quick note about a change in the handling of
strings when loading / storing data using parquet and Spark SQL. Before,
Spark SQL did not support binary data in Parquet, so all binary blobs were
implicitly treated as Strings. 9fe693
Have you upgraded the cluster where you are running this to 1.0.1 as
well? A NoSuchMethodError
almost always means that the class files available at runtime are different
from those that were there when you compiled your program.
On Mon, Jul 14, 2014 at 7:06 PM, SK skrishna...@gmail.com wrote:
Hi Andy,
The SQL parser is pretty basic (we plan to improve this for the 1.2
release). In this case I think part of the problem is that one of your
variables is count, which is a reserved word. Unfortunately, we don't
have the ability to escape identifiers at this point.
However, I did manage
Are you sure the code running on the cluster has been updated? We recently
optimized the execution of LIKE queries that can be evaluated without using
full regular expressions. So it's possible this error is due to missing
functionality on the executors.
How can I trace this down for a bug
On Thu, Jul 10, 2014 at 2:08 PM, Jerry Lam chiling...@gmail.com wrote:
For the curious mind, the dataset is about 200-300GB and we are using 10
machines for this benchmark. Given the env is equal between the two
experiments, why is pure Spark faster than SparkSQL?
There is going to be some
Hi Jerry,
Thanks for reporting this. It would be helpful if you could provide the
output of the following command:
println(hql("select s.id from m join s on (s.id=m_id)").queryExecution)
Michael
On Thu, Jul 10, 2014 at 8:15 AM, Jerry Lam chiling...@gmail.com wrote:
Hi Spark developers,
I
I'll add that the SQL parser is very limited right now, and that you'll get
much wider coverage using hql inside of HiveContext. We are working on
bringing sql() much closer to SQL-92 though in the future.
On Thu, Jul 10, 2014 at 7:28 AM, premdass premdas...@yahoo.co.in wrote:
Thanks Takuya .
There is no version of Shark that is compatible with Spark 1.0, however,
Spark SQL does come included automatically. More information here:
http://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html
SerDes overhead, then there must be something
additional that SparkSQL adds to the overall overheads that Hive doesn't
have.
Best Regards,
Jerry
On Thu, Jul 10, 2014 at 7:11 PM, Michael Armbrust mich...@databricks.com
wrote:
On Thu, Jul 10, 2014 at 2:08 PM, Jerry Lam chiling
,
Jerry
On Thu, Jul 10, 2014 at 7:16 PM, Michael Armbrust mich...@databricks.com
wrote:
Hi Jerry,
Thanks for reporting this. It would be helpful if you could provide the
output of the following command:
println(hql("select s.id from m join s on (s.id=m_id)").queryExecution)
Michael
At first glance that looks like an error with the class shipping in the
spark shell. (i.e. the lines that you type into the spark shell are
compiled into classes and then shipped to the executors where they run).
Are you able to run other spark examples with closures in the same shell?
Michael
This is on the roadmap for the next release (1.1)
JIRA: SPARK-2179 https://issues.apache.org/jira/browse/SPARK-2179
On Mon, Jul 7, 2014 at 11:48 PM, Ionized ioni...@gmail.com wrote:
The Java API requires a Java Class to register as a table.
// Apply a schema to an RDD of JavaBeans and
you have an estimate on when some will
be available?)
On Tue, Jul 8, 2014 at 12:24 AM, Michael Armbrust mich...@databricks.com
wrote:
This is on the roadmap for the next release (1.1)
JIRA: SPARK-2179 https://issues.apache.org/jira/browse/SPARK-2179
On Mon, Jul 7, 2014 at 11:48 PM
On Tue, Jul 8, 2014 at 12:43 PM, Pierre B
pierre.borckm...@realimpactanalytics.com wrote:
1/ Is there a way to convert a SchemaRDD (for instance loaded from a
parquet
file) back to an RDD of a given case class?
There may be someday, but doing so will either require a lot of reflection
or a
I haven't heard any reports of this yet, but I don't see any reason why it
wouldn't work. You'll need to manually convert the objects that come out of
the sequence file into something where SparkSQL can detect the schema (i.e.
scala case classes or java beans) before you can register the RDD as a