You can extract the nested fields in sql: SELECT field.nestedField ...
If you don't do that then nested fields are represented as rows within rows
and can be retrieved as follows:
t.getAs[Row](0).getInt(0)
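A minimal sketch of both approaches (assuming import org.apache.spark.sql._ and a registered table called people with a nested address struct; the names are illustrative):

// Pull the nested field out directly in SQL:
val cities = sqlContext.sql("SELECT address.city FROM people")

// Or grab the nested struct as a Row and read its columns positionally:
val cities2 = sqlContext.sql("SELECT address FROM people")
  .map(t => t.getAs[Row](0).getString(0))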
Also, I would write t.getAs[Buffer[CharSequence]](12) as
t.getAs[Seq[String]](12) since
RDD before performing analytics on it.
Thank you for your time and help on this.
P.S. I am using python if that makes a difference.
On Wed, Nov 19, 2014 at 4:45 PM, Michael Armbrust mich...@databricks.com
wrote:
In general you should be able to read full directories of files as a
single
I think you should also be able to get away with casting it back and forth
in this case using .asInstanceOf.
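For example, something along these lines should work (MyPair here is just a stand-in for your Tuple2 subclass):

import org.apache.spark.SparkContext._   // pair RDD implicits

class MyPair(k: String, v: Int) extends Tuple2[String, Int](k, v)

val rdd: org.apache.spark.rdd.RDD[MyPair] =
  sc.parallelize(Seq(new MyPair("a", 1), new MyPair("a", 2)))

// Due to erasure the cast is fine at runtime even though the static types differ:
val summed = rdd.asInstanceOf[org.apache.spark.rdd.RDD[(String, Int)]].reduceByKey(_ + _)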
On Wed, Nov 19, 2014 at 4:39 PM, Daniel Siegmann daniel.siegm...@velos.io
wrote:
I have a class which is a subclass of Tuple2, and I want to use it with
PairRDDFunctions. However, I
)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
On Tue, Nov 18, 2014 at 11:41 AM, Michael Armbrust mich...@databricks.com
wrote:
Interesting, I believe we have
This is an unfortunate/known issue that we are hoping to address in the
next release: https://issues.apache.org/jira/browse/SPARK-2087
I'm not sure how straightforward a fix would be, but it would involve
keeping / setting the SessionState for each connection to the server. It
would be great if
You are perhaps hitting an issue that was fixed by #3248
https://github.com/apache/spark/pull/3248?
On Mon, Nov 17, 2014 at 9:58 AM, Sadhan Sood sadhan.s...@gmail.com wrote:
While testing sparkSQL, we were running this group by with expression
query and got an exception. The same query worked
What version of Spark SQL?
On Sat, Nov 15, 2014 at 10:25 PM, Eric Zhen zhpeng...@gmail.com wrote:
Hi all,
We ran Spark SQL on the TPC-DS benchmark Q19 with spark.sql.codegen=true and
got the exceptions below; has anyone else seen these before?
java.lang.ExceptionInInitializerError
at
Anyone want a PR?
Yes please.
I'd guess that it's an s3n://key:secret_key@bucket/path from the UNLOAD
command used to produce the data. Xiangrui can correct me if I'm wrong
though.
On Fri, Nov 14, 2014 at 2:19 PM, Gary Malouf malouf.g...@gmail.com wrote:
We have a bunch of data in RedShift tables that we'd like to pull in
If I use row[6] instead of row[text] I get what I am looking for.
However, finding the right numeric index could be a pain.
Can I access the fields in a Row of a SchemaRDD by name, so that I can
map, filter, etc. without a trial and error process of finding the right
int for the
There are a few things you can do here (a quick sketch of the first two follows the list):
- Infer the schema on a subset of the data, pass that inferred schema
(schemaRDD.schema) as the second argument of jsonRDD.
- Hand construct a schema and pass it as the second argument including the
fields you are interested in.
- Instead load the data
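A rough sketch of the first two options (the field names are made up; jsonLines is assumed to be an RDD[String] of your JSON records):

import org.apache.spark.sql._

// 1. Infer the schema on a sample, then reuse it for the full data:
val sampled = sqlContext.jsonRDD(jsonLines.sample(false, 0.01))
val full = sqlContext.jsonRDD(jsonLines, sampled.schema)

// 2. Hand-construct a schema with only the fields you need:
val schema = StructType(Seq(
  StructField("name", StringType, true),
  StructField("age", IntegerType, true)))
val projected = sqlContext.jsonRDD(jsonLines, schema)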
Xiangrui is correct that it must be a Java bean; also, nested classes are
not yet supported in Java.
On Tue, Nov 11, 2014 at 10:11 AM, Xiangrui Meng men...@gmail.com wrote:
I think you need a Java bean class instead of a normal class. See
example here:
There is a JIRA for adding this:
https://issues.apache.org/jira/browse/SPARK-4228
Your described approach sounds reasonable.
On Mon, Nov 10, 2014 at 5:10 PM, Tobias Pfeiffer t...@preferred.jp wrote:
Akshat
On Tue, Nov 11, 2014 at 4:12 AM, Akshat Aranya aara...@gmail.com wrote:
Does there
, November 06, 2014 12:28 PM
To: Michael Armbrust
Cc: u...@spark.incubator.apache.org
Subject: RE: Dynamically InferSchema From Hive and Create parquet file
When I create a Hive table with Parquet format, it does not create any
metadata until data is inserted. So data needs to be there before I infer
It can, but currently that method uses the default hive serde which is not
very robust (does not deal well with \n in strings) and probably is not
super fast. You'll also need to be using a HiveContext for it to work.
On Tue, Nov 4, 2014 at 8:20 PM, vdiwakar.malladi vdiwakar.mall...@gmail.com
That method is for creating a new directory to hold parquet data when there
is no hive metastore available, thus you have to specify the schema.
If you've already created the table in the metastore you can just query it
using the sql method:
javaHiveContext.sql("SELECT * FROM parquetTable");
You
This is not supported yet. It would be great if you could open a JIRA
(though I think apache JIRA is down ATM).
On Tue, Nov 4, 2014 at 9:40 AM, Terry Siu terry@smartfocus.com wrote:
I’m trying to execute a subquery inside an IN clause and am encountering
an unsupported language feature
Temporary tables are local to the context that creates them (just like
RDDs). I'd recommend saving the data out as Parquet to share it between
contexts.
On Tue, Nov 4, 2014 at 3:18 AM, vdiwakar.malladi vdiwakar.mall...@gmail.com
wrote:
Hi,
There is a need in my application to query the
Structs are Rows nested in other rows. This might also be helpful:
http://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
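For example, a nested struct field is just a StructField whose dataType is another StructType (the field names here are illustrative):

import org.apache.spark.sql._

val addressType = StructType(Seq(
  StructField("street", StringType, true),
  StructField("city", StringType, true)))

val schema = StructType(Seq(
  StructField("name", StringType, true),
  StructField("address", addressType, true)))  // a struct nested inside the row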
On Tue, Nov 4, 2014 at 12:21 PM, tridib tridib.sama...@live.com wrote:
How do I create a StructField of StructType? I need to
They both compile down to the same logical plans so the performance of
running the query should be the same. The Scala DSL uses a lot of Scala
magic and thus is experimental, whereas HiveQL is pretty set in stone.
On Tue, Nov 4, 2014 at 5:22 PM, SK skrishna...@gmail.com wrote:
SchemaRDD
That sounds like a regression. Could you open a JIRA with steps to
reproduce (https://issues.apache.org/jira/browse/SPARK)? We'll want to fix
this before the 1.2 release.
On Mon, Nov 3, 2014 at 11:04 AM, Terry Siu terry@smartfocus.com wrote:
Is there any reason why StringType is not a
On Mon, Nov 3, 2014 at 12:45 AM, Bojan Kostic blood9ra...@gmail.com wrote:
But will this improvement also affect when you want to count distinct on 2
or more fields:
SELECT COUNT(f1), COUNT(DISTINCT f2), COUNT(DISTINCT f3), COUNT(DISTINCT
f4)
FROM parquetFile
Unfortunately I think this
It is merged!
On Mon, Nov 3, 2014 at 12:06 PM, Terry Siu terry@smartfocus.com wrote:
Thanks, Kousuke. I’ll wait till this pull request makes it into the
master branch.
-Terry
From: Kousuke Saruta saru...@oss.nttdata.co.jp
Date: Monday, November 3, 2014 at 11:11 AM
To: Terry Siu
That should be possible, although I'm not super familiar with thrift.
You'll probably need access to the generated metadata
http://people.apache.org/~thejas/thrift-0.9/javadoc/org/apache/thrift/meta_data/package-frame.html
.
Shameless plug: If you find yourself reading a lot of thrift data you
Hmmm, this looks like a bug. Can you file a JIRA?
On Thu, Oct 30, 2014 at 4:04 PM, Jean-Pascal Billaud j...@tellapart.com
wrote:
Hi,
While testing SparkSQL on top of our Hive metastore, I am getting
some java.lang.ArrayIndexOutOfBoundsException while reusing a cached RDD
table.
LATERAL VIEW explode(locations) l AS location JOIN locationNames ln ON
location.number = ln.streetNumber WHERE location.number = '2300').collect()
On Tue, Oct 28, 2014 at 10:19 PM, Michael Armbrust
mich...@databricks.com wrote:
On Tue, Oct 28, 2014 at 6:56 PM, Corey Nolet cjno...@gmail.com
DISTRIBUTE BY only promises that data will be collocated, but does not
create a partition for each value. You are probably looking for Dynamic
Partitions
https://cwiki.apache.org/confluence/display/Hive/DynamicPartitions, which
was recently merged into HiveContext.
On Tue, Oct 28, 2014 at 11:49
Try: address.city.attr
On Tue, Oct 28, 2014 at 8:30 AM, Brett Antonides banto...@gmail.com wrote:
Hello,
Given the following example customers.json file:
{
  "name": "Sherlock Holmes",
  "customerNumber": 12345,
  "address": {
    "street": "221b Baker Street",
    "city": "London",
    "zipcode": "NW1 6XE",
    "country":
This feature is not in 1.1 and is not going to promise one file per unique
value of the data. The only way to do that would be to write your own
partitioner
http://stackoverflow.com/questions/23127329/how-to-define-custom-partitioner-for-spark-rdds-of-equally-sized-partition-where
.
On Tue, Oct
On Tue, Oct 28, 2014 at 2:19 PM, Corey Nolet cjno...@gmail.com wrote:
Is it possible to select if, say, there was an addresses field that had a
json array?
You can get the Nth item by address.getItem(0). If you want to walk
through the whole array look at LATERAL VIEW EXPLODE in HiveQL
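Roughly, assuming a HiveContext and a registered table people with an addresses array (the names are illustrative):

// Nth element of the array:
hiveContext.sql("SELECT addresses[0].street FROM people")

// One output row per array element:
hiveContext.sql(
  "SELECT name, addr.street FROM people LATERAL VIEW explode(addresses) a AS addr")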
: [{ "street": "Rodeo Dr", "number": 2300 }]}
And query all people who have a location with number = 2300?
On Tue, Oct 28, 2014 at 5:30 PM, Michael Armbrust mich...@databricks.com
wrote:
On Tue, Oct 28, 2014 at 2:19 PM, Corey Nolet cjno...@gmail.com wrote:
Is it possible to select if, say
On Tue, Oct 28, 2014 at 6:56 PM, Corey Nolet cjno...@gmail.com wrote:
Am I able to do a join on an exploded field?
Like if I have another object:
{ "streetNumber": 2300, "locationName": "The Big Building" } and I want to
join with the previous json by the locations[].number field- is that
possible?
JOIN locationNames ln ON
location.number = ln.streetNumber WHERE location.number = '2300').collect()
On Tue, Oct 28, 2014 at 10:19 PM, Michael Armbrust mich...@databricks.com
wrote:
On Tue, Oct 28, 2014 at 6:56 PM, Corey Nolet cjno...@gmail.com wrote:
Am I able to do a join
No such method error almost always means you are mixing different versions
of the same library on the classpath. In this case it looks like you have
more than one version of guava. Have you added anything to the classpath?
On Mon, Oct 27, 2014 at 8:36 AM, nitinkak001 nitinkak...@gmail.com
I'd suggest checking out the Spark SQL programming guide to answer this
type of query:
http://spark.apache.org/docs/latest/sql-programming-guide.html
You could also perform it using the raw Spark RDD API
http://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.rdd.RDD,
but it's
:331)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
On Mon, Oct 27, 2014 at 1:57 PM, Michael Armbrust mich
You can access cached data in spark through the JDBC server:
http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server
On Mon, Oct 27, 2014 at 1:47 PM, Ron Ayoub ronalday...@live.com wrote:
We have a table containing 25 features per item id along with
Yeah, sorry for being unclear. Subquery expressions are not supported.
That particular error was coming from the Hive parser.
On Mon, Oct 27, 2014 at 4:03 PM, Daniel Klinger d...@web-computing.de wrote:
So it doesn't matter which dialect I'm using? Because I set spark.sql.dialect
to sql.
--
This is very experimental and mostly unsupported, but you can start the
JDBC server from within your own programs
https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L45
by passing it the HiveContext.
On
It does have support for caching using either CACHE TABLE tablename or
CACHE TABLE tablename AS SELECT
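For example (table names illustrative):

sqlContext.sql("CACHE TABLE logs")
sqlContext.sql("CACHE TABLE recent AS SELECT * FROM logs WHERE ds >= '2014-10-20'")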
On Fri, Oct 24, 2014 at 1:05 AM, ankits ankitso...@gmail.com wrote:
I want to set up spark SQL to allow ad hoc querying over the last X days of
processed data, where the data is
You might be hitting: https://issues.apache.org/jira/browse/SPARK-4037
On Fri, Oct 24, 2014 at 11:32 AM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:
Hi all,
I'm trying to set a pool for a JDBC session. I'm connecting to the thrift
server via JDBC client.
My installation appears to be
tables ?
On Fri, Oct 24, 2014 at 2:35 PM, Michael Armbrust mich...@databricks.com
wrote:
It does have support for caching using either CACHE TABLE tablename or
CACHE TABLE tablename AS SELECT
On Fri, Oct 24, 2014 at 1:05 AM, ankits ankitso...@gmail.com wrote:
I want to set up spark SQL
This is very experimental and mostly unsupported, but you can start the
JDBC server from within your own programs
https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L45
by passing it the HiveContext.
On
Usually when the SparkContext throws an NPE it means that it has been shut
down due to some earlier failure.
On Wed, Oct 22, 2014 at 5:29 PM, arthur.hk.c...@gmail.com
arthur.hk.c...@gmail.com wrote:
Hi,
I got java.lang.NullPointerException. Please help!
sqlContext.sql("select l_orderkey,
Can you show the DDL for the table? It looks like the SerDe might be
saying it will produce a decimal type but is actually producing a string.
On Thu, Oct 23, 2014 at 3:17 PM, arthur.hk.c...@gmail.com
arthur.hk.c...@gmail.com wrote:
Hi
My Spark is 1.1.0 and Hive is 0.12, I tried to run the
Yes, when using a HiveContext.
On Wed, Oct 22, 2014 at 2:20 PM, shahab shahab.mok...@gmail.com wrote:
Hi,
I just wonder if SparkSQL supports Hive built-in functions (e.g.
from_unixtime) or any of the functions pointed out here : (
https://cwiki.apache.org/confluence/display/Hive/Tutorial)
The JDBC server is what you are looking for:
http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server
On Wed, Oct 22, 2014 at 11:10 AM, Sadhan Sood sadhan.s...@gmail.com wrote:
We want to run multiple instances of spark sql cli on our yarn cluster.
Each
No, analytic and window functions do not work yet.
On Tue, Oct 21, 2014 at 3:00 AM, Pierre B
pierre.borckm...@realimpactanalytics.com wrote:
Hi!
The RANK function is available in hive since version 0.11.
When trying to use it in SparkSQL, I'm getting the following exception
(full
Hmm... I thought HiveContext will only work if Hive is present. I am
curious
to know when to use HiveContext and when to use SqlContext.
http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started
TLDR; Always use HiveContext if your application does not have a dependency
You need to import sqlCtx._ to get access to the implicit conversion.
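i.e. something like this (the value is just an example):

import sqlCtx._   // brings the Symbol -> attribute implicits into scope

val people = sqlCtx.sql("select * from people")
val thirties = people.where('age === 30)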
On Tue, Oct 21, 2014 at 2:40 PM, Kevin Paul kevinpaulap...@gmail.com
wrote:
Hi all, I tried to use the function SchemaRDD.where() but got some error:
val people = sqlCtx.sql("select * from people")
people.where('age ===
Have you tried this on master? There were several problems with resolution
of complex queries that were registered as tables in the 1.1.0 release.
On Mon, Oct 20, 2014 at 10:33 AM, Terry Siu terry@smartfocus.com
wrote:
Hi all,
I’m getting a TreeNodeException for unresolved attributes
I think you are running into a bug that will be fixed by this PR:
https://github.com/apache/spark/pull/2850
On Mon, Oct 20, 2014 at 4:34 PM, tridib tridib.sama...@live.com wrote:
Hello Experts,
After repeated attempt I am unable to run query on map json date string. I
tried two approaches:
Looks like this data was encoded with an old version of Spark SQL. You'll
need to set the flag to interpret binary data as a string. More info on
configuration can be found here:
http://spark.apache.org/docs/latest/sql-programming-guide.html#configuration
sqlContext.sql("set
Support for dynamic partitioning is available in master and will be part of
Spark 1.2
On Thu, Oct 16, 2014 at 1:08 AM, Banias H banias4sp...@gmail.com wrote:
I got tipped off by an expert that the "Unsupported language
features in query" error that I had was due to the fact that SparkSQL does not
It's much more efficient to store and compute on numeric types than string
types.
On Tue, Oct 14, 2014 at 1:25 AM, invkrh inv...@gmail.com wrote:
Thank you, Michael.
In Spark SQL DataType, we have a lot of types, for example, ByteType,
ShortType, StringType, etc.
These types are used to
Is there any plan to support windowing queries? I know that Shark
supported it in its last release and expected it to be already included.
Someone from redhat is working on this. Unclear if it will make the 1.2
release.
It's not on the roadmap for 1.2. I'd suggest opening a JIRA.
On Mon, Oct 13, 2014 at 4:28 AM, Pierre B
pierre.borckm...@realimpactanalytics.com wrote:
Is it planned in a near future ?
--
View this message in context:
This conversion is done implicitly anytime you use a string column in an
operation with a numeric column. If you run explain on your query you
should see the cast that is inserted. This is intentional and based on the
type semantics of Apache Hive.
On Mon, Oct 13, 2014 at 9:03 AM, invkrh
If you are running version 1.1 you can create external parquet tables.
I'd recommend setting spark.sql.hive.convertMetastoreParquet=true. Here's a
helper function to do it automatically:
/**
* Sugar for creating a Hive external table from a parquet path.
*/
def createParquetTable(name:
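A rough sketch of what such a helper could look like (the column list and DDL details here are assumptions and depend on your Hive version; this is not the exact code above):

// Sketch only: assumes a HiveContext in scope and a Hive version that
// understands STORED AS PARQUET; replace the column list with your schema.
def createParquetTable(name: String, path: String): Unit = {
  hiveContext.sql(
    s"""CREATE EXTERNAL TABLE IF NOT EXISTS $name (id INT, name STRING)
       |STORED AS PARQUET
       |LOCATION '$path'""".stripMargin)
  hiveContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")
}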
There are some known bugs with the Parquet SerDe and Spark 1.1.
You can try setting spark.sql.hive.convertMetastoreParquet=true to cause
spark sql to use built in parquet support when the serde looks like parquet.
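e.g.:

hiveContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")
// or equivalently from SQL:
hiveContext.sql("SET spark.sql.hive.convertMetastoreParquet=true")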
On Mon, Oct 13, 2014 at 2:57 PM, Terry Siu terry@smartfocus.com wrote:
I am
Please file a JIRA: https://issues.apache.org/jira/browse/SPARK/
On Thu, Oct 9, 2014 at 6:48 PM, Anand Mohan chinn...@gmail.com wrote:
Hi,
I just noticed the
to query in SQL and apply scala functions as UDFs in the SQL is
extremely convenient. Projection pushdown works flawlessly; I am not so sure
about predicate pushdown
(we have 90% optional fields in our dataset and I remember Michael
Armbrust telling me that this is a bug in Parquet in that it doesn't allow
Thanks for the input. We purposefully made sure that the config option did
not make it into a release as it is not something that we are willing to
support long term. That said we'll try and make this easier in the future
either through hints or better support for statistics.
In this particular
That's a good question; I'm not sure if that will work. I will note that we
are hoping to do some upgrades of our parquet support in the near future.
On Tue, Oct 7, 2014 at 10:33 PM, Michael Allman mich...@videoamp.com
wrote:
Hello,
I was interested in testing Parquet V2 with Spark SQL, but
Using SUM on a string should automatically cast the column. Also you can
use CAST to change the datatype
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-TypeConversionFunctions
.
What version of Spark are you running? This could be
, Michael Armbrust mich...@databricks.com
wrote:
Using SUM on a string should automatically cast the column. Also you can
use CAST to change the datatype
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-TypeConversionFunctions
.
What version of Spark are you
No, not yet. Only Hive UDAFs are supported.
On Mon, Oct 6, 2014 at 2:18 AM, Pei-Lun Lee pl...@appier.com wrote:
Hi,
Does spark sql currently support user-defined custom aggregation function
in scala like the way UDF defined with sqlContext.registerFunction? (not
hive UDAF)
Thanks,
--
Are you running master? There was briefly a regression here that is
hopefully fixed by spark#2635 https://github.com/apache/spark/pull/2635.
On Fri, Oct 3, 2014 at 1:43 AM, Kevin Paul kevinpaulap...@gmail.com wrote:
Hi all, I tried to launch my application with spark-submit, the command I
use
Often java.lang.NoSuchMethodError means that you have more than one version
of a library on your classpath, in this case it looks like hive.
On Thu, Oct 2, 2014 at 8:44 PM, Li HM hmx...@gmail.com wrote:
I have rebuild package with -Phive
Copied hive-site.xml to conf (I am using hive-0.12)
(DelegatingMethodAccessorImpl.java:43)
Let me know if any of these warrant a JIRA
thanks
On Thu, Oct 2, 2014 at 2:00 PM, Michael Armbrust mich...@databricks.com
wrote:
What are the errors you are seeing? All of those functions should work.
On Thu, Oct 2, 2014 at 6:56 AM, Yana
by: java.lang.ClassNotFoundException: Class
org.apache.hadoop.hdfs.server.namenode.ha.IPFailoverProxyProvider not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
... 57 more
On Fri, Oct 3, 2014 at 1:55 AM, Michael
The bug is likely in your data. Do you have lines in your input file that
do not contain the \t character? If so, .split will only return a single
element and p(1) from the .map() is going to throw
java.lang.ArrayIndexOutOfBoundsException: 1
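One way to guard against that (a sketch; the column handling is just an example):

val rows = sc.textFile("input.tsv")
  .map(_.split("\t"))
  .filter(_.length >= 2)               // skip malformed lines instead of throwing
  .map(p => (p(0), p(1).trim))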
On Thu, Oct 2, 2014 at 3:35 PM, SK
parquetFile accepts a comma separated list of files.
Also, unionAll does not write to disk. However, unless you are running a
recent version (compiled from master since this was added
https://github.com/apache/spark/commit/f858f466862541c3faad76a1fa2391f1c17ec9dd)
it's missing an optimization and
We actually leave all the DDL commands up to Hive, so there is no
programmatic way to access the things you are looking for.
On Thu, Oct 2, 2014 at 5:17 PM, Banias calvi...@yahoo.com.invalid wrote:
Hi,
Would anybody know how to get the following information from HiveContext
given a Hive table
This is hard to do in general, but you can get what you are asking for by
putting the following class in scope.
implicit class BetterRDD[A: scala.reflect.ClassTag](rdd: org.apache.spark.rdd.RDD[A]) {
  def dropOne = rdd.mapPartitionsWithIndex((i, iter) =>
    if (i == 0 && iter.hasNext) { iter.next; iter } else iter)
}
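With that in scope, dropping the first record looks like (the file name is just an example):

val withoutFirstLine = sc.textFile("data.csv").dropOne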
You are likely running into SPARK-3708
https://issues.apache.org/jira/browse/SPARK-3708, which was fixed by #2594
https://github.com/apache/spark/pull/2594 this morning.
On Wed, Oct 1, 2014 at 8:09 AM, tonsat ton...@gmail.com wrote:
We have a configuration CDH5.0,Spark1.1.0(stand alone) and
I'll note that the DSL is pretty experimental. That said you should be
able to do something like user.id.attr
On Mon, Sep 29, 2014 at 3:39 PM, Benyi Wang bewang.t...@gmail.com wrote:
scala> user
res19: org.apache.spark.sql.SchemaRDD =
SchemaRDD[0] at RDD at SchemaRDD.scala:98
== Query Plan
Views are not supported yet. It's not currently on the near-term roadmap,
but that can change if there is sufficient demand or someone in the
community is interested in implementing them. I do not think it would be
very hard.
Michael
On Sun, Sep 28, 2014 at 11:59 AM, Du Li
This is not possible until https://github.com/apache/spark/pull/2501 is
merged.
On Sun, Sep 28, 2014 at 6:39 PM, Haopu Wang hw...@qilinsoft.com wrote:
Thanks for the response. From Spark Web-UI's Storage tab, I do see
cached RDD there.
But the storage level is Memory Deserialized 1x
You might consider instead storing the data using saveAsParquetFile and
then querying that after running
sqlContext.parquetFile(...).registerTempTable(...).
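i.e. roughly (the paths and table names are illustrative):

schemaRdd.saveAsParquetFile("/data/events.parquet")

// later, possibly from a different SQLContext:
sqlContext.parquetFile("/data/events.parquet").registerTempTable("events")
val counts = sqlContext.sql("SELECT COUNT(*) FROM events")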
On Sun, Sep 28, 2014 at 6:43 PM, Michael Armbrust mich...@databricks.com
wrote:
This is not possible until https://github.com/apache/spark
Based on your first example it looks like what you want is actually run
length encoding (which parquet does support
https://github.com/Parquet/parquet-format/blob/master/Encodings.md).
Repetition and definition levels are used to reconstruct nested or repeated
(arrays) data that has been shredded
This behavior is inherited from the parquet input format that we use. You
could list the files manually and pass them as a comma separated list.
On Wed, Sep 24, 2014 at 7:46 AM, Marius Soutier mps@gmail.com wrote:
Hello,
sc.textFile and so on support wildcards in their path, but
outside of Spark's
control?
Nick
On Wed, Sep 24, 2014 at 1:01 PM, Michael Armbrust mich...@databricks.com
wrote:
This behavior is inherited from the parquet input format that we use.
You could list the files manually and pass them as a comma separated list.
On Wed, Sep 24, 2014 at 7:46 AM
Can you show me the DDL you are using? Here is an example of a way I got
the avro serde to work:
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala#L246
Also, this isn't ready for primetime yet, but a quick plug for some ongoing
work:
You can't directly query JSON tables from the CLI or JDBC server since
temporary tables only live for the life of the Spark Context. This PR will
eventually (targeted for 1.2) let you do what you want in pure SQL:
https://github.com/apache/spark/pull/2475
On Mon, Sep 22, 2014 at 4:52 PM, Yin
AM, Michael Armbrust mich...@databricks.com
wrote:
You can't directly query JSON tables from the CLI or JDBC server since
temporary tables only live for the life of the Spark Context. This PR will
eventually (targeted for 1.2) let you do what you want in pure SQL:
https://github.com/apache
I would hope that things should work for this kind of workflow.
I'm curious if you have tried using saveAsParquetFile instead of inserting
directly into a hive table (you could still register this as an external
table afterwards). Right now inserting into Hive tables is going to go
through their
An exception should be thrown in the case of failure for DDL commands.
On Tue, Sep 23, 2014 at 4:55 PM, Du Li l...@yahoo-inc.com.invalid wrote:
Hi,
After executing sql() in SQLContext or HiveContext, is there a way to
tell whether the query/command succeeded or failed? Method sql()
These are coming from the parquet library and as far as I know can be
safely ignored.
On Mon, Sep 22, 2014 at 3:27 AM, Andrew Ash and...@andrewash.com wrote:
Hi All,
I'm seeing the below WARNINGs in stdout using Spark SQL in Spark 1.1.0 --
is this warning a known issue? I don't see any open
Spark SQL always uses a custom configuration of Kryo under the hood to
improve shuffle performance:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlSerializer.scala
Michael
On Sun, Sep 21, 2014 at 9:04 AM, Grega Kešpret gr...@celtra.com
Check out the Spark SQL cli
https://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-spark-sql-cli
.
On Wed, Sep 17, 2014 at 10:50 PM, David Rosenstrauch dar...@darose.net
wrote:
Is there a shell available for Spark SQL, similar to the way the Shark or
Hive shells work?
This looks like a bug, we are investigating.
On Thu, Sep 18, 2014 at 8:49 AM, Eric Friedman eric.d.fried...@gmail.com
wrote:
I have a SchemaRDD which I've gotten from a parquetFile.
Did some transforms on it and now want to save it back out as parquet
again.
Getting a SchemaRDD proves
- dev
Is it possible that you are constructing more than one HiveContext in a
single JVM? Due to global state in Hive code this is not allowed.
Michael
On Wed, Sep 17, 2014 at 7:21 PM, Cheng, Hao hao.ch...@intel.com wrote:
Hi, Du
I am not sure what you mean “triggers the HiveContext to
What is in your hive-site.xml?
On Thu, Sep 11, 2014 at 11:04 PM, linkpatrickliu linkpatrick...@live.com
wrote:
I am running Spark Standalone mode with Spark 1.1
I started SparkSQL thrift server as follows:
./sbin/start-thriftserver.sh
Then I use beeline to connect to it.
Now, I can
Something like the following should let you launch the thrift server on
yarn.
HADOOP_CONF_DIR=/etc/hadoop/conf HIVE_SERVER2_THRIFT_PORT=12345 MASTER=yarn-client ./sbin/start-thriftserver.sh
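Once it is up you can point beeline at it (host and port assumed to match the settings above):

./bin/beeline -u jdbc:hive2://localhost:12345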
On Thu, Sep 11, 2014 at 8:30 PM, Denny Lee denny.g@gmail.com wrote:
Could you provide some
This might be a better question to ask on the cassandra mailing list as I
believe that is where the exception is coming from.
On Thu, Sep 11, 2014 at 2:37 AM, lmk lakshmi.muralikrish...@gmail.com
wrote:
Hi,
My requirement is to extract certain fields from json files, run queries on
them and
What version of Spark SQL are you running here? I think a lot of your
concerns have likely been addressed in more recent versions of the code /
documentation. (Spark 1.1 should be published in the next few days)
In particular, for serious applications you should use a HiveContext and
HiveQL as
HiveQL is the default language for the JDBC server which will be available
as part of the 1.1 release (coming very soon!). Adding support for calling
MLlib and other spark libraries is on the roadmap, but not possible at this
moment.
On Tue, Sep 9, 2014 at 1:45 PM, XUE, Xiaohui
You are probably not getting an error because the exception is happening
inside of Hive. I'd still consider this a bug if you'd like to open a JIRA.
On Mon, Sep 8, 2014 at 3:02 AM, jamborta jambo...@gmail.com wrote:
thank you for the replies.
I am running an insert on a join (INSERT
I believe DataStax is working on better integration here, but until that is
ready you can use the applySchema API. Basically you will convert the
CassandraTable into an RDD of Row objects using a .map() and then you can
call applySchema (provided by SQLContext) to get a SchemaRDD.
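A rough sketch of that shape (the column names/types and row accessors are illustrative, not your actual schema):

import org.apache.spark.sql._

// however you obtained the Cassandra table RDD:
val rowRdd = cassandraRdd.map(r => Row(r.getString("id"), r.getInt("count")))

val schema = StructType(Seq(
  StructField("id", StringType, true),
  StructField("count", IntegerType, true)))

val schemaRdd = sqlContext.applySchema(rowRdd, schema)
schemaRdd.registerTempTable("events")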
More details