Re: how to set the currentDatabase value when initializing a SparkSession?

2017-01-10 Thread smartzjp
I think this configuration will be fine if you want to run Spark SQL on the CLI, but if you want to run it with the distributed query engine, start the JDBC/ODBC server and set the Hive address info. You can refer to this description for more detail.

Spark SQL query plan contains all partitions of a Hive table even though partition filtering is provided

2017-01-10 Thread Raju Bairishetti
Hello, Spark SQL is generating a query plan with information for all partitions even when we apply partition filters in the query. Because of this, the Spark driver/Hive metastore hits OOM, as each table has lots of partitions. We can confirm from the Hive audit logs that it tries to
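
For illustration, a minimal Scala sketch of the kind of query involved, assuming a Hive table partitioned by a dt column (the table and column names are made up), together with the pruning flag that may help on some versions:

    // Ask Spark to push partition predicates down to the Hive metastore
    // instead of fetching metadata for every partition up front.
    spark.conf.set("spark.sql.hive.metastorePartitionPruning", "true")

    // Ideally only the dt='2017-01-10' partition is requested from the metastore.
    val df = spark.sql("SELECT id, value FROM events WHERE dt = '2017-01-10'")

Whether pruning actually happens at the metastore depends on the Spark version in use.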

how to set the currentDatabase value when initializing a SparkSession?

2017-01-10 Thread 李斌松
Spark reads a Hive table, and the catalog.currentDatabase value is the default. How can I set the currentDatabase value when initializing the SparkSession? hive.metastore.uris is thrift://localhost:9083, the IP address (or fully-qualified domain name) and port of the metastore host.
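
A hedged Scala sketch of one way to do this on Spark 2.x, assuming a Hive-enabled build; the database name "mydb" is a placeholder:

    import org.apache.spark.sql.SparkSession

    // hive.metastore.uris can come from hive-site.xml or be set explicitly here.
    val spark = SparkSession.builder()
      .appName("CurrentDatabaseExample")
      .config("hive.metastore.uris", "thrift://localhost:9083")
      .enableHiveSupport()
      .getOrCreate()

    // Switch the session's current database ("mydb" is a placeholder).
    spark.catalog.setCurrentDatabase("mydb")
    // Equivalent: spark.sql("USE mydb")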

large number representation

2017-01-10 Thread Zephod
I want to process large numbers in Spark, for example 160-bit values. I could store them as an array of ints, as a java.util.BitSet, or as something with compression like https://github.com/lemire/javaewah or https://github.com/RoaringBitmap/RoaringBitmap. My question is what should I use so that Spark
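
One simple option, sketched here with entirely hypothetical names: pack the 160 bits into an Array[Long] inside a case class, which serializes compactly and is easy to register with Kryo:

    // Hypothetical wrapper: 160 bits packed into ceil(160/64) = 3 long words.
    final case class Bits160(words: Array[Long]) {
      require(words.length == 3, "160 bits need exactly 3 long words")
      // True if bit i (0-based, little-endian within words) is set.
      def testBit(i: Int): Boolean = ((words(i / 64) >>> (i % 64)) & 1L) == 1L
    }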

Spark in docker over EC2

2017-01-10 Thread Darren Govoni
Has anyone got a good guide for getting a Spark master to talk to remote workers inside Docker containers? I followed the tips I found by searching, but it still doesn't work. Spark 1.6.2. I exposed all the ports and tried to set the local IP inside the container to the host IP, but Spark complains it can't bind the UI ports.
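
One common cause is Spark choosing random ports that the container does not publish. A hedged Scala sketch of pinning them on Spark 1.6 (the host name and port numbers are arbitrary examples); SPARK_LOCAL_IP / SPARK_PUBLIC_DNS inside the container usually need to be set as well:

    import org.apache.spark.SparkConf

    // Pin the ports Spark would otherwise pick at random, so the same
    // ports can be published from the Docker container.
    val conf = new SparkConf()
      .setMaster("spark://master-host:7077") // placeholder master address
      .set("spark.driver.port", "7001")
      .set("spark.blockManager.port", "7003")
      .set("spark.ui.port", "4040")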

Re: Could not parse Master URL for Mesos on Spark 2.1.0

2017-01-10 Thread Michael Gummelt
Oh, interesting. I've never heard of that sort of architecture. And I'm not sure exactly how the JNI bindings do the native library discovery, but I know the MESOS_NATIVE_JAVA_LIBRARY env var has always been the documented discovery method, so I'd definitely always provide that if I were you.

Re: Nested ifs in sparksql

2017-01-10 Thread Olivier Girardot
Are you using the "case when" functions? What do you mean by slow? Can you share a snippet? On Tue, Jan 10, 2017 8:15 PM, Georg Heiler georg.kf.hei...@gmail.com wrote: Maybe you can create a UDF? Raghavendra Pandey wrote on Tue., 10 Jan. 2017 at 20:04
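
For reference, a minimal sketch of the "case when" style in Scala, assuming an existing DataFrame df with a numeric score column (all names illustrative); chained when/otherwise compiles to a single CASE WHEN expression rather than deeply nested ifs:

    import org.apache.spark.sql.functions.{col, lit, when}

    // df is assumed to exist and to have a numeric "score" column.
    val labeled = df.withColumn("label",
      when(col("score") > 90, lit("A"))
        .when(col("score") > 75, lit("B"))
        .otherwise(lit("C")))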

Re: Could not parse Master URL for Mesos on Spark 2.1.0

2017-01-10 Thread Olivier Girardot
Nope, there is no "distribution" and no spark-submit at the start of my process. But I found the problem: the behavior when loading the Mesos native dependency changed, and the static initialization block inside org.apache.mesos.MesosSchedulerDriver needed a specific reference to libmesos-1.0.0.so. So

Re: Could not parse Master URL for Mesos on Spark 2.1.0

2017-01-10 Thread Michael Gummelt
What do you mean your driver has all the dependencies packaged? What are "all the dependencies"? Is the distribution you use to launch your driver built with -Pmesos? On Tue, Jan 10, 2017 at 12:18 PM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Hi Michael, > I did so, but it's

Re: Could not parse Master URL for Mesos on Spark 2.1.0

2017-01-10 Thread Olivier Girardot
Hi Michael, I did so, but that's not exactly the problem. You see, my driver has all the dependencies packaged, and only the executors fetch the tgz via spark.executor.uri. The strange thing is that I see in my classpath the org.apache.mesos:mesos-1.0.0-shaded-protobuf dependency packaged in the

Re: Dataset Type safety

2017-01-10 Thread Michael Armbrust
> As I've specified *.as[Person]*, which does schema inference, *option("inferSchema","true")* is redundant and not needed! The resolution of fields for case classes is done by name, not by position. This is what allows us to support more complex things like JSON or nested structures.
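
A minimal sketch of what by-name resolution means in practice, assuming Spark 2.x (whether the age column still needs inferSchema to become numeric depends on the version's up-cast rules):

    import org.apache.spark.sql.SparkSession

    case class Person(name: String, age: Long)

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Columns are matched to case-class fields by NAME, not by position,
    // so the CSV header order need not match the constructor order.
    val people = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("people.csv")
      .as[Person]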

Library dependencies in Spark

2017-01-10 Thread Keith Turner
I recently wrote a blog post[1] sharing my experiences with using Apache Spark to load data into Apache Fluo. One of the things I cover in this blog post is late binding of dependencies and exclusion of provided dependencies when building a shaded jar. When writing the post, I was unsure about
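
For context, a hedged build.sbt sketch of the "provided" pattern discussed in the post (artifact coordinates are illustrative): Spark itself is marked provided so it is excluded from the shaded jar, since the cluster supplies it at runtime.

    // build.sbt (sbt-assembly): "provided" keeps spark-core out of the fat jar.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.0.2" % "provided",
      "org.apache.fluo"  %  "fluo-api"   % "1.0.0" // coordinates illustrative
    )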

Dataset Type safety

2017-01-10 Thread A Shaikh
I have a simple people.csv and the following SimpleApp.

people.csv:
name,age
abc,22
xyz,32

Working code:
object SimpleApp {
  case class Person(name: String, age: Long)
  def main(args: Array[String]): Unit = {
    val spark
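
Since the snippet is cut off, here is a hedged, self-contained reconstruction of what such a SimpleApp plausibly looks like on Spark 2.x (the master setting is an assumption for local runs):

    import org.apache.spark.sql.SparkSession

    object SimpleApp {
      case class Person(name: String, age: Long)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SimpleApp")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Read the CSV with a header and infer column types, then map rows
        // onto the Person case class by field name.
        val people = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("people.csv")
          .as[Person]

        people.show()
        spark.stop()
      }
    }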

SparkAppHandle.Listener: infoChanged

2017-01-10 Thread Benson Qiu
The JavaDocs say that `infoChanged` is a "callback for changes in any information that is not the handle's state." Can I get more information on what counts as a change in information? How often
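
For context, a minimal Scala sketch of wiring up the listener (the application jar path and main class are placeholders); infoChanged fires for non-state updates such as the application ID becoming available:

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    val listener = new SparkAppHandle.Listener {
      override def stateChanged(handle: SparkAppHandle): Unit =
        println(s"state -> ${handle.getState}")
      override def infoChanged(handle: SparkAppHandle): Unit =
        println(s"info  -> appId=${handle.getAppId}")
    }

    val handle = new SparkLauncher()
      .setAppResource("/path/to/app.jar") // placeholder
      .setMainClass("com.example.Main")   // placeholder
      .setMaster("local[*]")
      .startApplication(listener)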

Re: Nested ifs in sparksql

2017-01-10 Thread Georg Heiler
Maybe you can create a UDF? Raghavendra Pandey wrote on Tue., 10 Jan. 2017 at 20:04: > I have around 41 levels of nested if-else in Spark SQL. I have > programmed it using APIs on DataFrame. But it takes too much time. > Is there anything I can do to

Nested ifs in sparksql

2017-01-10 Thread Raghavendra Pandey
I have around 41 levels of nested if-else in Spark SQL. I have programmed it using APIs on DataFrame, but it takes too much time. Is there anything I can do to improve the run time here?

Re: Could not parse Master URL for Mesos on Spark 2.1.0

2017-01-10 Thread Michael Gummelt
Just build with -Pmesos http://spark.apache.org/docs/latest/building-spark.html#building-with-mesos-support On Tue, Jan 10, 2017 at 8:56 AM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > I had the same problem, added spark-mesos as dependency and now I get : > [2017-01-10

Shortest path performance in Graphx with Spark

2017-01-10 Thread Gerard Casey
Hello everyone, I am creating a graph from a `gz`-compressed `json` file of `edge` and `vertex` records. I have put the files in a Dropbox folder [here][1]. I load and map these `json` records to create the `vertices` and `edge` types required by `graphx` like this: val vertices_raw =
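
Since the snippet is cut off, a hedged sketch of the overall shape in the spark-shell (toy in-memory vertices and edges stand in for the JSON-derived RDDs):

    import org.apache.spark.graphx.{Edge, Graph}
    import org.apache.spark.graphx.lib.ShortestPaths

    // Toy stand-ins for the RDDs parsed from the JSON files.
    val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    val edges    = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
    val graph    = Graph(vertices, edges)

    // Hop-count shortest paths from every vertex to the landmark vertex 3.
    val result = ShortestPaths.run(graph, Seq(3L))
    result.vertices.collect.foreach(println)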

Re: backward compatibility

2017-01-10 Thread Marco Mistroni
I think the old APIs are still supported, but you are advised to migrate. I migrated a few apps from 1.6 to 2.0 with minimal changes. Hth On 10 Jan 2017 4:14 pm, "pradeepbill" wrote: > hi there, I am using spark 1.4 code and now we plan to move to spark 2.0, > and > when I check

Re: Could not parse Master URL for Mesos on Spark 2.1.0

2017-01-10 Thread Olivier Girardot
I had the same problem; I added spark-mesos as a dependency and now I get: [2017-01-10 17:45:16,575] {bash_operator.py:77} INFO - Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.apache.mesos.MesosSchedulerDriver [2017-01-10 17:45:16,576] {bash_operator.py:77}

backward compatibility

2017-01-10 Thread pradeepbill
Hi there, I am using Spark 1.4 code and now we plan to move to Spark 2.0. When I check the documentation below, only a few features are backward compatible. Does that mean I have to change most of my code? Please advise. One of the largest changes in Spark 2.0 is the new updated APIs:

Re: Kryo On Spark 1.6.0

2017-01-10 Thread Yang Cao
If you don't mind, could you please share the Scala solution with me? I tried to use Kryo but it seemed not to work at all. I hope to get a practical example. THX > On 10 Jan 2017, at 19:10, Enrico DUrso wrote: > > Hi, > > I am trying to use Kryo on Spark 1.6.0. > I am able to
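
A hedged Scala sketch of the usual setup (MyClass is a placeholder for your own type); registering WrappedArray.ofRef is the standard fix for the error quoted in this thread:

    import org.apache.spark.SparkConf

    final case class MyClass(x: Int) // placeholder for your own class

    val conf = new SparkConf()
      .setAppName("KryoExample")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrationRequired", "true") // fail fast if unregistered
      .registerKryoClasses(Array(
        classOf[MyClass],
        classOf[Array[MyClass]],
        // The Spark-internal class from the error message in this thread:
        classOf[scala.collection.mutable.WrappedArray.ofRef[_]]
      ))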

Re: Spark Read from Google store and save in AWS s3

2017-01-10 Thread A Shaikh
This should help: https://cloud.google.com/hadoop/examples/bigquery-connector-spark-example On 8 January 2017 at 03:49, neil90 wrote: > Here is how you would read from Google Cloud Storage (note you need to > create a service account key) ->
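
A hedged end-to-end sketch, assuming the GCS connector and hadoop-aws/s3a are already on the classpath (bucket names are placeholders and credentials are read from the environment):

    // Configure S3A credentials on the underlying Hadoop configuration.
    spark.sparkContext.hadoopConfiguration
      .set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    spark.sparkContext.hadoopConfiguration
      .set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

    // Read from Google Cloud Storage, write to S3.
    val df = spark.read.json("gs://my-gcs-bucket/input/")
    df.write.parquet("s3a://my-s3-bucket/output/")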

RE: Kryo On Spark 1.6.0

2017-01-10 Thread Enrico DUrso
Hi, I agree with you, Richard. The point is that it looks like some classes used internally by Spark are not registered (for instance, the one I mentioned in the previous email is something I am not directly using). For those classes the serialization performance will be poor in

Re: Kryo On Spark 1.6.0

2017-01-10 Thread Richard Startin
Hi Enrico, Only set spark.kryo.registrationRequired if you want to forbid any classes you have not explicitly registered - see http://spark.apache.org/docs/latest/configuration.html.

Re: Spark ML's RandomForestClassifier OOM

2017-01-10 Thread Julio Antonio Soto de Vicente
No. I am running Spark on YARN on a 3-node testing cluster. My guess is that, given the number of splits produced by a hundred trees of depth 30 (which should be more than 100 * 2^30), either the executors or the driver die of OOM while trying to store all the split metadata. I guess that the same

Kryo On Spark 1.6.0

2017-01-10 Thread Enrico DUrso
Hi, I am trying to use Kryo on Spark 1.6.0. I am able to register my own classes and it works, but when I set "spark.kryo.registrationRequired" to true, I get an error about a Scala class: "Class is not registered: scala.collection.mutable.WrappedArray$ofRef". Has any of you already solved

Re: spark-shell running out of memory even with 6GB ?

2017-01-10 Thread Sean Owen
Maybe ... here are a bunch of things I'd check: Are you running out of memory, or just seeing a lot of memory usage? JVMs will happily use all the memory you allow them, even if some of it could be reclaimed. Did the driver run out of memory? Did you give 6G to the driver or the executor? OOM errors do show

Re: Spark ML's RandomForestClassifier OOM

2017-01-10 Thread Marco Mistroni
Are you running locally? I found exactly the same issue. Two solutions: reduce the data size, or run on EMR. Hth On 10 Jan 2017 10:07 am, "Julio Antonio Soto" wrote: > Hi, > > I am running into OOM problems while training a Spark ML > RandomForestClassifier (maxDepth of 30, 32 maxBins, 100

Spark ML's RandomForestClassifier OOM

2017-01-10 Thread Julio Antonio Soto
Hi, I am running into OOM problems while training a Spark ML RandomForestClassifier (maxDepth of 30, 32 maxBins, 100 trees). My dataset is arguably pretty big given the executor count and size (8x5G), with approximately 20M rows and 130 features. The "fun fact" is that a single
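
For reference, the estimator configuration under discussion, sketched in Scala; lowering maxDepth or adjusting maxMemoryInMB are the usual knobs to try (the maxMemoryInMB value shown is illustrative):

    import org.apache.spark.ml.classification.RandomForestClassifier

    val rf = new RandomForestClassifier()
      .setMaxDepth(30)       // the depth that triggers the OOM here
      .setMaxBins(32)
      .setNumTrees(100)
      .setMaxMemoryInMB(256) // per-node aggregation memory; illustrative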

Spark JDBC Data type mapping (Float and smallInt) Issue

2017-01-10 Thread santlal56
Hi, I am new to Spark Scala development. I have created a job to read data from a MySQL table using the existing data source API (Spark JDBC). I have questions regarding data type mapping, as follows. *Scenario 1:* I have created a table with a float type in MySQL, but while reading it through Spark JDBC I
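
A hedged sketch of the kind of job described (URL, table, and column names are placeholders); if a MySQL FLOAT or SMALLINT comes back widened (e.g. as double/int), casting after the read is one workaround:

    import org.apache.spark.sql.types.{FloatType, ShortType}

    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/testdb") // placeholder
      .option("dbtable", "my_table")                       // placeholder
      .option("user", "user")
      .option("password", "password")
      .load()

    df.printSchema() // inspect how FLOAT/SMALLINT columns were mapped

    // Cast back to narrower types if needed ("f_col"/"s_col" are placeholders).
    val narrowed = df
      .withColumn("f_col", df("f_col").cast(FloatType))
      .withColumn("s_col", df("s_col").cast(ShortType))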