Re: how to set the currentDatabase value when initializing a SparkSession?

2017-01-10 Thread smartzjp
I think this configuration will be fine if you want to run Spark SQL on the CLI, but if you want to run it with the distributed query engine, start the JDBC/ODBC server and set the Hive address info. You can refer to this description for more detail.

Spark SQL query plan contains all partitions of a Hive table even though partition filtering is provided

2017-01-10 Thread Raju Bairishetti
Hello, Spark SQL is generating a query plan with information for all partitions even when we apply partition filters in the query. Because of this, the Spark driver/Hive metastore hits OOM, as each table has lots of partitions. We can confirm from the Hive audit logs that it tries to
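
For illustration, a minimal Scala sketch of the kind of query involved, assuming a Hive table partitioned by a dt column (the table and column names are made up), together with the pruning flag that may help on some versions:

    // Ask Spark to push partition predicates down to the Hive metastore
    // instead of fetching metadata for every partition up front.
    spark.conf.set("spark.sql.hive.metastorePartitionPruning", "true")

    // Ideally only the dt='2017-01-10' partition is requested from the metastore.
    val df = spark.sql("SELECT id, value FROM events WHERE dt = '2017-01-10'")

Whether pruning actually happens at the metastore depends on the Spark version in use.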

how to set the currentDatabase value when initializing a SparkSession?

2017-01-10 Thread 李斌松
Spark reads a Hive table, and the catalog.currentDatabase value is the default. How can I set the currentDatabase value when initializing the SparkSession? hive.metastore.uris is thrift://localhost:9083, the IP address (or fully-qualified domain name) and port of the metastore host.
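
A hedged Scala sketch of one way to do this on Spark 2.x, assuming a Hive-enabled build; the database name "mydb" is a placeholder:

    import org.apache.spark.sql.SparkSession

    // hive.metastore.uris can come from hive-site.xml or be set explicitly here.
    val spark = SparkSession.builder()
      .appName("CurrentDatabaseExample")
      .config("hive.metastore.uris", "thrift://localhost:9083")
      .enableHiveSupport()
      .getOrCreate()

    // Switch the session's current database ("mydb" is a placeholder).
    spark.catalog.setCurrentDatabase("mydb")
    // Equivalent: spark.sql("USE mydb")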

large number representation

2017-01-10 Thread Zephod
I want to process large numbers in Spark, for example 160-bit values. I could store them as an array of ints, as a java.util.BitSet, or as something with compression like https://github.com/lemire/javaewah or https://github.com/RoaringBitmap/RoaringBitmap. My question is what should I use so that Spark
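
One simple option, sketched here with entirely hypothetical names: pack the 160 bits into an Array[Long] inside a case class, which serializes compactly and is easy to register with Kryo:

    // Hypothetical wrapper: 160 bits packed into ceil(160/64) = 3 long words.
    final case class Bits160(words: Array[Long]) {
      require(words.length == 3, "160 bits need exactly 3 long words")
      // True if bit i (0-based, little-endian within words) is set.
      def testBit(i: Int): Boolean = ((words(i / 64) >>> (i % 64)) & 1L) == 1L
    }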

Spark in docker over EC2

2017-01-10 Thread Darren Govoni
Has anyone got a good guide for getting a Spark master to talk to remote workers inside Docker containers? I followed the tips I found by searching, but it still doesn't work. Spark 1.6.2. I exposed all the ports and tried to set the local IP inside the container to the host IP, but Spark complains it can't bind the UI ports.
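
One common cause is Spark choosing random ports that the container does not publish. A hedged Scala sketch of pinning them on Spark 1.6 (the host name and port numbers are arbitrary examples); SPARK_LOCAL_IP / SPARK_PUBLIC_DNS inside the container usually need to be set as well:

    import org.apache.spark.SparkConf

    // Pin the ports Spark would otherwise pick at random, so the same
    // ports can be published from the Docker container.
    val conf = new SparkConf()
      .setMaster("spark://master-host:7077") // placeholder master address
      .set("spark.driver.port", "7001")
      .set("spark.blockManager.port", "7003")
      .set("spark.ui.port", "4040")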

Re: Could not parse Master URL for Mesos on Spark 2.1.0

2017-01-10 Thread Michael Gummelt
Oh, interesting. I've never heard of that sort of architecture. And I'm not sure exactly how the JNI bindings do the native library discovery, but I know the MESOS_NATIVE_JAVA_LIBRARY env var has always been the documented discovery method, so I'd definitely always provide that if I were you.

Re: Nested ifs in sparksql

2017-01-10 Thread Olivier Girardot
Are you using the "case when" functions? What do you mean by slow? Can you share a snippet? On Tue, Jan 10, 2017 8:15 PM, Georg Heiler georg.kf.hei...@gmail.com wrote: Maybe you can create a UDF? Raghavendra Pandey wrote on Tue., 10 Jan. 2017 at 20:04
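
For reference, a minimal sketch of the "case when" style in Scala, assuming an existing DataFrame df with a numeric score column (all names illustrative); chained when/otherwise compiles to a single CASE WHEN expression rather than deeply nested ifs:

    import org.apache.spark.sql.functions.{col, lit, when}

    // df is assumed to exist and to have a numeric "score" column.
    val labeled = df.withColumn("label",
      when(col("score") > 90, lit("A"))
        .when(col("score") > 75, lit("B"))
        .otherwise(lit("C")))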

Re: Could not parse Master URL for Mesos on Spark 2.1.0

2017-01-10 Thread Olivier Girardot
Nope, there is no "distribution" and no spark-submit at the start of my process. But I found the problem: the behavior when loading the Mesos native dependency changed, and the static initialization block inside org.apache.mesos.MesosSchedulerDriver needed a specific reference to libmesos-1.0.0.so. So

Re: Could not parse Master URL for Mesos on Spark 2.1.0

2017-01-10 Thread Michael Gummelt
What do you mean your driver has all the dependencies packaged? What are "all the dependencies"? Is the distribution you use to launch your driver built with -Pmesos? On Tue, Jan 10, 2017 at 12:18 PM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > Hi Michael, > I did so, but it's

Re: Could not parse Master URL for Mesos on Spark 2.1.0

2017-01-10 Thread Olivier Girardot
Hi Michael, I did so, but that's not exactly the problem. You see, my driver has all the dependencies packaged, and only the executors fetch the tgz via spark.executor.uri. The strange thing is that I see in my classpath the org.apache.mesos:mesos-1.0.0-shaded-protobuf dependency packaged in the

Re: Dataset Type safety

2017-01-10 Thread Michael Armbrust
> As I've specified *.as[Person]*, which does schema inference, *option("inferSchema","true")* is redundant and not needed! The resolution of fields for case classes is done by name, not by position. This is what allows us to support more complex things like JSON or nested structures.
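
A minimal sketch of what by-name resolution means in practice, assuming Spark 2.x (whether the age column still needs inferSchema to become numeric depends on the version's up-cast rules):

    import org.apache.spark.sql.SparkSession

    case class Person(name: String, age: Long)

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Columns are matched to case-class fields by NAME, not by position,
    // so the CSV header order need not match the constructor order.
    val people = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("people.csv")
      .as[Person]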

Library dependencies in Spark

2017-01-10 Thread Keith Turner
I recently wrote a blog post[1] sharing my experiences with using Apache Spark to load data into Apache Fluo. One of the things I cover in this blog post is late binding of dependencies and exclusion of provided dependencies when building a shaded jar. When writing the post, I was unsure about
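
For context, a hedged build.sbt sketch of the "provided" pattern discussed in the post (artifact coordinates are illustrative): Spark itself is marked provided so it is excluded from the shaded jar, since the cluster supplies it at runtime.

    // build.sbt (sbt-assembly): "provided" keeps spark-core out of the fat jar.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.0.2" % "provided",
      "org.apache.fluo"  %  "fluo-api"   % "1.0.0" // coordinates illustrative
    )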

Dataset Type safety

2017-01-10 Thread A Shaikh
I have a simple people.csv and the following SimpleApp.

people.csv:
name,age
abc,22
xyz,32

Working code:
object SimpleApp {
  case class Person(name: String, age: Long)
  def main(args: Array[String]): Unit = {
    val spark
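
Since the snippet is cut off, here is a hedged, self-contained reconstruction of what such a SimpleApp plausibly looks like on Spark 2.x (the master setting is an assumption for local runs):

    import org.apache.spark.sql.SparkSession

    object SimpleApp {
      case class Person(name: String, age: Long)

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SimpleApp")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Read the CSV with a header and infer column types, then map rows
        // onto the Person case class by field name.
        val people = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("people.csv")
          .as[Person]

        people.show()
        spark.stop()
      }
    }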

SparkAppHandle.Listener: infoChanged

2017-01-10 Thread Benson Qiu
The JavaDocs say that `infoChanged` is a "callback for changes in any information that is not the handle's state." Can I get more information on what counts as a change in information? How often
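
For context, a minimal Scala sketch of wiring up the listener (the application jar path and main class are placeholders); infoChanged fires for non-state updates such as the application ID becoming available:

    import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

    val listener = new SparkAppHandle.Listener {
      override def stateChanged(handle: SparkAppHandle): Unit =
        println(s"state -> ${handle.getState}")
      override def infoChanged(handle: SparkAppHandle): Unit =
        println(s"info  -> appId=${handle.getAppId}")
    }

    val handle = new SparkLauncher()
      .setAppResource("/path/to/app.jar") // placeholder
      .setMainClass("com.example.Main")   // placeholder
      .setMaster("local[*]")
      .startApplication(listener)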

Re: Nested ifs in sparksql

2017-01-10 Thread Georg Heiler
Maybe you can create a UDF? Raghavendra Pandey wrote on Tue., 10 Jan. 2017 at 20:04: > I have around 41 levels of nested if-else in Spark SQL. I have > programmed it using APIs on DataFrame. But it takes too much time. > Is there anything I can do to

Nested ifs in sparksql

2017-01-10 Thread Raghavendra Pandey
I have around 41 levels of nested if-else in Spark SQL. I have programmed it using APIs on DataFrame, but it takes too much time. Is there anything I can do to improve the run time here?

Re: Could not parse Master URL for Mesos on Spark 2.1.0

2017-01-10 Thread Michael Gummelt
Just build with -Pmesos http://spark.apache.org/docs/latest/building-spark.html#building-with-mesos-support On Tue, Jan 10, 2017 at 8:56 AM, Olivier Girardot < o.girar...@lateral-thoughts.com> wrote: > I had the same problem, added spark-mesos as dependency and now I get : > [2017-01-10

Shortest path performance in Graphx with Spark

2017-01-10 Thread Gerard Casey
Hello everyone, I am creating a graph from a `gz`-compressed `json` file of `edge` and `vertex` records. I have put the files in a Dropbox folder [here][1]. I load and map these `json` records to create the `vertices` and `edge` types required by `graphx` like this: val vertices_raw =
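
Since the snippet is cut off, a hedged sketch of the overall shape in the spark-shell (toy in-memory vertices and edges stand in for the JSON-derived RDDs):

    import org.apache.spark.graphx.{Edge, Graph}
    import org.apache.spark.graphx.lib.ShortestPaths

    // Toy stand-ins for the RDDs parsed from the JSON files.
    val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
    val edges    = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1)))
    val graph    = Graph(vertices, edges)

    // Hop-count shortest paths from every vertex to the landmark vertex 3.
    val result = ShortestPaths.run(graph, Seq(3L))
    result.vertices.collect.foreach(println)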

Re: backward compatibility

2017-01-10 Thread Marco Mistroni
I think the old APIs are still supported, but you are advised to migrate. I migrated a few apps from 1.6 to 2.0 with minimal changes. Hth On 10 Jan 2017 4:14 pm, "pradeepbill" wrote: > hi there, I am using spark 1.4 code and now we plan to move to spark 2.0, > and > when I check

Re: Could not parse Master URL for Mesos on Spark 2.1.0

2017-01-10 Thread Olivier Girardot
I had the same problem; I added spark-mesos as a dependency and now I get: [2017-01-10 17:45:16,575] {bash_operator.py:77} INFO - Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class org.apache.mesos.MesosSchedulerDriver [2017-01-10 17:45:16,576] {bash_operator.py:77}

backward compatibility

2017-01-10 Thread pradeepbill
Hi there, I am using Spark 1.4 code and now we plan to move to Spark 2.0. When I check the documentation below, only a few features are backward compatible. Does that mean I have to change most of my code? Please advise. One of the largest changes in Spark 2.0 is the new updated APIs:

Re: Kryo On Spark 1.6.0

2017-01-10 Thread Yang Cao
If you don't mind, could you please share the Scala solution with me? I tried to use Kryo but it seemed not to work at all. I hope to get a practical example. THX > On 10 Jan 2017, at 19:10, Enrico DUrso wrote: > > Hi, > > I am trying to use Kryo on Spark 1.6.0. > I am able to
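
A hedged Scala sketch of the usual setup (MyClass is a placeholder for your own type); registering WrappedArray.ofRef is the standard fix for the error quoted in this thread:

    import org.apache.spark.SparkConf

    final case class MyClass(x: Int) // placeholder for your own class

    val conf = new SparkConf()
      .setAppName("KryoExample")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrationRequired", "true") // fail fast if unregistered
      .registerKryoClasses(Array(
        classOf[MyClass],
        classOf[Array[MyClass]],
        // The Spark-internal class from the error message in this thread:
        classOf[scala.collection.mutable.WrappedArray.ofRef[_]]
      ))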

Re: Spark Read from Google store and save in AWS s3

2017-01-10 Thread A Shaikh
This should help: https://cloud.google.com/hadoop/examples/bigquery-connector-spark-example On 8 January 2017 at 03:49, neil90 wrote: > Here is how you would read from Google Cloud Storage (note you need to > create a service account key) ->
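
A hedged end-to-end sketch, assuming the GCS connector and hadoop-aws/s3a are already on the classpath (bucket names are placeholders and credentials are read from the environment):

    // Configure S3A credentials on the underlying Hadoop configuration.
    spark.sparkContext.hadoopConfiguration
      .set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    spark.sparkContext.hadoopConfiguration
      .set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

    // Read from Google Cloud Storage, write to S3.
    val df = spark.read.json("gs://my-gcs-bucket/input/")
    df.write.parquet("s3a://my-s3-bucket/output/")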

RE: Kryo On Spark 1.6.0

2017-01-10 Thread Enrico DUrso
Hi, I agree with you, Richard. The point is that it looks like some classes used internally by Spark are not registered (for instance, the one I mentioned in the previous email is something I am not directly using). For those classes the serialization performance will be poor in

Re: Kryo On Spark 1.6.0

2017-01-10 Thread Richard Startin
Hi Enrico, Only set spark.kryo.registrationRequired if you want to forbid any classes you have not explicitly registered - see http://spark.apache.org/docs/latest/configuration.html.

Re: Spark ML's RandomForestClassifier OOM

2017-01-10 Thread Julio Antonio Soto de Vicente
No. I am running Spark on YARN on a 3-node testing cluster. My guess is that, given the number of splits produced by a hundred trees of depth 30 (which should be more than 100 * 2^30), either the executors or the driver die of OOM while trying to store all the split metadata. I guess that the same

Kryo On Spark 1.6.0

2017-01-10 Thread Enrico DUrso
Hi, I am trying to use Kryo on Spark 1.6.0. I am able to register my own classes and it works, but when I set "spark.kryo.registrationRequired" to true, I get an error about a Scala class: "Class is not registered: scala.collection.mutable.WrappedArray$ofRef". Has any of you already solved

Re: spark-shell running out of memory even with 6GB ?

2017-01-10 Thread Sean Owen
Maybe ... here are a bunch of things I'd check: Are you running out of memory, or just seeing a lot of memory usage? JVMs will happily use all the memory you allow them, even if some of it could be reclaimed. Did the driver run out of memory? Did you give 6G to the driver or the executor? OOM errors do show

Re: Spark ML's RandomForestClassifier OOM

2017-01-10 Thread Marco Mistroni
Are you running locally? I found exactly the same issue. Two solutions: reduce the data size, or run on EMR. Hth On 10 Jan 2017 10:07 am, "Julio Antonio Soto" wrote: > Hi, > > I am running into OOM problems while training a Spark ML > RandomForestClassifier (maxDepth of 30, 32 maxBins, 100

Spark ML's RandomForestClassifier OOM

2017-01-10 Thread Julio Antonio Soto
Hi, I am running into OOM problems while training a Spark ML RandomForestClassifier (maxDepth of 30, 32 maxBins, 100 trees). My dataset is arguably pretty big given the executor count and size (8x5G), with approximately 20M rows and 130 features. The "fun fact" is that a single
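
For reference, the estimator configuration under discussion, sketched in Scala; lowering maxDepth or adjusting maxMemoryInMB are the usual knobs to try (the maxMemoryInMB value shown is illustrative):

    import org.apache.spark.ml.classification.RandomForestClassifier

    val rf = new RandomForestClassifier()
      .setMaxDepth(30)       // the depth that triggers the OOM here
      .setMaxBins(32)
      .setNumTrees(100)
      .setMaxMemoryInMB(256) // per-node aggregation memory; illustrative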

Spark JDBC Data type mapping (Float and smallInt) Issue

2017-01-10 Thread santlal56
Hi, I am new to Spark Scala development. I have created a job to read data from a MySQL table using the existing data source API (Spark JDBC). I have questions regarding data type mapping, as follows. *Scenario 1:* I have created a table with a float type in MySQL, but while reading it through Spark JDBC I
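
A hedged sketch of the kind of job described (URL, table, and column names are placeholders); if a MySQL FLOAT or SMALLINT comes back widened (e.g. as double/int), casting after the read is one workaround:

    import org.apache.spark.sql.types.{FloatType, ShortType}

    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/testdb") // placeholder
      .option("dbtable", "my_table")                       // placeholder
      .option("user", "user")
      .option("password", "password")
      .load()

    df.printSchema() // inspect how FLOAT/SMALLINT columns were mapped

    // Cast back to narrower types if needed ("f_col"/"s_col" are placeholders).
    val narrowed = df
      .withColumn("f_col", df("f_col").cast(FloatType))
      .withColumn("s_col", df("s_col").cast(ShortType))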