Re: Check if dataframe is empty

2017-03-06 Thread Deepak Sharma
If the df is empty, the .take would throw java.util.NoSuchElementException. This can be done as below: df.rdd.isEmpty On Tue, Mar 7, 2017 at 9:33 AM, wrote: > Dataframe.take(1) is faster. > *From:* ashaita...@nz.imshealth.com
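
For reference, a minimal Scala sketch of the two checks discussed in this thread, assuming an existing DataFrame named df (which one is cheaper depends on the plan and the Spark version):

    // Check emptiness via the underlying RDD, as suggested above.
    val emptyViaRdd = df.rdd.isEmpty

    // Check emptiness via take(1): an empty DataFrame yields an empty Array here.
    val emptyViaTake = df.take(1).isEmpty

    if (emptyViaTake) {
      println("DataFrame is empty")
    }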

RE: Check if dataframe is empty

2017-03-06 Thread AShaitarov
Thank you for the prompt response. But why is it faster? There is an implementation of isEmpty for rdd: def isEmpty(): Boolean = withScope { partitions.length == 0 || take(1).length == 0 } Basically, the same take(1). Is it because of limit? Regards, Artem Shaitarov From:

RE: Check if dataframe is empty

2017-03-06 Thread jasbir.sing
Dataframe.take(1) is faster. From: ashaita...@nz.imshealth.com [mailto:ashaita...@nz.imshealth.com] Sent: Tuesday, March 07, 2017 9:22 AM To: user@spark.apache.org Subject: Check if dataframe is empty Hello! I am pretty sure that I am asking something which has been already asked lots of

Check if dataframe is empty

2017-03-06 Thread AShaitarov
Hello! I am pretty sure that I am asking something which has already been asked lots of times. However, I cannot find the question in the mailing list archive. The question is: I need to check whether a dataframe is empty or not. I receive a dataframe from a 3rd party library and this dataframe

How does Spark provide Hive style bucketing support?

2017-03-06 Thread SRK
Hi, How does Spark provide Hive style bucketing support in Spark 2.x? Thanks, Swetha -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-does-Spark-provide-Hive-style-bucketing-support-tp28462.html Sent from the Apache Spark User List mailing list archive
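
For reference, Spark 2.x exposes Hive-style bucketing through DataFrameWriter.bucketBy, which only works together with saveAsTable. A minimal Scala sketch, with the table and column names made up for illustration:

    // Write a DataFrame into a bucketed (and optionally sorted) table.
    // "user_events", "user_id" and "event_time" are hypothetical names.
    df.write
      .bucketBy(8, "user_id")     // 8 buckets, hashed on user_id
      .sortBy("event_time")       // optional: sort rows within each bucket
      .saveAsTable("user_events")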

Re: FPGrowth Model is taking too long to generate frequent item sets

2017-03-06 Thread Raju Bairishetti
@Eli, Thanks for the suggestion. If you do not mind, could you please elaborate on those approaches? On Mon, Mar 6, 2017 at 7:29 PM, Eli Super wrote: > Hi > > Try to implement binning and/or feature engineering (smart feature > selection for example) > > Good luck > > On Mon, Mar 6,

Trouble with Thriftserver with hsqldb (Spark 2.1.0)

2017-03-06 Thread Yana Kadiyska
Hi folks, trying to run Spark 2.1.0 thrift server against an hsqldb file and it seems to...hang. I am starting thrift server with: sbin/start-thriftserver.sh --driver-class-path ./conf/hsqldb-2.3.4.jar (a completely local setup). hive-site.xml is like this:

Re: org.apache.spark.SparkException: Task not serializable

2017-03-06 Thread Mina Aslani
Thank you Ankur for the quick response, really appreciate it! Making the class serializable resolved the exception! Best regards, Mina On Mon, Mar 6, 2017 at 4:20 PM, Ankur Srivastava wrote: > The fix for this make your class Serializable. The reason being the >

Re: org.apache.spark.SparkException: Task not serializable

2017-03-06 Thread Ankur Srivastava
The fix for this is to make your class Serializable. The reason is that the closures you have defined in the class need to be serialized and copied over to all executor nodes. Hope this helps. Thanks Ankur On Mon, Mar 6, 2017 at 1:06 PM, Mina Aslani wrote: > Hi, > > I am trying
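
A minimal Scala sketch of the pattern described above; the class and method names are made up, the point is only that anything referenced from inside an RDD closure has to be serializable:

    import org.apache.spark.rdd.RDD

    // Without "extends Serializable", capturing an instance of this class
    // inside flatMap below fails with "Task not serializable".
    class LineParser extends Serializable {
      def tokens(line: String): Array[String] = line.split("\\s+")
    }

    def wordCount(lines: RDD[String]): Long = {
      val parser = new LineParser
      // The closure captures `parser`, so it is serialized and shipped to executors.
      lines.flatMap(parser.tokens).count()
    }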

org.apache.spark.SparkException: Task not serializable

2017-03-06 Thread Mina Aslani
Hi, I am trying to start with Spark and get the number of lines of a text file on my Mac, however I get an org.apache.spark.SparkException: Task not serializable error on JavaRDD logData = javaCtx.textFile(file); Please see below for the sample code and the stack trace. Any idea why this error is

Fwd: Spark application does not work with only one core

2017-03-06 Thread Maximilien Belinga
I am currently working to deploy two spark applications and I want to restrict cores and executors per application. My config is as follows: spark.executor.cores=1 spark.driver.cores=1 spark.cores.max=1 spark.executor.instances=1 Now the issue is that with this exact configuration, one

Spark application does not work with only one core

2017-03-06 Thread Maximilien Belinga
I am currently working to deploy two spark applications and I want to restrict cores and executors per application. My config is as follows: spark.executor.cores=1 spark.driver.cores=1 spark.cores.max=1 spark.executor.instances=1 Now the issue is that with this exact configuration, one

Re: Wrong runtime type when using newAPIHadoopFile in Java

2017-03-06 Thread Steve Loughran
On 6 Mar 2017, at 12:30, Nira Amit wrote: And it's very difficult if it's doing unexpected things. All serialisations do unexpected things. Nobody understands them. Sorry

Re: Wrong runtime type when using newAPIHadoopFile in Java

2017-03-06 Thread Nira Amit
And by the way - I don't want the Avro details to be hidden away from me. The whole purpose of the work I'm doing is to benchmark different serialization tools and strategies. If I want to use Kryo serialization for example, then I need to understand how the API works. And it's very difficult if
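
Since Kryo comes up here: for reference, switching Spark's serializer to Kryo and registering classes goes through SparkConf. A minimal Scala sketch with a made-up record type standing in for the custom Avro class:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Hypothetical record type used only for illustration.
    case class MyRecord(id: Long, name: String)

    val conf = new SparkConf()
      .setAppName("kryo-sketch")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[MyRecord]))

    val spark = SparkSession.builder().config(conf).getOrCreate()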

Re: Wrong runtime type when using newAPIHadoopFile in Java

2017-03-06 Thread Nira Amit
Hi Sean, Yes, we discussed this in Jira and you suggested I take this discussion to the mailing list, so I did. I don't have the option to migrate the code I'm working on to Datasets at the moment (or to Scala, as another developer suggested in the Jira discussion), so I have to work with the

Re: Wrong runtime type when using newAPIHadoopFile in Java

2017-03-06 Thread Sean Owen
I think this is the same thing we already discussed extensively on your JIRA. The key/value class arguments to newAPIHadoopFile are not the type of your custom class, but the Writable types describing the encoding of keys and values in the file. I think that's the start of part of the

Wrong runtime type when using newAPIHadoopFile in Java

2017-03-06 Thread Nira
I tried to load a custom type from avro files into an RDD using newAPIHadoopFile. I started with the following naive code: JavaPairRDD events = sc.newAPIHadoopFile("file:/path/to/data.avro", AvroKeyInputFormat.class,
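
A minimal Scala sketch of the pattern Sean describes above: the key/value class arguments name the on-disk encoding (AvroKey plus NullWritable), and the datum is extracted from the key afterwards. The path is a placeholder and sc is assumed to be an existing SparkContext:

    import org.apache.avro.generic.GenericRecord
    import org.apache.avro.mapred.AvroKey
    import org.apache.avro.mapreduce.AvroKeyInputFormat
    import org.apache.hadoop.io.NullWritable

    // Keys/values as encoded in the file, not the custom domain type.
    val avroRdd = sc.newAPIHadoopFile(
      "file:/path/to/data.avro",
      classOf[AvroKeyInputFormat[GenericRecord]],
      classOf[AvroKey[GenericRecord]],
      classOf[NullWritable],
      sc.hadoopConfiguration)

    // Map down to the records themselves only after loading.
    val records = avroRdd.map { case (key, _) => key.datum() }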

Re: FPGrowth Model is taking too long to generate frequent item sets

2017-03-06 Thread Eli Super
Hi Try to implement binning and/or feature engineering (smart feature selection for example) Good luck On Mon, Mar 6, 2017 at 6:56 AM, Raju Bairishetti wrote: > Hi, > I am new to Spark ML Lib. I am using FPGrowth model for finding related > items. > > Number of transactions
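
Alongside binning and feature selection, the knobs that usually matter for the RDD-based FPGrowth are minSupport (a higher value prunes more candidate itemsets) and numPartitions. A minimal Scala sketch with made-up values and input path:

    import org.apache.spark.mllib.fpm.FPGrowth
    import org.apache.spark.rdd.RDD

    // One basket per line, items separated by spaces (placeholder data source).
    val transactions: RDD[Array[String]] = sc.textFile("file:/path/to/baskets.txt")
      .map(_.trim.split(' '))

    val model = new FPGrowth()
      .setMinSupport(0.05)     // raise this to cut down the number of frequent itemsets
      .setNumPartitions(10)    // spread the counting work across more partitions
      .run(transactions)

    model.freqItemsets.take(20).foreach { fi =>
      println(fi.items.mkString("[", ",", "]") + ", freq=" + fi.freq)
    }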

Re: LinearRegressionModel - Negative Predicted Value

2017-03-06 Thread Manish Maheshwari
Thanks Sean. Our training MSE is really large. We definitely need better predictor variables. Training Mean Squared Error = 7.72E8 Thanks, Manish On Mon, Mar 6, 2017 at 4:45 PM, Sean Owen wrote: > There's nothing unusual about negative values from a linear regression. >

Re: LinearRegressionModel - Negative Predicted Value

2017-03-06 Thread Sean Owen
There's nothing unusual about negative values from a linear regression. If, generally, your predicted values are far from your actual values, then your model hasn't fit well. You may have a bug somewhere in your pipeline or you may have data without much linear relationship. Most of this isn't a

LinearRegressionModel - Negative Predicted Value

2017-03-06 Thread Manish Maheshwari
Hi All, We are using a LinearRegressionModel in Scala. We are using a standard StandardScaler to normalize the data before modelling. The code snippet looks like this - *Modelling -* val labeledPointsRDD = tableRecords.map(row => { val filtered = row.toSeq.filter({ case s: String => false case
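
The snippet above is truncated in the archive; for context, a minimal sketch of the usual RDD-based pattern that the labeledPointsRDD name suggests (StandardScaler followed by LinearRegressionWithSGD), including the training-MSE computation mentioned earlier in the thread. How each row is turned into a label and features is application-specific and only illustrated here:

    import org.apache.spark.mllib.feature.StandardScaler
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}
    import org.apache.spark.rdd.RDD

    // Keep only numeric columns; treat the first as the label (illustrative only).
    val parsed: RDD[LabeledPoint] = tableRecords.map { row =>
      val values = row.toSeq.collect { case d: Double => d }.toArray
      LabeledPoint(values.head, Vectors.dense(values.tail))
    }

    // Standardize features to zero mean / unit variance before fitting.
    val scaler = new StandardScaler(withMean = true, withStd = true).fit(parsed.map(_.features))
    val scaled = parsed.map(lp => LabeledPoint(lp.label, scaler.transform(lp.features))).cache()

    val model = LinearRegressionWithSGD.train(scaled, 100 /* iterations */, 0.01 /* step size */)

    // Training MSE, as reported above.
    val mse = scaled.map { lp =>
      val err = model.predict(lp.features) - lp.label
      err * err
    }.mean()
    println(s"Training Mean Squared Error = $mse")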