How to Fill Sparse Data With the Previous Non-Empty Value in SPARK Dataset

2017-06-28 Thread Carlo Allocca
Dear All, I am trying to propagate the last valid observation (i.e. not null) to the null values in a dataset. Below I report my partial solution: Dataset tmp800 = tmp700.select("uuid", "eventTime", "Washer_rinseCycles"); WindowSpec wspec=
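
[Editor's note] A plausible completion of that window spec, sketched on the assumption that uuid identifies a device and eventTime orders its readings (both names come from the snippet above). last(..., ignoreNulls = true) over an unbounded-preceding row frame carries the most recent non-null value forward; Long.MIN_VALUE is the pre-2.1 idiom for an unbounded lower frame bound:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.expressions.Window;
    import org.apache.spark.sql.expressions.WindowSpec;
    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.last;

    // Look back over all rows of the same uuid, ordered by eventTime;
    // last(..., true) skips nulls, so each row picks up the most
    // recent non-null reading.
    WindowSpec wspec = Window.partitionBy(col("uuid"))
            .orderBy(col("eventTime"))
            .rowsBetween(Long.MIN_VALUE, 0);

    Dataset<Row> filled = tmp800.withColumn("Washer_rinseCycles",
            last(col("Washer_rinseCycles"), true).over(wspec));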

How to Fill Sparse Data With the Previous Non-Empty Value in SPARK Dataset

2017-06-25 Thread Carlo . Allocca
Dear All, I need to apply a dataset transformation to replace null values with the previous non-null value. As an example, I report the following, from:

    id | col1
    ---------
    1  | null
    1  | null
    2  | 4
    2  | null
    2  | null
    3  | 5
    3  | null
    3  | null

to:

    id | col1
    ---------
    1  | null
    1  | null
    2  | 4
    2

Re: using spark-xml_2.10 to extract data from XML file

2017-02-15 Thread Carlo . Allocca
…abstract.ce:para")).show(); I got null values. My question is: how can I get it right using String rowTag="xocs:doc"; and get the right values for ….abstract.ce:para, etc.? What am I doing wrong? Many thanks in advance. Best Regards, Carlo On 14 Feb 2017, at 17:35, carlo allo

Re: using spark-xml_2.10 to extract data from XML file

2017-02-14 Thread Carlo . Allocca
String rowTag="xocs:doc"; and get the right values for ….abstract.ce:para, etc.? What am I doing wrong? Many thanks in advance. Best Regards, Carlo On 14 Feb 2017, at 17:35, carlo allocca <ca6...@open.ac.uk> wrote: Dear All, I would

Re: using spark-xml_2.10 to extract data from XML file

2017-02-14 Thread Carlo . Allocca
Dear All, I would like to ask your help with the following issue when using spark-xml_2.10. Given an XML file with the following structure:

    xocs:doc
     |-- xocs:item: struct (nullable = true)
     |    |-- bibrecord: struct (nullable = true)
     |    |    |-- head: struct (nullable = true)
     |    |    |
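
[Editor's note] A minimal read sketch for a layout like the one above, assuming a SparkSession named spark, the spark-xml data source, and an illustrative file path. Nested structs are reached with dotted field paths, and names containing a colon need backticks:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import static org.apache.spark.sql.functions.col;

    Dataset<Row> df = spark.read()
            .format("com.databricks.spark.xml")
            .option("rowTag", "xocs:doc")   // must match the repeated element
            .load("articles.xml");          // path is illustrative

    // Colons in field names must be escaped with backticks.
    df.select(col("`xocs:item`.bibrecord.head")).show();

If every selected column comes back null, a mismatch between rowTag and the element that actually repeats in the file is a common cause.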

using spark-xml_2.10 to extract data from XML file

2017-02-13 Thread Carlo . Allocca
Dear All, I am using spark-xml_2.10 to parse and extract some data from XML files. I am getting null values even though the XML files actually contain values.

Re: LinearRegressionWithSGD and Rank Features By Importance

2016-11-09 Thread Carlo . Allocca
Hi Masood, Thanks for the answer. Sure. I will do as suggested. Many Thanks, Best Regards, Carlo On 8 Nov 2016, at 17:19, Masood Krohy wrote: labels

Re: LinearRegressionWithSGD and Rank Features By Importance

2016-11-08 Thread Carlo . Allocca
Thanks in advance. Best Regards, Carlo On 7 Nov 2016, at 17:14, carlo allocca <ca6...@open.ac.uk> wrote: I found it, just google http://sebastianraschka.com/Articles/2014_about_feature_scaling.html Thanks. Carlo On 7 Nov 2016, at 17:12, carlo allocca

Re: LinearRegressionWithSGD and Rank Features By Importance

2016-11-07 Thread Carlo . Allocca
I found it, just google http://sebastianraschka.com/Articles/2014_about_feature_scaling.html Thanks. Carlo On 7 Nov 2016, at 17:12, carlo allocca <ca6...@open.ac.uk> wrote: Hi Masood, Thank you very much for your insight. I am going to scale all my fea

Re: LinearRegressionWithSGD and Rank Features By Importance

2016-11-07 Thread Carlo . Allocca
Hi Masood, Thank you very much for your insight. I am going to scale all my features as you described. As I am a beginner, is there any paper/book that would explain the suggested approaches? I would love to read it. Many Thanks, Best Regards, Carlo On 7 Nov 2016, at 16:27, Masood Krohy

Re: LinearRegressionWithSGD and Rank Features By Importance

2016-11-04 Thread Carlo . Allocca
Hi Robin, On 4 Nov 2016, at 09:19, Robin East wrote: Hi Do you mean the test of significance that you usually get with R output? Yes, exactly. I don't think there is anything implemented in the standard MLlib libraries, however I believe

Re: LinearRegressionWithSGD and Rank Features By Importance

2016-11-04 Thread Carlo . Allocca
Hi Mohit, Thank you for your reply. OK, it means coefficients with a high score are more important than others with a low score… Many Thanks, Best Regards, Carlo > On 3 Nov 2016, at 20:41, Mohit Jaggi wrote: > > For linear regression, it should be fairly easy. Just sort

LinearRegressionWithSGD and Rank Features By Importance

2016-11-03 Thread Carlo . Allocca
Hi All, I am using SPARK and in particular the MLlib library. import org.apache.spark.mllib.regression.LabeledPoint; import org.apache.spark.mllib.regression.LinearRegressionModel; import org.apache.spark.mllib.regression.LinearRegressionWithSGD; For my problem I am using the
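
[Editor's note] A sketch of the approach suggested in this thread (standardize the features, then compare coefficient magnitudes), assuming a JavaRDD<LabeledPoint> named data with dense feature vectors; the iteration count is illustrative:

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.mllib.feature.StandardScaler;
    import org.apache.spark.mllib.feature.StandardScalerModel;
    import org.apache.spark.mllib.linalg.Vector;
    import org.apache.spark.mllib.regression.LabeledPoint;
    import org.apache.spark.mllib.regression.LinearRegressionModel;
    import org.apache.spark.mllib.regression.LinearRegressionWithSGD;

    // Standardize features (zero mean, unit variance) so the learned
    // weights are on a comparable scale.
    StandardScalerModel scaler = new StandardScaler(true, true)
            .fit(data.map(LabeledPoint::features).rdd());
    JavaRDD<LabeledPoint> scaled = data.map(p ->
            new LabeledPoint(p.label(), scaler.transform(p.features())));

    LinearRegressionModel model =
            LinearRegressionWithSGD.train(scaled.rdd(), 100);

    // With standardized features, a larger |weight| suggests a more
    // influential feature; a rough ranking, not a significance test.
    Vector weights = model.weights();

As Robin's reply in this thread notes, this is not the R-style test of significance; MLlib does not provide p-values for LinearRegressionWithSGD.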

Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
Thanks Marcelo. Problem solved. Best, Carlo Hi Marcelo, Thank you for your help. Problem solved as you suggested. Best Regards, Carlo > On 5 Aug 2016, at 18:34, Marcelo Vanzin wrote: > > On Fri, Aug 5, 2016 at 9:53 AM, Carlo.Allocca wrote:

Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
I have also executed: mvn dependency:tree | grep log

    [INFO] |  |  +- com.esotericsoftware:minlog:jar:1.3.0:compile
    [INFO] +- log4j:log4j:jar:1.2.17:compile
    [INFO] +- org.slf4j:slf4j-log4j12:jar:1.7.16:compile
    [INFO] |  |  +- commons-logging:commons-logging:jar:1.1.3:compile

and the POM

Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
Sean, could you please detail the version mismatch? Many thanks, Carlo On 5 Aug 2016, at 18:11, Sean Owen wrote: You also seem to have a version mismatch here.

Re: ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
Hi Ted, Thanks for the prompt answer. It is not yet clear to me what I should do. How do I fix it? Many thanks, Carlo On 5 Aug 2016, at 17:58, Ted Yu wrote: private[spark] trait Logging {

ClassNotFoundException org.apache.spark.Logging

2016-08-05 Thread Carlo . Allocca
Dear All, I would like to ask for your help with the following issue: java.lang.ClassNotFoundException: org.apache.spark.Logging. I checked, and the class Logging is not present. Moreover, the line of code where the exception is thrown final
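
[Editor's note] The usual trigger for this exception is mixing a library built against Spark 1.x with Spark 2.0 on the classpath: org.apache.spark.Logging became an internal trait in 2.0, which is what Ted Yu's private[spark] reply hints at. A POM sketch of the fix, with illustrative versions, is to keep every spark-* artifact on the same version and Scala suffix:

    <!-- All spark-* artifacts should share one version and Scala suffix.
         A leftover 1.x artifact is what typically drags in the missing
         org.apache.spark.Logging. Versions below are illustrative. -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>2.0.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.10</artifactId>
      <version>2.0.0</version>
    </dependency>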

Re: Dataset and JavaRDD: how to eliminate the header.

2016-08-03 Thread Carlo . Allocca
On 3 Aug 2016, at 22:01, Mich Talebzadeh wrote: ok, in other words the result set of joining two datasets ends up inconsistent because a header from one DS is joined with a row from another DS? I am not 100% sure I got

Re: Dataset and JavaRDD: how to eliminate the header.

2016-08-03 Thread Carlo . Allocca
Hi Mich, Thanks again. My issue is not when I read the csv from a file; it is when I have a Dataset that is the output of some join operations. Any help on that? Many Thanks, Best, Carlo On 3 Aug 2016, at 21:43, Mich Talebzadeh

Re: Dataset and JavaRDD: how to eliminate the header.

2016-08-03 Thread Carlo . Allocca
One more: it seems that the steps == Step 1: transform the Dataset into JavaRDD JavaRDD dataPointsWithHeader = dataset1_Join_dataset2.toJavaRDD(); and List someRows = dataPointsWithHeader.collect(); someRows.forEach(System.out::println); do not print the header. So, could I
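
[Editor's note] A sketch of the usual way to keep a CSV header out of the data entirely, so it can never surface after joins; the SparkSession name and the path are illustrative:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // With header=true the first line is consumed as column names and
    // never becomes a data row, so downstream joins stay clean.
    Dataset<Row> dataset1 = spark.read()
            .option("header", "true")
            .option("inferSchema", "true")
            .csv("data1.csv");   // illustrative path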

Re: converting a Dataset into JavaRDD

2016-08-03 Thread Carlo . Allocca
Problem solved: the import of org.apache.spark.api.java.function.Function was missing. Thanks. Carlo On 3 Aug 2016, at 12:14, Carlo.Allocca wrote: Hi All, I am trying to convert a Dataset into JavaRDD in order to apply a linear

converting a Dataset into JavaRDD

2016-08-03 Thread Carlo . Allocca
Hi All, I am trying to convert a Dataset into a JavaRDD in order to apply a linear regression. I am using spark-core_2.10, version 2.0.0, with Java 1.8. My current approach is: == Step 1: convert the Dataset into JavaRDD JavaRDD dataPoints = modelDS.toJavaRDD(); == Step 2: convert JavaRDD
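
[Editor's note] A self-contained sketch of the two steps, assuming a Dataset<Row> named modelDS whose first column is the label and the next two are features (the column layout is hypothetical):

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.mllib.linalg.Vectors;
    import org.apache.spark.mllib.regression.LabeledPoint;
    import org.apache.spark.sql.Row;

    // Each Row becomes a LabeledPoint. JavaRDD.map takes
    // org.apache.spark.api.java.function.Function, the class whose
    // absence is resolved later in this thread.
    JavaRDD<LabeledPoint> dataPoints = modelDS.toJavaRDD().map(
            (Row row) -> new LabeledPoint(
                    row.getDouble(0),
                    Vectors.dense(row.getDouble(1), row.getDouble(2))));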

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
Solved!! The solution is using date_format with the “u” option. Thank you very much. Best, Carlo On 28 Jul 2016, at 18:59, carlo allocca <ca6...@open.ac.uk> wrote: Hi Mark, Thanks for the suggestion. I changed the maven entries as follows spa
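
[Editor's note] A minimal sketch of that fix, assuming an illustrative timestamp column named eventDate; in the SimpleDateFormat patterns that Spark 2.0's date_format accepts, "u" is the day-of-week number (1 = Monday … 7 = Sunday):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.date_format;

    // Derive the day-of-week number from a timestamp column.
    Dataset<Row> withDay = df.withColumn("dayOfWeek",
            date_format(col("eventDate"), "u"));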

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
following two related links: 1) https://github.com/apache/spark/commit/947b9020b0d621bc97661a0a056297e6889936d3 2) https://github.com/apache/spark/pull/12433 which both explain why it happens, but say nothing about how to solve it. Do you have any suggestion/recommendation? Many thanks. Carlo

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
1) https://github.com/apache/spark/commit/947b9020b0d621bc97661a0a056297e6889936d3 2) https://github.com/apache/spark/pull/12433 which both explain why it happens, but say nothing about how to solve it. Do you have any suggestion/recommendation? Many thanks. Carlo On 28 Jul 2016, at 11:06, carlo allocca <ca6...@open.a

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
…suggestion/recommendation? Many thanks. Carlo On 28 Jul 2016, at 11:06, carlo allocca <ca6...@open.ac.uk> wrote: Hi Rui, Thanks for the prompt reply. No, I am not using Mesos. OK. I am writing code to build a suitable dataset for my needs, as in the following: == Session c

Re: SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
Hi Rui, Thanks for the prompt reply. No, I am not using Mesos. OK. I am writing code to build a suitable dataset for my needs, as in the following: == Session configuration: SparkSession spark = SparkSession .builder() .master("local[6]") //

SPARK Exception thrown in awaitResult

2016-07-28 Thread Carlo . Allocca
Hi All, I am running SPARK locally, and when running d3=join(d1,d2) and d5=join(d3,d4) I am getting the following exception: "org.apache.spark.SparkException: Exception thrown in awaitResult". Googling for it, I found that the closest is the answer reported
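
[Editor's note] Two workarounds commonly suggested for this exception around local joins (not confirmed as the fix in this thread) are raising the broadcast timeout or disabling automatic broadcast joins. Both settings are real Spark SQL options; the values are illustrative:

    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder()
            .master("local[6]")
            .config("spark.sql.broadcastTimeout", "1200")         // seconds
            .config("spark.sql.autoBroadcastJoinThreshold", "-1") // disable broadcast joins
            .getOrCreate();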

SPARK UDF related issue

2016-07-25 Thread Carlo . Allocca
Hi All, I am using SPARK 2.0 and I have got the following issue: I am able to run steps 1-5 (see below) but not step 6, which uses a UDF. Steps 1-5 take a few seconds, whereas step 6 looks like it never ends. Is there anything wrong? How should I address it? Any
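
[Editor's note] Worth keeping in mind here: Dataset transformations are lazy, so steps 1-5 may return quickly because nothing executes until an action; the UDF step can then appear to never end simply because it triggers the whole accumulated plan. A minimal Java UDF sketch, with illustrative names and logic:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.api.java.UDF1;
    import org.apache.spark.sql.types.DataTypes;
    import static org.apache.spark.sql.functions.callUDF;
    import static org.apache.spark.sql.functions.col;

    // Register a one-argument UDF, then apply it column-wise.
    spark.udf().register("normalize",
            (UDF1<Double, Double>) v -> v == null ? null : v / 100.0,
            DataTypes.DoubleType);

    Dataset<Row> result = dataset.withColumn("normalized",
            callUDF("normalize", col("value")));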

SPARK SQL and join pipeline issue

2016-07-25 Thread Carlo . Allocca
Dear All, I have the following question: I am using SPARK SQL 2.0 and, in particular, I am doing some joins in a pipeline with the following pattern (d3 = d1 join d2, d4 = d5 join d6, d7 = d3 join d4). When running my code, I realised that building d7 generates an issue, as reported
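
[Editor's note] A sketch of that pipeline, assuming an illustrative shared join column named "id". Persisting the intermediate results is one common way to keep a chain of joins from recomputing the whole lineage when the final join runs:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Chained joins on a common key; persist() keeps d3 and d4
    // materialised once an action has run, so building d7 does not
    // re-derive everything from d1, d2, d5, and d6.
    Dataset<Row> d3 = d1.join(d2, "id").persist();
    Dataset<Row> d4 = d5.join(d6, "id").persist();
    Dataset<Row> d7 = d3.join(d4, "id");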