Re: Time-Series Forecasting

2018-09-19 Thread Aakash Basu
Hey,

I'm more of a data engineer than a data scientist, but I work closely with
the DS team on Spark ML. Spark ML is still being developed, following the
scikit-learn style, but I have never seen Spark handle time-series problems
out of the box. That goes for both Scala-Spark and PySpark.

So, in short, I think time-series support is yet to be added in a future
release of Spark. Even then, I would expect Scala-Spark to get it first,
with the other language APIs following in later minor releases according to
need, usage and importance.

Best,
AB.

On Thu 20 Sep, 2018, 4:43 AM ayan guha wrote:

> Hi
>
> I work mostly in data engineering and am trying to promote the use of
> SparkR within the company I recently joined. Some of the users are working
> on forecasting a bunch of things and want to use sparklyr, as they found
> its time-series implementation better than SparkR's.
>
> Does anyone have a point of view on this? Is sparklyr better than SparkR
> in certain use cases?
>
> On Thu, Sep 20, 2018 at 4:07 AM, Mina Aslani wrote:
>
>> Hi,
>>
>> Thank you for your quick response, really appreciate it.
>>
>> I have just started learning time-series forecasting, and I may try
>> different methods and compare their predictions/forecasts. However, my
>> understanding is that the methods below are needed:
>>
>> - Smoothing
>> - Decomposing (e.g. remove/separate trend/seasonality)
>> - AR Model/MA Model/Combined Model (e.g. ARMA, ARIMA)
>> - ACF (Autocorrelation Function)/PACF (Partial Autocorrelation Function)
>> - Recurrent Neural Network (LSTM: Long Short Term Memory)
>>
>> Kindest regards,
>> Mina
>>
>>
>>
>> On Wed, Sep 19, 2018 at 12:55 PM Jörn Franke wrote:
>>
>>> What functionality do you need? I.e., which methods?
>>>
>>> > On 19. Sep 2018, at 18:01, Mina Aslani wrote:
>>> >
>>> > Hi,
>>> > I have a question for you. Do we have any Time-Series Forecasting
>>> > library in Spark?
>>> >
>>> > Best regards,
>>> > Mina
>>>
>>
>
>
> --
> Best Regards,
> Ayan Guha
>


Re: Time-Series Forecasting

2018-09-19 Thread ayan guha
Hi

I work mostly in data engineering and am trying to promote the use of
SparkR within the company I recently joined. Some of the users are working
on forecasting a bunch of things and want to use sparklyr, as they found
its time-series implementation better than SparkR's.

Does anyone have a point of view on this? Is sparklyr better than SparkR
in certain use cases?

On Thu, Sep 20, 2018 at 4:07 AM, Mina Aslani wrote:

> Hi,
>
> Thank you for your quick response, really appreciate it.
>
> I have just started learning time-series forecasting, and I may try
> different methods and compare their predictions/forecasts. However, my
> understanding is that the methods below are needed:
>
> - Smoothing
> - Decomposing (e.g. remove/separate trend/seasonality)
> - AR Model/MA Model/Combined Model (e.g. ARMA, ARIMA)
> - ACF (Autocorrelation Function)/PACF (Partial Autocorrelation Function)
> - Recurrent Neural Network (LSTM: Long Short Term Memory)
>
> Kindest regards,
> Mina
>
>
>
> On Wed, Sep 19, 2018 at 12:55 PM Jörn Franke wrote:
>
>> What functionality do you need? I.e., which methods?
>>
>> > On 19. Sep 2018, at 18:01, Mina Aslani wrote:
>> >
>> > Hi,
>> > I have a question for you. Do we have any Time-Series Forecasting
>> > library in Spark?
>> >
>> > Best regards,
>> > Mina
>>
>


-- 
Best Regards,
Ayan Guha


Re: DirectFileOutputCommitter in Spark 2.3.1

2018-09-19 Thread Dillon Dukek
I believe you need to set mapreduce.fileoutputcommitter.algorithm.version
to 2.
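
As a rough sketch of where that setting goes: Spark forwards any property
prefixed with spark.hadoop. to the Hadoop configuration, so the key is
usually passed in that form. The helper below only assembles spark-submit
arguments; the helper name and the application file name are hypothetical.
Note that, as far as I know, version 2 only changes when task output is
moved to the destination (at task commit instead of job commit), so it is
not the same thing as a true direct committer.

```python
# Assumption: the Hadoop key is passed through Spark with the "spark.hadoop." prefix.
COMMITTER_CONF = {
    "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version": "2",
}


def spark_submit_command(app, conf):
    """Build a spark-submit argument list with one --conf flag per setting."""
    command = ["spark-submit"]
    for key in sorted(conf):
        command += ["--conf", f"{key}={conf[key]}"]
    command.append(app)
    return command


# "write_parquet_job.py" is a hypothetical application file.
command = spark_submit_command("write_parquet_job.py", COMMITTER_CONF)
print(" ".join(command))
```

The same key/value pair can also be set on the session builder with
.config(key, value) before the session is created.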

On Wed, Sep 19, 2018 at 10:45 AM Priya Ch wrote:

> Hello Team,
>
> I am trying to write a Dataset as Parquet files in Append mode, partitioned
> by a few columns. However, since the job is time-consuming, I would like to
> enable DirectFileOutputCommitter (i.e. bypassing the writes to a temporary
> folder).
>
> The version of Spark I am using is 2.3.1.
>
> Can someone please help me enable the configuration that allows direct
> writes to S3 when appending, writing new files, and overwriting existing
> files?
>
> Thanks,
> Padma CH
>


Re: Encoder for JValue

2018-09-19 Thread Arko Provo Mukherjee
Hello Muthu,

Many thanks for your reply. That is what we are currently doing.

However, in the end we load the data somewhere else, and there we need JSON
objects rather than serialized strings.

Hence I was wondering whether there are encoders out there for JObject, and
whether I can somehow pass that information to Spark.

Thanks & regards
Arko


On Tue, Sep 18, 2018 at 11:39 PM Muthu Jayakumar wrote:

> A naive workaround may be to convert the json4s JValue to a String (using
> something like compact()) and process it as a String. Once you are done
> with the last action, you could convert it back to a JValue (using
> something like parse()).
>
> Thanks,
> Muthu
>
> On Wed, Sep 19, 2018 at 6:35 AM Arko Provo Mukherjee <
> arkoprovomukher...@gmail.com> wrote:
>
>> Hello Spark Gurus,
>>
>> I am running into an issue with encoders and would like your help.
>>
>> I have a case class with a JObject in it, e.g.:
>> case class SomeClass(a: String, b: JObject)
>>
>> I also have an encoder for this case class:
>> val encoder = Encoders.product[SomeClass]
>>
>> Now I am creating a DataFrame with the tuple (a, b) from my
>> transformations and converting it into a Dataset:
>> df.as[SomeClass](encoder)
>>
>> When I do this, I get the following error:
>> java.lang.UnsupportedOperationException: No Encoder found for
>> org.json4s.JsonAST.JValue
>>
>> Appreciate any help regarding this issue.
>>
>> Many thanks in advance!
>> Warm regards
>> Arko
>>
>>
>>


Re: Time-Series Forecasting

2018-09-19 Thread Mina Aslani
Hi,

Thank you for your quick response, really appreciate it.

I have just started learning time-series forecasting, and I may try
different methods and compare their predictions/forecasts. However, my
understanding is that the methods below are needed:

- Smoothing
- Decomposing (e.g. remove/separate trend/seasonality)
- AR Model/MA Model/Combined Model (e.g. ARMA, ARIMA)
- ACF (Autocorrelation Function)/PACF (Partial Autocorrelation Function)
- Recurrent Neural Network (LSTM: Long Short Term Memory)
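
To make two of these items concrete, here is a minimal pure-Python sketch
of simple exponential smoothing and of the sample autocorrelation function
(ACF). It does not use Spark at all; the function names and the toy series
are made up for illustration and are not an API of Spark or of any library
mentioned in this thread.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s[0] = x[0], s[t] = alpha*x[t] + (1 - alpha)*s[t-1]."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def autocorrelation(series, lag):
    """Sample autocorrelation at `lag`, normalised by the lag-0 sum of squares."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    denominator = sum(d * d for d in deviations)
    if denominator == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    numerator = sum(deviations[t] * deviations[t + lag] for t in range(n - lag))
    return numerator / denominator


# Hypothetical toy data, only to show the calls.
toy_series = [1.0, 2.0, 3.0, 4.0, 5.0]
smoothed = exponential_smoothing(toy_series, alpha=0.5)
acf_values = [autocorrelation(toy_series, lag) for lag in range(3)]
```

A smaller alpha gives a smoother curve that reacts more slowly to new
observations; the ACF values at increasing lags are what one would inspect
when choosing the order of an AR/MA model.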

Kindest regards,
Mina



On Wed, Sep 19, 2018 at 12:55 PM Jörn Franke wrote:

> What functionality do you need? I.e., which methods?
>
> > On 19. Sep 2018, at 18:01, Mina Aslani wrote:
> >
> > Hi,
> > I have a question for you. Do we have any Time-Series Forecasting
> > library in Spark?
> >
> > Best regards,
> > Mina
>


DirectFileOutputCommitter in Spark 2.3.1

2018-09-19 Thread Priya Ch
Hello Team,

I am trying to write a Dataset as Parquet files in Append mode, partitioned
by a few columns. However, since the job is time-consuming, I would like to
enable DirectFileOutputCommitter (i.e. bypassing the writes to a temporary
folder).

The version of Spark I am using is 2.3.1.

Can someone please help me enable the configuration that allows direct
writes to S3 when appending, writing new files, and overwriting existing
files?

Thanks,
Padma CH


Re: Time-Series Forecasting

2018-09-19 Thread chris
There’s also flint: https://github.com/twosigma/flint

> On 19 Sep 2018, at 17:55, Jörn Franke wrote:
>
> What functionality do you need? I.e., which methods?
>
>> On 19. Sep 2018, at 18:01, Mina Aslani wrote:
>>
>> Hi,
>> I have a question for you. Do we have any Time-Series Forecasting library
>> in Spark?
>> 
>> Best regards,
>> Mina
> 
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 


Re: Time-Series Forecasting

2018-09-19 Thread Jörn Franke
What functionality do you need? I.e., which methods?

> On 19. Sep 2018, at 18:01, Mina Aslani wrote:
>
> Hi,
> I have a question for you. Do we have any Time-Series Forecasting library
> in Spark?
> 
> Best regards,
> Mina




Re: Time-Series Forecasting

2018-09-19 Thread Mina Aslani
Hi,
I saw spark-ts; however, it looks like it's not under active development
any more. I would really appreciate your insight.

Kindest regards,
Mina

On Wed, Sep 19, 2018 at 12:01 PM Mina Aslani wrote:

> Hi,
> I have a question for you. Do we have any Time-Series Forecasting library
> in Spark?
>
> Best regards,
> Mina
>


Time-Series Forecasting

2018-09-19 Thread Mina Aslani
Hi,
I have a question for you. Do we have any Time-Series Forecasting library
in Spark?

Best regards,
Mina


Re: Encoder for JValue

2018-09-19 Thread Muthu Jayakumar
A naive workaround may be to convert the json4s JValue to a String (using
something like compact()) and process it as a String. Once you are done
with the last action, you could convert it back to a JValue (using
something like parse()).
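
The round trip itself can be sketched in a language-neutral way. The
snippet below uses Python's standard json module purely as a stand-in for
json4s: json.dumps plays the role of compact() and json.loads the role of
parse(). The (a, b) record layout follows the SomeClass example quoted
below; the function names and sample values are hypothetical.

```python
import json


def to_row(a, b):
    """Replace the JSON object field with its compact string form."""
    return (a, json.dumps(b, separators=(",", ":")))


def from_row(row):
    """Parse the string field back into a JSON object after processing."""
    a, b_text = row
    return (a, json.loads(b_text))


# Hypothetical record: the second field is an arbitrary JSON object.
record = ("id-1", {"name": "spark", "tags": ["ml", "sql"]})
row = to_row(*record)        # both fields are now plain strings
restored = from_row(row)     # the object is rebuilt at the very end
```

In Spark, the intermediate case class would then hold b as a String, for
which a built-in encoder exists.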

Thanks,
Muthu

On Wed, Sep 19, 2018 at 6:35 AM Arko Provo Mukherjee <
arkoprovomukher...@gmail.com> wrote:

> Hello Spark Gurus,
>
> I am running into an issue with encoders and would like your help.
>
> I have a case class with a JObject in it, e.g.:
> case class SomeClass(a: String, b: JObject)
>
> I also have an encoder for this case class:
> val encoder = Encoders.product[SomeClass]
>
> Now I am creating a DataFrame with the tuple (a, b) from my
> transformations and converting it into a Dataset:
> df.as[SomeClass](encoder)
>
> When I do this, I get the following error:
> java.lang.UnsupportedOperationException: No Encoder found for
> org.json4s.JsonAST.JValue
>
> Appreciate any help regarding this issue.
>
> Many thanks in advance!
> Warm regards
> Arko
>
>
>