Re: [Pyspark 2.3+] Timeseries with Spark

2019-06-13 Thread Jörn Franke
Time series can mean a lot of different things and algorithms. Can you describe 
your time series use case in more detail, i.e. what the input is, what you 
would like to do with it, and what the output should be?

> On 14.06.2019 at 06:01, Rishi Shah wrote:
> 
> Hi All,
> 
> I have a time series use case which I would like to implement in Spark... 
> What would be the best way to do so? Any built in libraries?
> 
> -- 
> Regards,
> 
> Rishi Shah




[Pyspark 2.3+] Timeseries with Spark

2019-06-13 Thread Rishi Shah
Hi All,

I have a time series use case which I would like to implement in Spark...
What would be the best way to do so? Any built in libraries?

-- 
Regards,

Rishi Shah


Spark on Yarn - Dynamically getting a list of archives from --archives in spark-submit

2019-06-13 Thread Tommy Li
Hi

Is there any way to get a list of the archives submitted with a spark job from 
the spark context?
I see that spark context has a `.files()` function which returns the files 
included with `--files`, but I don't see an equivalent for `--archives`.


Thanks,
Tommy
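
One possible workaround, offered as an assumption rather than a documented
API: on YARN with Spark 2.x, `--archives` is understood to populate the
`spark.yarn.dist.archives` configuration key, so the list could be read from
the Spark configuration and split by hand. The helper name `parse_archives`
and the sample value below are hypothetical, and the key name should be
checked against the Spark version in use.

```python
from typing import List, Optional, Tuple


def parse_archives(conf_value: str) -> List[Tuple[str, Optional[str]]]:
    """Split a comma-separated archive list into (path, alias) pairs.

    An entry such as "hdfs:///libs/env.zip#env" carries an alias after '#';
    entries without '#' get an alias of None.
    """
    entries = []
    for item in conf_value.split(","):
        item = item.strip()
        if not item:
            continue  # skip empty fragments, e.g. from an empty config value
        path, sep, alias = item.partition("#")
        entries.append((path, alias if sep else None))
    return entries


# With a live SparkContext `sc` (not created here), the raw value could be
# read as follows, assuming the key is set by --archives:
#   raw = sc.getConf().get("spark.yarn.dist.archives", "")
raw = "hdfs:///libs/env.zip#env,/tmp/data.tar.gz"  # hypothetical sample value
print(parse_archives(raw))
```

Reading the configuration keeps the lookup on the driver and needs nothing
beyond the existing SparkContext, but it only reflects what was passed at
submit time.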


Re: [Spark Core]: What is the release date for Spark 3 ?

2019-06-13 Thread Vadim Semenov
next spark summit

On Thu, Jun 13, 2019 at 3:58 AM Alex Dettinger 
wrote:

> Follow up on the release date for Spark 3. Any guesstimate or rough
> estimation without commitment would be helpful :)
>
> Cheers,
> Alex
>
> On Mon, Jun 10, 2019 at 5:24 PM Alex Dettinger 
> wrote:
>
>> Hi guys,
>>
>>   I was not able to find the foreseen release date for Spark 3.
>>   Would one have any information on this please ?
>>
>> Many thanks,
>> Alex
>>
>



Re: Exposing JIRA issue types at GitHub PRs

2019-06-13 Thread Dongjoon Hyun
Thank you for the feedback and requirements, Hyukjin, Reynold, Marco.

Sure, we can do whatever we want.

I'll wait for more feedback and proceed to the next steps.

Bests,
Dongjoon.


On Wed, Jun 12, 2019 at 11:51 PM Marco Gaido  wrote:

> Hi Dongjoon,
> Thanks for the proposal! I like the idea. Maybe we can extend it to
> components too, and to some JIRA labels such as correctness, which may be
> worth highlighting in PRs too. My only concern is that in many cases JIRAs
> are not created very carefully, so they may be incorrect at the moment of
> PR creation and be updated later; keeping them in sync may take
> extra effort.
>
> On Thu, 13 Jun 2019, 08:09 Reynold Xin,  wrote:
>
>> Seems like a good idea. Can we test this with a component first?
>>
>> On Thu, Jun 13, 2019 at 6:17 AM Dongjoon Hyun 
>> wrote:
>>
>>> Hi, All.
>>>
>>> Since we use both Apache JIRA and GitHub actively for Apache Spark
>>> contributions, we have lots of JIRAs and PRs consequently. One specific
>>> thing I've been longing to see is `Jira Issue Type` in GitHub.
>>>
>>> How about exposing JIRA issue types at GitHub PRs as GitHub `Labels`?
>>> There are two main benefits:
>>> 1. It helps the communication between the contributors and reviewers
>>> with more information.
>>> (In some cases, some people only visit GitHub to see the PR and
>>> commits)
>>> 2. `Labels` are searchable. We don't need to visit Apache Jira to search
>>> PRs to see a specific type.
>>> (For example, the reviewers can see and review 'BUG' PRs first by
>>> using `is:open is:pr label:BUG`.)
>>>
>>> Of course, this can be done automatically without human intervention.
>>> Since we already have GitHub Jenkins job to access JIRA/GitHub, that job
>>> can add the labels from the beginning. If needed, I can volunteer to update
>>> the script.
>>>
>>> To show the demo, I labeled several PRs manually. You can see the result
>>> right now in Apache Spark PR page.
>>>
>>>   - https://github.com/apache/spark/pulls
>>>
>>> If you're surprised due to those manual activities, I want to apologize
>>> for that. I hope we can take advantage of the existing GitHub features to
>>> serve Apache Spark community in a way better than yesterday.
>>>
>>> What do you think about this specific suggestion?
>>>
>>> Bests,
>>> Dongjoon
>>>
>>> PS. I saw that `Request Review` and `Assign` features are already used
>>> for some purposes, but these features are out of the scope of this email.
>>>
>>


Re: best docker image to use

2019-06-13 Thread Marcelo Valle
Thanks Riccardo. This is useful, and it seems it's maintained by the Jupyter
team.
I was hoping I would find some maintained by the Spark team.

Right now, I am using the base images from this repo:
https://github.com/big-data-europe/docker-spark/

-Marcelo

On Tue, 11 Jun 2019 at 12:19, Riccardo Ferrari  wrote:

> Hi Marcelo,
>
> I usually work with https://github.com/jupyter/docker-stacks. There's
> a Scala+Jupyter option too, though there might be a better option with
> Zeppelin.
> Hth
>
>
> On Tue, 11 Jun 2019, 11:52 Marcelo Valle,  wrote:
>
>> Hi,
>>
>> I would like to run spark shell + scala in a Docker environment, just to
>> play with Docker on a development machine without having to install a JVM
>> and a lot of other things.
>>
>> Is there such a thing as an "official Docker image" I am recommended to use?
>> I saw some on docker hub, but it seems they are all contributions from
>> pro-active individuals. I wonder whether the group maintaining Apache Spark
>> also maintains some docker images for use cases like this?
>>
>> Thanks,
>> Marcelo.
>>
>> This email is confidential [and may be protected by legal privilege]. If
>> you are not the intended recipient, please do not copy or disclose its
>> content but contact the sender immediately upon receipt.
>>
>> KTech Services Ltd is registered in England as company number 10704940.
>>
>> Registered Office: The River Building, 1 Cousin Lane, London EC4R 3TE,
>> United Kingdom
>>
>



Re: [Spark Core]: What is the release date for Spark 3 ?

2019-06-13 Thread Alex Dettinger
Follow up on the release date for Spark 3. Any guesstimate or rough
estimation without commitment would be helpful :)

Cheers,
Alex

On Mon, Jun 10, 2019 at 5:24 PM Alex Dettinger 
wrote:

> Hi guys,
>
>   I was not able to find the foreseen release date for Spark 3.
>   Would one have any information on this please ?
>
> Many thanks,
> Alex
>


Re: Exposing JIRA issue types at GitHub PRs

2019-06-13 Thread Marco Gaido
Hi Dongjoon,
Thanks for the proposal! I like the idea. Maybe we can extend it to
components too, and to some JIRA labels such as correctness, which may be
worth highlighting in PRs too. My only concern is that in many cases JIRAs
are not created very carefully, so they may be incorrect at the moment of
PR creation and be updated later; keeping them in sync may take
extra effort.

On Thu, 13 Jun 2019, 08:09 Reynold Xin,  wrote:

> Seems like a good idea. Can we test this with a component first?
>
> On Thu, Jun 13, 2019 at 6:17 AM Dongjoon Hyun 
> wrote:
>
>> Hi, All.
>>
>> Since we use both Apache JIRA and GitHub actively for Apache Spark
>> contributions, we have lots of JIRAs and PRs consequently. One specific
>> thing I've been longing to see is `Jira Issue Type` in GitHub.
>>
>> How about exposing JIRA issue types at GitHub PRs as GitHub `Labels`?
>> There are two main benefits:
>> 1. It helps the communication between the contributors and reviewers with
>> more information.
>> (In some cases, some people only visit GitHub to see the PR and
>> commits)
>> 2. `Labels` are searchable. We don't need to visit Apache Jira to search
>> PRs to see a specific type.
>> (For example, the reviewers can see and review 'BUG' PRs first by
>> using `is:open is:pr label:BUG`.)
>>
>> Of course, this can be done automatically without human intervention.
>> Since we already have GitHub Jenkins job to access JIRA/GitHub, that job
>> can add the labels from the beginning. If needed, I can volunteer to update
>> the script.
>>
>> To show the demo, I labeled several PRs manually. You can see the result
>> right now in Apache Spark PR page.
>>
>>   - https://github.com/apache/spark/pulls
>>
>> If you're surprised due to those manual activities, I want to apologize
>> for that. I hope we can take advantage of the existing GitHub features to
>> serve Apache Spark community in a way better than yesterday.
>>
>> What do you think about this specific suggestion?
>>
>> Bests,
>> Dongjoon
>>
>> PS. I saw that `Request Review` and `Assign` features are already used
>> for some purposes, but these features are out of the scope of this email.
>>
>


Re: Exposing JIRA issue types at GitHub PRs

2019-06-13 Thread Reynold Xin
Seems like a good idea. Can we test this with a component first?

On Thu, Jun 13, 2019 at 6:17 AM Dongjoon Hyun 
wrote:

> Hi, All.
>
> Since we use both Apache JIRA and GitHub actively for Apache Spark
> contributions, we have lots of JIRAs and PRs consequently. One specific
> thing I've been longing to see is `Jira Issue Type` in GitHub.
>
> How about exposing JIRA issue types at GitHub PRs as GitHub `Labels`?
> There are two main benefits:
> 1. It helps the communication between the contributors and reviewers with
> more information.
> (In some cases, some people only visit GitHub to see the PR and
> commits)
> 2. `Labels` are searchable. We don't need to visit Apache Jira to search
> PRs to see a specific type.
> (For example, the reviewers can see and review 'BUG' PRs first by
> using `is:open is:pr label:BUG`.)
>
> Of course, this can be done automatically without human intervention.
> Since we already have GitHub Jenkins job to access JIRA/GitHub, that job
> can add the labels from the beginning. If needed, I can volunteer to update
> the script.
>
> To show the demo, I labeled several PRs manually. You can see the result
> right now in Apache Spark PR page.
>
>   - https://github.com/apache/spark/pulls
>
> If you're surprised due to those manual activities, I want to apologize
> for that. I hope we can take advantage of the existing GitHub features to
> serve Apache Spark community in a way better than yesterday.
>
> What do you think about this specific suggestion?
>
> Bests,
> Dongjoon
>
> PS. I saw that `Request Review` and `Assign` features are already used for
> some purposes, but these features are out of the scope of this email.
>


Re: Exposing JIRA issue types at GitHub PRs

2019-06-13 Thread Hyukjin Kwon
Yea, I think we can automate this process via, for instance,
https://github.com/apache/spark/blob/master/dev/github_jira_sync.py

+1 for this sort of automatic categorization and metadata matching between
JIRA and GitHub.

Adding Josh and Sean as well.
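
As a rough illustration of the automation discussed in this thread, the
sketch below maps a JIRA issue type to a GitHub label and builds the PR
search URL for the `is:open is:pr label:BUG` query from the proposal. It
does not reproduce dev/github_jira_sync.py; the mapping table and the
function names are hypothetical, and only the `BUG` label appears in the
proposal itself.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical mapping from JIRA issue type names to GitHub label names.
# Only "BUG" is mentioned in the thread; the other entries are placeholders.
ISSUE_TYPE_TO_LABEL = {
    "Bug": "BUG",
    "Improvement": "IMPROVEMENT",
    "Test": "TEST",
}


def label_for_issue_type(issue_type: str) -> Optional[str]:
    """Return the GitHub label for a JIRA issue type, or None if unmapped."""
    return ISSUE_TYPE_TO_LABEL.get(issue_type)


def pr_search_url(label: str, repo: str = "apache/spark") -> str:
    """Build the GitHub URL that lists open PRs carrying the given label."""
    query = f"is:open is:pr label:{label}"
    return f"https://github.com/{repo}/pulls?" + urlencode({"q": query})


label = label_for_issue_type("Bug")
print(label)                  # BUG
print(pr_search_url(label))   # URL-encoded form of the search query
```

Keeping the mapping in one table means a sync job would only need to look up
the issue type it already fetches from JIRA; applying the label through the
GitHub API is left out here.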

On Thu, 13 Jun 2019, 13:17 Dongjoon Hyun,  wrote:

> Hi, All.
>
> Since we use both Apache JIRA and GitHub actively for Apache Spark
> contributions, we have lots of JIRAs and PRs consequently. One specific
> thing I've been longing to see is `Jira Issue Type` in GitHub.
>
> How about exposing JIRA issue types at GitHub PRs as GitHub `Labels`?
> There are two main benefits:
> 1. It helps the communication between the contributors and reviewers with
> more information.
> (In some cases, some people only visit GitHub to see the PR and
> commits)
> 2. `Labels` are searchable. We don't need to visit Apache Jira to search
> PRs to see a specific type.
> (For example, the reviewers can see and review 'BUG' PRs first by
> using `is:open is:pr label:BUG`.)
>
> Of course, this can be done automatically without human intervention.
> Since we already have GitHub Jenkins job to access JIRA/GitHub, that job
> can add the labels from the beginning. If needed, I can volunteer to update
> the script.
>
> To show the demo, I labeled several PRs manually. You can see the result
> right now in Apache Spark PR page.
>
>   - https://github.com/apache/spark/pulls
>
> If you're surprised due to those manual activities, I want to apologize
> for that. I hope we can take advantage of the existing GitHub features to
> serve Apache Spark community in a way better than yesterday.
>
> What do you think about this specific suggestion?
>
> Bests,
> Dongjoon
>
> PS. I saw that `Request Review` and `Assign` features are already used for
> some purposes, but these features are out of the scope of this email.
>