Re: [Pyspark 2.3+] Timeseries with Spark
Time series can mean a lot of different things and algorithms. Can you describe in more detail what you mean by a time series use case, i.e. what is the input, what would you like to do with the input, and what is the output?

> On 14.06.2019 at 06:01, Rishi Shah wrote:
>
> Hi All,
>
> I have a time series use case which I would like to implement in Spark...
> What would be the best way to do so? Any built in libraries?
>
> --
> Regards,
>
> Rishi Shah
[Pyspark 2.3+] Timeseries with Spark
Hi All,

I have a time series use case which I would like to implement in Spark... What would be the best way to do so? Any built in libraries?

--
Regards,
Rishi Shah
Spark on Yarn - Dynamically getting a list of archives from --archives in spark-submit
Hi,

Is there any way to get a list of the archives submitted with a Spark job from the Spark context? I see that the Spark context has a `.files()` function which returns the files included with `--files`, but I don't see an equivalent for `--archives`.

Thanks,
Tommy
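One possible approach, not confirmed anywhere in this thread: on YARN, spark-submit is generally understood to record the `--archives` value in the `spark.yarn.dist.archives` configuration entry as a comma-separated list, where each entry may carry an optional `#alias` suffix. Assuming that holds for your Spark version, the list could be read back from the Spark configuration and split. The helper name `parse_archives` below is hypothetical, and the Spark call is shown only as a comment so the sketch runs without a cluster.

```python
from typing import List, Optional, Tuple

# Assumption: on YARN, spark-submit stores the --archives value under this key.
ARCHIVES_CONF_KEY = "spark.yarn.dist.archives"


def parse_archives(conf_value: Optional[str]) -> List[Tuple[str, Optional[str]]]:
    """Split a comma-separated archive list into (uri, alias) pairs.

    The alias is the part after '#', or None when no '#' is present.
    """
    if not conf_value:
        return []
    archives = []
    for entry in conf_value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # skip empty fragments such as a trailing comma
        uri, sep, alias = entry.partition("#")
        archives.append((uri, alias if sep else None))
    return archives


# Usage with a live SparkContext `sc` (not executed here):
#   raw = sc.getConf().get(ARCHIVES_CONF_KEY, "")
#   for uri, alias in parse_archives(raw):
#       print(uri, alias)
```

If the key is absent in your deployment, the helper simply returns an empty list, so it is worth checking the effective configuration in the Spark UI's Environment tab first.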
Re: [Spark Core]: What is the release date for Spark 3 ?
Next Spark Summit.

On Thu, Jun 13, 2019 at 3:58 AM Alex Dettinger wrote:

> Follow up on the release date for Spark 3. Any guesstimate or rough
> estimation without commitment would be helpful :)
>
> Cheers,
> Alex
>
> On Mon, Jun 10, 2019 at 5:24 PM Alex Dettinger wrote:
>
>> Hi guys,
>>
>> I was not able to find the foreseen release date for Spark 3.
>> Would one have any information on this please ?
>>
>> Many thanks,
>> Alex
Re: Exposing JIRA issue types at GitHub PRs
Thank you for the feedback and requirements, Hyukjin, Reynold, Marco. Sure, we can do whatever we want. I'll wait for more feedback and proceed to the next steps.

Bests,
Dongjoon.

On Wed, Jun 12, 2019 at 11:51 PM Marco Gaido wrote:

> Hi Dongjoon,
> Thanks for the proposal! I like the idea. Maybe we can extend it to
> component too and to some jira labels such as correctness which may be
> worth to highlight in PRs too. My only concern is that in many cases JIRAs
> are created not very carefully so they may be incorrect at the moment of
> the pr creation and it may be updated later: so keeping them in sync may be
> an extra effort..
>
> On Thu, 13 Jun 2019, 08:09 Reynold Xin, wrote:
>
>> Seems like a good idea. Can we test this with a component first?
>>
>> On Thu, Jun 13, 2019 at 6:17 AM Dongjoon Hyun wrote:
>>
>>> Hi, All.
>>>
>>> Since we use both Apache JIRA and GitHub actively for Apache Spark
>>> contributions, we have lots of JIRAs and PRs consequently. One specific
>>> thing I've been longing to see is `Jira Issue Type` in GitHub.
>>>
>>> How about exposing JIRA issue types at GitHub PRs as GitHub `Labels`?
>>> There are two main benefits:
>>> 1. It helps the communication between the contributors and reviewers
>>> with more information.
>>> (In some cases, some people only visit GitHub to see the PR and
>>> commits)
>>> 2. `Labels` is searchable. We don't need to visit Apache Jira to search
>>> PRs to see a specific type.
>>> (For example, the reviewers can see and review 'BUG' PRs first by
>>> using `is:open is:pr label:BUG`.)
>>>
>>> Of course, this can be done automatically without human intervention.
>>> Since we already have GitHub Jenkins job to access JIRA/GitHub, that job
>>> can add the labels from the beginning. If needed, I can volunteer to update
>>> the script.
>>>
>>> To show the demo, I labeled several PRs manually. You can see the result
>>> right now in Apache Spark PR page.
>>>
>>> - https://github.com/apache/spark/pulls
>>>
>>> If you're surprised due to those manual activities, I want to apologize
>>> for that. I hope we can take advantage of the existing GitHub features to
>>> serve Apache Spark community in a way better than yesterday.
>>>
>>> How do you think about this specific suggestion?
>>>
>>> Bests,
>>> Dongjoon
>>>
>>> PS. I saw that `Request Review` and `Assign` features are already used
>>> for some purposes, but these feature are out of the scope in this email.
Re: best docker image to use
Thanks Riccardo. This is useful, and it seems it's maintained by the Jupyter team. I was hoping I would find some maintained by the Spark team.

Right now, I am using the base images from this repo: https://github.com/big-data-europe/docker-spark/

-Marcelo

On Tue, 11 Jun 2019 at 12:19, Riccardo Ferrari wrote:

> Hi Marcelo,
>
> I'm used to work with https://github.com/jupyter/docker-stacks. There's
> the Scala+jupyter option too. Though there might be better option with
> Zeppelin too.
> Hth
>
> On Tue, 11 Jun 2019, 11:52 Marcelo Valle, wrote:
>
>> Hi,
>>
>> I would like to run spark shell + scala on a docker environment, just to
>> play with docker in development machine without having to install JVM + a
>> lot of things.
>>
>> Is there something as an "official docker image" I am recommended to use?
>> I saw some on docker hub, but it seems they are all contributions from
>> pro-active individuals. I wonder whether the group maintaining Apache Spark
>> also maintains some docker images for use cases like this?
>>
>> Thanks,
>> Marcelo.
>>
>> This email is confidential [and may be protected by legal privilege]. If
>> you are not the intended recipient, please do not copy or disclose its
>> content but contact the sender immediately upon receipt.
>>
>> KTech Services Ltd is registered in England as company number 10704940.
>>
>> Registered Office: The River Building, 1 Cousin Lane, London EC4R 3TE,
>> United Kingdom
Re: [Spark Core]: What is the release date for Spark 3 ?
Follow up on the release date for Spark 3. Any guesstimate or rough estimation without commitment would be helpful :)

Cheers,
Alex

On Mon, Jun 10, 2019 at 5:24 PM Alex Dettinger wrote:

> Hi guys,
>
> I was not able to find the foreseen release date for Spark 3.
> Would one have any information on this please ?
>
> Many thanks,
> Alex
Re: Exposing JIRA issue types at GitHub PRs
Hi Dongjoon,

Thanks for the proposal! I like the idea. Maybe we can extend it to components too, and to some JIRA labels such as correctness, which may be worth highlighting in PRs as well. My only concern is that in many cases JIRAs are not created very carefully, so they may be incorrect when the PR is created and may be updated later; keeping them in sync may take extra effort.

On Thu, 13 Jun 2019, 08:09 Reynold Xin, wrote:

> Seems like a good idea. Can we test this with a component first?
>
> On Thu, Jun 13, 2019 at 6:17 AM Dongjoon Hyun wrote:
>
>> Hi, All.
>>
>> Since we use both Apache JIRA and GitHub actively for Apache Spark
>> contributions, we have lots of JIRAs and PRs consequently. One specific
>> thing I've been longing to see is `Jira Issue Type` in GitHub.
>>
>> How about exposing JIRA issue types at GitHub PRs as GitHub `Labels`?
>> There are two main benefits:
>> 1. It helps the communication between the contributors and reviewers with
>> more information.
>> (In some cases, some people only visit GitHub to see the PR and
>> commits)
>> 2. `Labels` is searchable. We don't need to visit Apache Jira to search
>> PRs to see a specific type.
>> (For example, the reviewers can see and review 'BUG' PRs first by
>> using `is:open is:pr label:BUG`.)
>>
>> Of course, this can be done automatically without human intervention.
>> Since we already have GitHub Jenkins job to access JIRA/GitHub, that job
>> can add the labels from the beginning. If needed, I can volunteer to update
>> the script.
>>
>> To show the demo, I labeled several PRs manually. You can see the result
>> right now in Apache Spark PR page.
>>
>> - https://github.com/apache/spark/pulls
>>
>> If you're surprised due to those manual activities, I want to apologize
>> for that. I hope we can take advantage of the existing GitHub features to
>> serve Apache Spark community in a way better than yesterday.
>>
>> How do you think about this specific suggestion?
>>
>> Bests,
>> Dongjoon
>>
>> PS. I saw that `Request Review` and `Assign` features are already used
>> for some purposes, but these feature are out of the scope in this email.
Re: Exposing JIRA issue types at GitHub PRs
Seems like a good idea. Can we test this with a component first?

On Thu, Jun 13, 2019 at 6:17 AM Dongjoon Hyun wrote:

> Hi, All.
>
> Since we use both Apache JIRA and GitHub actively for Apache Spark
> contributions, we have lots of JIRAs and PRs consequently. One specific
> thing I've been longing to see is `Jira Issue Type` in GitHub.
>
> How about exposing JIRA issue types at GitHub PRs as GitHub `Labels`?
> There are two main benefits:
> 1. It helps the communication between the contributors and reviewers with
> more information.
> (In some cases, some people only visit GitHub to see the PR and
> commits)
> 2. `Labels` is searchable. We don't need to visit Apache Jira to search
> PRs to see a specific type.
> (For example, the reviewers can see and review 'BUG' PRs first by
> using `is:open is:pr label:BUG`.)
>
> Of course, this can be done automatically without human intervention.
> Since we already have GitHub Jenkins job to access JIRA/GitHub, that job
> can add the labels from the beginning. If needed, I can volunteer to update
> the script.
>
> To show the demo, I labeled several PRs manually. You can see the result
> right now in Apache Spark PR page.
>
> - https://github.com/apache/spark/pulls
>
> If you're surprised due to those manual activities, I want to apologize
> for that. I hope we can take advantage of the existing GitHub features to
> serve Apache Spark community in a way better than yesterday.
>
> How do you think about this specific suggestion?
>
> Bests,
> Dongjoon
>
> PS. I saw that `Request Review` and `Assign` features are already used for
> some purposes, but these feature are out of the scope in this email.
Re: Exposing JIRA issue types at GitHub PRs
Yea, I think we can automate this process via, for instance, https://github.com/apache/spark/blob/master/dev/github_jira_sync.py

+1 for this sort of automatic categorizing and matching of metadata between JIRA and GitHub.

Adding Josh and Sean as well.

On Thu, 13 Jun 2019, 13:17 Dongjoon Hyun, wrote:

> Hi, All.
>
> Since we use both Apache JIRA and GitHub actively for Apache Spark
> contributions, we have lots of JIRAs and PRs consequently. One specific
> thing I've been longing to see is `Jira Issue Type` in GitHub.
>
> How about exposing JIRA issue types at GitHub PRs as GitHub `Labels`?
> There are two main benefits:
> 1. It helps the communication between the contributors and reviewers with
> more information.
> (In some cases, some people only visit GitHub to see the PR and
> commits)
> 2. `Labels` is searchable. We don't need to visit Apache Jira to search
> PRs to see a specific type.
> (For example, the reviewers can see and review 'BUG' PRs first by
> using `is:open is:pr label:BUG`.)
>
> Of course, this can be done automatically without human intervention.
> Since we already have GitHub Jenkins job to access JIRA/GitHub, that job
> can add the labels from the beginning. If needed, I can volunteer to update
> the script.
>
> To show the demo, I labeled several PRs manually. You can see the result
> right now in Apache Spark PR page.
>
> - https://github.com/apache/spark/pulls
>
> If you're surprised due to those manual activities, I want to apologize
> for that. I hope we can take advantage of the existing GitHub features to
> serve Apache Spark community in a way better than yesterday.
>
> How do you think about this specific suggestion?
>
> Bests,
> Dongjoon
>
> PS. I saw that `Request Review` and `Assign` features are already used for
> some purposes, but these feature are out of the scope in this email.
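
To make the proposal concrete, here is a minimal sketch of the mapping step such automation might perform: translate a JIRA issue type into a GitHub label and build the search query quoted in the thread. The mapping table and both function names are hypothetical and are not taken from `dev/github_jira_sync.py`; only the `BUG` label and the `is:open is:pr label:BUG` query appear in the discussion. Fetching the issue type from JIRA and applying the label through the GitHub API are deliberately left out.

```python
from typing import Dict, Optional

# Hypothetical mapping from JIRA issue type to GitHub label.
# Only the 'BUG' label is mentioned in the thread; the others are assumptions.
ISSUE_TYPE_TO_LABEL: Dict[str, str] = {
    "bug": "BUG",
    "improvement": "IMPROVEMENT",
    "new feature": "NEW-FEATURE",
}


def label_for_issue_type(issue_type: str) -> Optional[str]:
    """Return the label for a JIRA issue type, or None if it is unmapped."""
    return ISSUE_TYPE_TO_LABEL.get(issue_type.strip().lower())


def open_pr_query(label: str) -> str:
    """Build the GitHub search query for open PRs carrying the given label."""
    return f"is:open is:pr label:{label}"


if __name__ == "__main__":
    label = label_for_issue_type("Bug")
    if label is not None:
        print(open_pr_query(label))
```

Returning `None` for unmapped types lets the caller skip labeling rather than guess, which speaks to the concern raised in the thread that JIRA metadata may be inaccurate when the PR is first opened.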