Re: Is there any Spark source in Java

2018-11-03 Thread Holden Karau
Parts of it are indeed written in Java. You probably want to reach out to
the developers list to talk about changing Spark.

On Sat, Nov 3, 2018, 11:42 AM Soheil Pourbafrani wrote:

> Hi, I want to customize some part of Spark. I was wondering whether any
> Spark source is written in Java, or whether all the sources are in Scala?


Re: Is there any Spark source in Java

2018-11-03 Thread Soheil Pourbafrani
Thanks,

I don't need sources for learning Spark.
I need some Spark sources written in Java so that I can implement new
functions based on them. For example, Spark has HashingTF, and I want to
customize it along these lines:

public static class newHashingTF implements Something
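
As an illustration, here is a minimal sketch of what such a customization
could look like as a custom ML Transformer written entirely in Java. The
class name NewHashingTF, its constructor parameters, and the bucket-counting
logic are hypothetical, not Spark's actual HashingTF implementation:

import java.util.TreeMap;
import java.util.UUID;

import org.apache.spark.ml.Transformer;
import org.apache.spark.ml.linalg.SQLDataTypes;
import org.apache.spark.ml.linalg.Vector;
import org.apache.spark.ml.linalg.Vectors;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.StructType;

// Hypothetical HashingTF-like transformer: hashes each term of an
// array<string> column into a fixed number of buckets, counts occurrences
// per bucket, and emits a sparse vector.
public class NewHashingTF extends Transformer {

  private final String uid = "newHashingTF_" + UUID.randomUUID();
  private final int numFeatures;
  private final String inputCol;
  private final String outputCol;

  public NewHashingTF(int numFeatures, String inputCol, String outputCol) {
    this.numFeatures = numFeatures;
    this.inputCol = inputCol;
    this.outputCol = outputCol;
  }

  @Override
  public String uid() {
    return uid;
  }

  @Override
  public Dataset<Row> transform(Dataset<?> dataset) {
    final int n = numFeatures;
    // Custom term-hashing logic: replace this with whatever variant you need.
    UDF1<scala.collection.Seq<String>, Vector> hashTerms = terms -> {
      // TreeMap keeps bucket indices sorted, as sparse vectors require.
      TreeMap<Integer, Double> counts = new TreeMap<>();
      scala.collection.Iterator<String> it = terms.iterator();
      while (it.hasNext()) {
        int bucket = ((it.next().hashCode() % n) + n) % n; // non-negative index
        counts.merge(bucket, 1.0, Double::sum);
      }
      int[] indices = counts.keySet().stream().mapToInt(Integer::intValue).toArray();
      double[] values = counts.values().stream().mapToDouble(Double::doubleValue).toArray();
      return Vectors.sparse(n, indices, values);
    };
    dataset.sparkSession().udf().register("newHashingTF", hashTerms, SQLDataTypes.VectorType());
    return dataset.withColumn(outputCol,
        functions.callUDF("newHashingTF", functions.col(inputCol)));
  }

  @Override
  public StructType transformSchema(StructType schema) {
    // Append the output vector column to the incoming schema.
    return schema.add(outputCol, SQLDataTypes.VectorType(), false);
  }

  @Override
  public Transformer copy(ParamMap extra) {
    return new NewHashingTF(numFeatures, inputCol, outputCol);
  }
}

A transformer written this way can be dropped into a Pipeline next to the
built-in stages.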

On Sat, Nov 3, 2018 at 10:39 PM Chris Olivier wrote:

> that’s a great link. thanks!
>
> On Sat, Nov 3, 2018 at 11:55 AM Jean Georges Perrin wrote:
>
>> I would take this one very close to my heart :)
>>
>> Look at:
>> https://github.com/jgperrin/net.jgp.labs.spark
>>
>> And if the examples are too weird, have a look at:
>> http://jgp.net/book, published by Manning
>>
>> Feedback appreciated!
>>
>> jg
>>
>>
>> On Nov 3, 2018, at 12:30, Jeyhun Karimov wrote:
>>
>> Hi Soheil,
>>
>> In the Spark GitHub repo, you can find some classes implemented in Java:
>>
>> https://github.com/apache/spark/search?l=java
>>
>> Cheers,
>> Jeyhun
>>
>> On Sat, Nov 3, 2018 at 6:42 PM Soheil Pourbafrani wrote:
>>
>>> Hi, I want to customize some part of Spark. I was wondering whether any
>>> Spark source is written in Java, or whether all the sources are in Scala?
>>>
>>


Re: Is there any Spark source in Java

2018-11-03 Thread Chris Olivier
that’s a great link. thanks!

On Sat, Nov 3, 2018 at 11:55 AM Jean Georges Perrin wrote:

> I would take this one very close to my heart :)
>
> Look at:
> https://github.com/jgperrin/net.jgp.labs.spark
>
> And if the examples are too weird, have a look at:
> http://jgp.net/book, published by Manning
>
> Feedback appreciated!
>
> jg
>
>
> On Nov 3, 2018, at 12:30, Jeyhun Karimov wrote:
>
> Hi Soheil,
>
> In the Spark GitHub repo, you can find some classes implemented in Java:
>
> https://github.com/apache/spark/search?l=java
>
> Cheers,
> Jeyhun
>
> On Sat, Nov 3, 2018 at 6:42 PM Soheil Pourbafrani wrote:
>
>> Hi, I want to customize some part of Spark. I was wondering whether any
>> Spark source is written in Java, or whether all the sources are in Scala?
>>
>


Re: Is there any Spark source in Java

2018-11-03 Thread Jean Georges Perrin
I would take this one very close to my heart :)

Look at:
https://github.com/jgperrin/net.jgp.labs.spark

And if the examples are too weird, have a look at:
http://jgp.net/book, published by Manning

Feedback appreciated!

jg


> On Nov 3, 2018, at 12:30, Jeyhun Karimov wrote:
> 
> Hi Soheil,
> 
> In the Spark GitHub repo, you can find some classes implemented in Java:
> 
> https://github.com/apache/spark/search?l=java
> 
> Cheers,
> Jeyhun
> 
>> On Sat, Nov 3, 2018 at 6:42 PM Soheil Pourbafrani wrote:
>> Hi, I want to customize some part of Spark. I was wondering whether any
>> Spark source is written in Java, or whether all the sources are in Scala?


Re: Is there any Spark source in Java

2018-11-03 Thread Jeyhun Karimov
Hi Soheil,

In the Spark GitHub repo, you can find some classes implemented in Java:

https://github.com/apache/spark/search?l=java

Cheers,
Jeyhun

On Sat, Nov 3, 2018 at 6:42 PM Soheil Pourbafrani wrote:

> Hi, I want to customize some part of Spark. I was wondering whether any
> Spark source is written in Java, or whether all the sources are in Scala?
>


Is there any Spark source in Java

2018-11-03 Thread Soheil Pourbafrani
Hi, I want to customize some part of Spark. I was wondering whether any
Spark source is written in Java, or whether all the sources are in Scala?


Fwd: How to avoid long-running jobs blocking short-running jobs

2018-11-03 Thread onmstester onmstester
You could use two separate scheduler pools with different weights for the ETL
and REST jobs: if the ETL pool's weight is about 1 and the REST pool's weight
is 1000, then any time a REST job comes in, it is allocated nearly all of the
resources. Details:
https://spark.apache.org/docs/latest/job-scheduling.html

---- Forwarded message ----
From: conner
Date: Sat, 03 Nov 2018 12:34:01 +0330
Subject: How to avoid long-running jobs blocking short-running jobs

Hi,

I use a Spark cluster to run ETL jobs and analysis computations on the data
after the ETL stage. The ETL jobs can keep running for several hours, but the
analysis computations are short-running jobs that finish in a few seconds.
The dilemma I am trapped in is that my application runs in a single JVM and
can't be a cluster application, so there is currently just one SparkContext
in my application. But when the ETL jobs are running, they occupy all the
resources, including the worker executors, for so long that they block all my
analysis jobs.

My idea is to divide the Spark cluster resources into two parts: one for the
analysis jobs and one for the ETL jobs. If the ETL part is free, I can
allocate analysis jobs to it. So I want to find a middleware that supports
two SparkContexts and can be embedded in my application. I did some research
on the third-party project Spark Job Server. It can divide Spark resources by
launching another JVM to run a SparkContext with specific resources; these
operations are invisible to the upper layer, so it would be a good solution
for me. But that project runs in a single JVM and only supports a REST API,
and I can't endure another round of data transfer, which is too slow for me.
I want to get results from the Spark cluster over TCP and hand them to the
view layer to display. Can anybody give me a good suggestion? I shall be so
grateful.
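
For concreteness, here is a minimal sketch of that two-pool setup with a
single SparkContext (the pool names, weights, and file path below are
illustrative assumptions, not required values):

import org.apache.spark.sql.SparkSession;

// Minimal sketch of the fair-scheduler setup. Assumes conf/fairscheduler.xml
// defines the two pools, e.g.:
//   <allocations>
//     <pool name="etl"><weight>1</weight></pool>
//     <pool name="rest"><weight>1000</weight></pool>
//   </allocations>
public class FairPoolsDemo {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("fair-pools-demo")
        .config("spark.scheduler.mode", "FAIR")
        .config("spark.scheduler.allocation.file", "conf/fairscheduler.xml")
        .getOrCreate();

    // The pool is a thread-local property, so jobs submitted from an ETL
    // thread and from a request-serving thread land in different pools.
    spark.sparkContext().setLocalProperty("spark.scheduler.pool", "etl");
    // ... submit long-running ETL actions from this thread ...

    spark.sparkContext().setLocalProperty("spark.scheduler.pool", "rest");
    // ... short analysis jobs submitted after this get the heavy weight ...
  }
}

Both pools share one SparkContext, which fits the single-JVM constraint
described in the forwarded message above.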

Re: How to avoid long-running jobs blocking short-running jobs

2018-11-03 Thread Jörn Franke
Hi,

What does your Spark deployment architecture look like? Standalone? YARN?
Mesos? Kubernetes? Those come with resource managers (not middleware) that
let you implement the scenario you want to achieve.

In any case, you can try the fair scheduler of any of those solutions.

Best regards

> On 03.11.2018 at 10:04, conner wrote:
> 
> Hi,
>
> I use a Spark cluster to run ETL jobs and analysis computations on the data
> after the ETL stage. The ETL jobs can keep running for several hours, but
> the analysis computations are short-running jobs that finish in a few
> seconds. The dilemma I am trapped in is that my application runs in a
> single JVM and can't be a cluster application, so there is currently just
> one SparkContext in my application. But when the ETL jobs are running, they
> occupy all the resources, including the worker executors, for so long that
> they block all my analysis jobs.
>
> My idea is to divide the Spark cluster resources into two parts: one for
> the analysis jobs and one for the ETL jobs. If the ETL part is free, I can
> allocate analysis jobs to it. So I want to find a middleware that supports
> two SparkContexts and can be embedded in my application. I did some
> research on the third-party project Spark Job Server. It can divide Spark
> resources by launching another JVM to run a SparkContext with specific
> resources; these operations are invisible to the upper layer, so it would
> be a good solution for me. But that project runs in a single JVM and only
> supports a REST API, and I can't endure another round of data transfer,
> which is too slow for me. I want to get results from the Spark cluster over
> TCP and hand them to the view layer to display. Can anybody give me a good
> suggestion? I shall be so grateful.




Re: How to avoid long-running jobs blocking short-running jobs

2018-11-03 Thread Nicolas Paris
On Sat, Nov 03, 2018 at 02:04:01AM -0700, conner wrote:
> My idea is to divide the Spark cluster resources into two parts.

What about YARN and its queue management system?
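
A minimal sketch of that approach (the queue name "analytics" is an
assumption; the queue must already exist in the cluster's YARN scheduler
configuration):

import org.apache.spark.sql.SparkSession;

// Hypothetical: route this application to a dedicated YARN queue so that
// short analysis applications are not starved by long ETL applications
// submitted to another queue.
public class YarnQueueDemo {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("analysis")
        .master("yarn")
        .config("spark.yarn.queue", "analytics") // assumed queue name
        .getOrCreate();
    // ... run the short analysis jobs here ...
  }
}

Note that this splits the work into separate YARN applications, so it trades
the single-JVM constraint for isolation at the resource-manager level.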

-- 
nicolas




How to avoid long-running jobs blocking short-running jobs

2018-11-03 Thread conner
Hi,

I use a Spark cluster to run ETL jobs and analysis computations on the data
after the ETL stage. The ETL jobs can keep running for several hours, but the
analysis computations are short-running jobs that finish in a few seconds.
The dilemma I am trapped in is that my application runs in a single JVM and
can't be a cluster application, so there is currently just one SparkContext
in my application. But when the ETL jobs are running, they occupy all the
resources, including the worker executors, for so long that they block all my
analysis jobs.

My idea is to divide the Spark cluster resources into two parts: one for the
analysis jobs and one for the ETL jobs. If the ETL part is free, I can
allocate analysis jobs to it. So I want to find a middleware that supports
two SparkContexts and can be embedded in my application. I did some research
on the third-party project Spark Job Server. It can divide Spark resources by
launching another JVM to run a SparkContext with specific resources; these
operations are invisible to the upper layer, so it would be a good solution
for me. But that project runs in a single JVM and only supports a REST API,
and I can't endure another round of data transfer, which is too slow for me.
I want to get results from the Spark cluster over TCP and hand them to the
view layer to display. Can anybody give me a good suggestion? I shall be so
grateful.




