Can I specify watermark using raw sql alone?

2018-07-14 Thread kant kodali
Hi All,

Can I specify a watermark using raw SQL alone? In other words, without using
.withWatermark from the Dataset API.

Thanks!


how to decide broadcast join data size

2018-07-14 Thread Selvam Raman
Hi,

I could not find a useful formula or documentation that would help me
decide the broadcast join data size based on the cluster size.

Please let me know if there is a rule of thumb for this.

For example:
Cluster size: 20 nodes, with 32 GB and 8 cores per node.

executor-memory = 8 GB, executor-cores = 4

Memory:
8 GB, minus 40% for internal use, leaves 4.8 GB for actual computation and
storage. Let's assume I have not persisted anything, so in this case I could
use 4.8 GB per executor.
Is it possible for me to use a 400 MB file for a broadcast join?
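
As a rough check of the arithmetic above, here is a minimal sketch in Python.
It assumes Spark's unified memory model, where about 300 MB of heap is
reserved and spark.memory.fraction (default 0.6 since Spark 2.0) of the
remainder is shared by execution and storage. The function name and the idea
of comparing the file size against that pool are illustrative assumptions,
not an official sizing rule.

```python
RESERVED_MB = 300        # heap that Spark keeps for itself (assumed fixed)
MEMORY_FRACTION = 0.6    # assumed default of spark.memory.fraction


def unified_memory_mb(executor_memory_mb, fraction=MEMORY_FRACTION):
    """Approximate execution + storage pool for one executor, in MB."""
    return (executor_memory_mb - RESERVED_MB) * fraction


executor_mb = 8 * 1024
# The simplified figure from the mail: 60% of the full 8 GB heap.
simple_estimate_mb = executor_mb * MEMORY_FRACTION
# The same figure after subtracting the reserved memory first.
pool_mb = unified_memory_mb(executor_mb)
broadcast_mb = 400

print(f"simple estimate: {simple_estimate_mb:.1f} MB")
print(f"with reserved memory: {pool_mb:.1f} MB")
print(f"broadcast share of pool: {broadcast_mb / pool_mb:.1%}")
```

Under these assumptions a 400 MB table is less than a tenth of one executor's
pool, so it looks plausible. Keep in mind that every executor holds a full
copy, the table is collected on the driver first, and a file usually takes
more space in memory than on disk, so the in-memory size should be measured.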

-- 
Selvam Raman
"Shun bribery, stand tall"


Re: Do GraphFrames support streaming?

2018-07-14 Thread Jörn Franke
No, the streaming dataframe needs to be written to disk or similar (or an
in-memory backend); then, when the next stream arrives, join them, create the
graph, and store the next stream together with the existing stream on disk, etc.

> On 14. Jul 2018, at 17:19, kant kodali  wrote:
> 
> The question now would be can it be done in streaming fashion? Are you 
> talking about the union of two streaming dataframes and then constructing a 
> graphframe (also during streaming) ?
> 
>> On Sat, Jul 14, 2018 at 8:07 AM, Jörn Franke  wrote:
>> For your use case one might indeed be able to work simply with incremental 
>> graph updates. However they are not straight forward in Spark. You can union 
>> the new Data with the existing dataframes that represent your graph and 
>> create from that a new graph frame.
>> 
>> However I am not sure if this will fully fulfill your requirement for 
>> incremental graph updates.
>> 
>>> On 14. Jul 2018, at 15:59, kant kodali  wrote:
>>> 
>>> "You want to update incrementally an existing graph and run incrementally a 
>>> graph algorithm suitable for this - you have to implement yourself as far 
>>> as I am aware"
>>> 
>>> I want to update the graph incrementally and want to run some graph queries 
>>> similar to Cypher like give me all the vertices that are connected by a 
>>> specific set of edges and so on. Don't really intend to run graph 
>>> algorithms like ConnectedComponents or anything else at this point but of 
>>> course, it's great to have.
>>> 
>>> If we were to do this myself should I extend the GraphFrame? any 
>>> suggestions?
>>> 
>>> 
>>>> On Sun, Apr 29, 2018 at 3:24 AM, Jörn Franke  wrote:
>>>> What is the use case you are trying to solve?
>>>> You want to load graph data from a streaming window in separate graphs - 
>>>> possible but requires probably a lot of memory. 
>>>> You want to update an existing graph with new streaming data and then 
>>>> fully rerun an algorithms -> look at Janusgraph
>>>> You want to update incrementally an existing graph and run incrementally a 
>>>> graph algorithm suitable for this - you have to implement yourself as far 
>>>> as I am aware
>>>> 
>>>> > On 29. Apr 2018, at 11:43, kant kodali  wrote:
>>>> > 
>>>> > Do GraphFrames support streaming?
>>> 
> 


Re: Do GraphFrames support streaming?

2018-07-14 Thread kant kodali
The question now would be: can it be done in a streaming fashion? Are you
talking about the union of two streaming dataframes and then constructing a
GraphFrame (also during streaming)?

On Sat, Jul 14, 2018 at 8:07 AM, Jörn Franke  wrote:

> For your use case one might indeed be able to work simply with incremental
> graph updates. However they are not straight forward in Spark. You can
> union the new Data with the existing dataframes that represent your graph
> and create from that a new graph frame.
>
> However I am not sure if this will fully fulfill your requirement for
> incremental graph updates.
>
> On 14. Jul 2018, at 15:59, kant kodali  wrote:
>
> "You want to update incrementally an existing graph and run incrementally
> a graph algorithm suitable for this - you have to implement yourself as
> far as I am aware"
>
> I want to update the graph incrementally and want to run some graph
> queries similar to Cypher like give me all the vertices that are connected
> by a specific set of edges and so on. Don't really intend to run graph
> algorithms like ConnectedComponents or anything else at this point but of
> course, it's great to have.
>
> If we were to do this myself should I extend the GraphFrame? any
> suggestions?
>
>
> On Sun, Apr 29, 2018 at 3:24 AM, Jörn Franke  wrote:
>
>> What is the use case you are trying to solve?
>> You want to load graph data from a streaming window in separate graphs -
>> possible but requires probably a lot of memory.
>> You want to update an existing graph with new streaming data and then
>> fully rerun an algorithms -> look at Janusgraph
>> You want to update incrementally an existing graph and run incrementally
>> a graph algorithm suitable for this - you have to implement yourself as far
>> as I am aware
>>
>> > On 29. Apr 2018, at 11:43, kant kodali  wrote:
>> >
>> > Do GraphFrames support streaming?
>>
>
>


Re: Do GraphFrames support streaming?

2018-07-14 Thread Jörn Franke
For your use case, one might indeed be able to work simply with incremental
graph updates. However, they are not straightforward in Spark. You can union
the new data with the existing dataframes that represent your graph and create
a new GraphFrame from the result.

However, I am not sure whether this will fully meet your requirement for
incremental graph updates.
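
The union-and-rebuild idea can be sketched without Spark. The following
plain-Python sketch only illustrates the pattern and is not the GraphFrames
API: the Graph class, apply_batch, and the sample vertex and edge tuples are
all hypothetical. Each incoming batch is unioned with the data seen so far,
and a new immutable graph is built from the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    """Hypothetical stand-in for a graph built from vertex and edge tables."""
    vertices: frozenset   # vertex ids
    edges: frozenset      # (src, dst, relationship) tuples


def apply_batch(graph, new_vertices, new_edges):
    """Union a new batch with the existing data and build a new graph."""
    return Graph(
        vertices=graph.vertices | frozenset(new_vertices),
        edges=graph.edges | frozenset(new_edges),
    )


graph = Graph(frozenset(), frozenset())
batches = [
    (["a", "b"], [("a", "b", "follows")]),
    (["b", "c"], [("b", "c", "follows"), ("a", "b", "follows")]),
]
for batch_vertices, batch_edges in batches:
    # The old graph is left untouched; every batch yields a fresh one.
    graph = apply_batch(graph, batch_vertices, batch_edges)

print(sorted(graph.vertices))
print(sorted(graph.edges))
```

The sets here drop repeated rows automatically, whereas a DataFrame union
keeps duplicates unless they are removed explicitly. The whole graph is also
rebuilt for every batch, which is why this may not count as a truly
incremental update.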

> On 14. Jul 2018, at 15:59, kant kodali  wrote:
> 
> "You want to update incrementally an existing graph and run incrementally a 
> graph algorithm suitable for this - you have to implement yourself as far as 
> I am aware"
> 
> I want to update the graph incrementally and want to run some graph queries 
> similar to Cypher like give me all the vertices that are connected by a 
> specific set of edges and so on. Don't really intend to run graph algorithms 
> like ConnectedComponents or anything else at this point but of course, it's 
> great to have.
> 
> If we were to do this myself should I extend the GraphFrame? any suggestions?
> 
> 
>> On Sun, Apr 29, 2018 at 3:24 AM, Jörn Franke  wrote:
>> What is the use case you are trying to solve?
>> You want to load graph data from a streaming window in separate graphs - 
>> possible but requires probably a lot of memory. 
>> You want to update an existing graph with new streaming data and then fully 
>> rerun an algorithms -> look at Janusgraph
>> You want to update incrementally an existing graph and run incrementally a 
>> graph algorithm suitable for this - you have to implement yourself as far as 
>> I am aware
>> 
>> > On 29. Apr 2018, at 11:43, kant kodali  wrote:
>> > 
>> > Do GraphFrames support streaming?
> 


Re: Do GraphFrames support streaming?

2018-07-14 Thread kant kodali
"You want to update incrementally an existing graph and run incrementally a
graph algorithm suitable for this - you have to implement yourself as far
as I am aware"

I want to update the graph incrementally and run some graph queries
similar to Cypher, such as "give me all the vertices that are connected by a
specific set of edges" and so on. I don't really intend to run graph
algorithms like ConnectedComponents or anything else at this point, but of
course they would be great to have.

If I were to do this myself, should I extend GraphFrame? Any
suggestions?
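
To make this kind of query concrete, here is a minimal plain-Python sketch.
It uses neither GraphFrames nor Cypher; the edge tuples, the relationship
names, and the vertices_connected_by helper are hypothetical, and it assumes
that "a specific set of edges" means edges whose relationship type belongs to
a given set.

```python
def vertices_connected_by(edges, relationships):
    """Return every vertex that is an endpoint of an edge of the given types."""
    wanted = set(relationships)
    found = set()
    for src, dst, rel in edges:
        if rel in wanted:
            found.update((src, dst))
    return found


edges = [
    ("alice", "bob", "friend"),
    ("bob", "carol", "follows"),
    ("carol", "dave", "blocks"),
]
result = vertices_connected_by(edges, {"friend", "follows"})
print(sorted(result))
```

In GraphFrames itself, a comparable query would more likely be a filter on the
edges DataFrame or a motif search with find, rather than a Python loop.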


On Sun, Apr 29, 2018 at 3:24 AM, Jörn Franke  wrote:

> What is the use case you are trying to solve?
> You want to load graph data from a streaming window in separate graphs -
> possible but requires probably a lot of memory.
> You want to update an existing graph with new streaming data and then
> fully rerun an algorithms -> look at Janusgraph
> You want to update incrementally an existing graph and run incrementally a
> graph algorithm suitable for this - you have to implement yourself as far
> as I am aware
>
> > On 29. Apr 2018, at 11:43, kant kodali  wrote:
> >
> > Do GraphFrames support streaming?
>


Spark Shortcut

2018-07-14 Thread Deepu Raj
Hi Team,

With Spark 2.3, :paste -raw is not working.

When I press Ctrl+D after pasting the code, I get the message
"//Exiting paste mode, now interpreting." and then nothing happens.
Please help.

Thanks,
Deepu