Re: Can we access files on Cluster mode

2017-06-25 Thread sudhir k
Thank you. I guess I have to use a common mount or S3 to access those files.
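
For illustration, a minimal sketch of reading the SQL file from a shared
store instead of a local path (HDFS here; an s3a:// URI works the same
way). The path, app name and class name below are hypothetical:

    import org.apache.spark.sql.SparkSession

    object Driver {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("check").getOrCreate()
        // wholeTextFiles yields (path, content) pairs; take the content
        val firstSql = spark.sparkContext
          .wholeTextFiles("hdfs:///apps/sql/first.sql")
          .first()._2
        spark.sql(firstSql).show()
      }
    }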

On Sun, Jun 25, 2017 at 4:42 AM Mich Talebzadeh wrote:

> Thanks. In my experience certain distros like Cloudera only support yarn
> client mode so AFAIK the driver stays on the Edge node. Happy to be
> corrected :)


Re: Can we access files on Cluster mode

2017-06-25 Thread Mich Talebzadeh
Thanks. In my experience certain distros like Cloudera only support yarn
client mode so AFAIK the driver stays on the Edge node. Happy to be
corrected :)



On 25 June 2017 at 10:37, Anastasios Zouzias wrote:

> Hi Mich,
>
> If the driver starts on the edge node with cluster mode, then I don't see
> the difference between client and cluster deploy mode.
>
> In cluster mode, it is the responsibility of the resource manager (yarn,
> etc) to decide where to run the driver (at least for spark 1.6 this is what
> I have experienced).
>
> Best,
> Anastasios


Re: Can we access files on Cluster mode

2017-06-25 Thread Anastasios Zouzias
Hi Mich,

If the driver starts on the edge node with cluster mode, then I don't see
the difference between client and cluster deploy mode.

In cluster mode, it is the responsibility of the resource manager (yarn,
etc) to decide where to run the driver (at least for spark 1.6 this is what
I have experienced).

Best,
Anastasios
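
For reference, a hedged sketch of the two deploy modes on YARN (the jar
and class names are only illustrative). In client mode the driver runs
where spark-submit runs, e.g. on the edge node; in cluster mode the
resource manager picks a node for it:

    # driver stays on the submitting (edge) node
    spark-submit --master yarn --deploy-mode client \
      --class com.check.Driver test.jar

    # driver runs inside a YARN container on some node of the cluster
    spark-submit --master yarn --deploy-mode cluster \
      --class com.check.Driver test.jar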

On Sun, Jun 25, 2017 at 11:14 AM, Mich Talebzadeh wrote:

> Hi Anastasios.
>
> Are you implying that in YARN cluster mode, even if you submit your Spark
> application from an edge node, the driver can start on any node? I was
> under the impression that the driver starts on the edge node and that the
> executors can be on any node in the cluster (where Spark agents are running)?
>
> Thanks



Re: Can we access files on Cluster mode

2017-06-25 Thread Mich Talebzadeh
Hi Anastasios.

Are you implying that in YARN cluster mode, even if you submit your Spark
application from an edge node, the driver can start on any node? I was under
the impression that the driver starts on the edge node and that the executors
can be on any node in the cluster (where Spark agents are running)?

Thanks



On 25 June 2017 at 09:39, Anastasios Zouzias wrote:

> Just to note that in cluster mode the spark driver might run on any node
> of the cluster, hence you need to make sure that the file exists on *all*
> nodes. Push the file on all nodes or use client deploy-mode.
>
> Best,
> Anastasios


Re: Can we access files on Cluster mode

2017-06-25 Thread Anastasios Zouzias
Just to note that in cluster mode the spark driver might run on any node of
the cluster, hence you need to make sure that the file exists on *all*
nodes. Push the file on all nodes or use client deploy-mode.

Best,
Anastasios
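
For illustration, a sketch of both workarounds (the host names are
hypothetical; the file path and command are the ones from the thread):

    # Option 1: push the file to the same local path on every node that
    # could host the driver, then keep using cluster deploy-mode
    for host in node1 node2 node3; do
      scp /home/sql/first.sql "$host":/home/sql/first.sql
    done

    # Option 2: use client deploy-mode so the driver stays on the node
    # where the local path exists
    spark-submit --master yarn --deploy-mode client \
      --class com.check.Driver --files /home/sql/first.sql test.jar 20170619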

On 24.06.2017 at 23:24, "Holden Karau" wrote:

> addFile is supposed to not depend on a shared FS unless the semantics have
> changed recently.


Re: Can we access files on Cluster mode

2017-06-24 Thread Holden Karau
addFile is supposed to not depend on a shared FS unless the semantics have
changed recently.
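
A minimal sketch of that behavior (assuming sc is the active SparkContext
and the path from the thread exists on the driver): addFile serves the
file from the driver, and every executor downloads its own local copy, so
no shared filesystem is involved:

    import org.apache.spark.SparkFiles

    sc.addFile("/home/sql/first.sql")  // local path on the driver
    // each executor resolves its own downloaded copy of the file
    val lengths = sc.parallelize(1 to 4).map { _ =>
      scala.io.Source.fromFile(SparkFiles.get("first.sql")).mkString.length
    }.collect()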

On Sat, Jun 24, 2017 at 11:55 AM varma dantuluri wrote:

> Hi Sudhir,
>
> I believe you have to use a shared file system that is accessible from all
> nodes.


Re: Can we access files on Cluster mode

2017-06-24 Thread varma dantuluri
Hi Sudhir,

I believe you have to use a shared file system that is accessible from all nodes.


> On Jun 24, 2017, at 1:30 PM, sudhir k wrote:
> 
> 
> I am new to Spark and I need some guidance on how to fetch files passed via
> the --files option of spark-submit.
> 
> I read on some forums that we can fetch the files with
> SparkFiles.get(fileName) and use them in our code, and all nodes should be
> able to read them.
> 
> But I am facing an issue.
> 
> Below is the command I am using:
> 
> spark-submit --deploy-mode cluster --class com.check.Driver --files 
> /home/sql/first.sql test.jar 20170619
> 
> So when I use SparkFiles.get("first.sql"), I should be able to read the
> file path, but it throws a FileNotFoundException.
> 
> I tried SparkContext.addFile("/home/sql/first.sql") and then
> SparkFiles.get("first.sql"), but I still get the same error.
> 
> It works in standalone mode but not in cluster mode. Any help is
> appreciated. Using Spark 2.1.0 and Scala 2.11.
> 
> Thanks.
> 
> 
> 
> Regards,
> Sudhir K
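
For completeness, a minimal sketch of the working pattern implied above
(sc is assumed to be the active SparkContext; note that SparkFiles.get
takes just the base name of the file, not its full path):

    import org.apache.spark.SparkFiles

    sc.addFile("/home/sql/first.sql")       // register the driver-local file
    val path = SparkFiles.get("first.sql")  // resolve by base name only
    val firstSql = scala.io.Source.fromFile(path).mkString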