withColumn: adding a date column with a sysdate default

2017-06-30 Thread sudhir k
Can we add a column to a DataFrame with a default value like sysdate? I am
calling my UDF, but it throws a "col expected" error.

In the spark-shell, df.withColumn("date", current_date()) works. I need the
equivalent in a Scala program that I can build into a jar.
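
A minimal Scala sketch of the usual fix, assuming Spark 2.x (the extra
"source" column below is only illustrative): current_date() lives in
org.apache.spark.sql.functions and already returns a Column, while a plain
default value must be wrapped in lit(), since passing a raw value where a
Column is expected produces exactly this kind of "col expected" error.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{current_date, lit}

object AddDateColumn {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("AddDateColumn").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("b", 2)).toDF("key", "value")

    // current_date() already returns a Column, so it can be passed directly.
    val withDate = df.withColumn("date", current_date())

    // A plain default value must be wrapped in lit() to become a Column.
    val withSource = withDate.withColumn("source", lit("batch"))

    withSource.show()
    spark.stop()
  }
}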


Thanks,
Sudhir
-- 
Sent from Gmail Mobile


Re: Can we access files on Cluster mode

2017-06-25 Thread sudhir k
Thank you. I guess I have to use a common mount or S3 to access those files.
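
For reference, a sketch of that shared-storage approach, assuming the file
sits on S3 and the cluster has an s3a connector configured; the bucket and
path below are hypothetical:

import scala.io.Source
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object ReadSharedFile {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ReadSharedFile").getOrCreate()

    // Hypothetical S3 location; any filesystem reachable from every node
    // (HDFS, an NFS mount, S3) works the same way through the Hadoop API.
    val path = new Path("s3a://my-bucket/sql/first.sql")
    val fs = FileSystem.get(path.toUri, spark.sparkContext.hadoopConfiguration)

    val in = fs.open(path)
    val sqlText = try Source.fromInputStream(in).mkString finally in.close()

    spark.sql(sqlText).show()
    spark.stop()
  }
}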

On Sun, Jun 25, 2017 at 4:42 AM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Thanks. In my experience certain distros like Cloudera only support
> yarn-client mode, so AFAIK the driver stays on the edge node. Happy to be
> corrected :)
>
>
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com
>
> On 25 June 2017 at 10:37, Anastasios Zouzias <zouz...@gmail.com> wrote:
>
>> Hi Mich,
>>
>> If the driver starts on the edge node with cluster mode, then I don't see
>> the difference between client and cluster deploy mode.
>>
>> In cluster mode, it is the responsibility of the resource manager (YARN,
>> etc.) to decide where to run the driver (at least for Spark 1.6, this is
>> what I have experienced).
>>
>> Best,
>> Anastasios
>>
>> On Sun, Jun 25, 2017 at 11:14 AM, Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Hi Anastasios.
>>>
>>> Are you implying that in YARN cluster mode, even if you submit your
>>> Spark application from an edge node, the driver can start on any node? I
>>> was under the impression that the driver starts on the edge node and the
>>> executors can be on any node in the cluster (where Spark agents are
>>> running)?
>>>
>>> Thanks
>>>
>>>
>>> Dr Mich Talebzadeh
>>> http://talebzadehmich.wordpress.com
>>>
>>> On 25 June 2017 at 09:39, Anastasios Zouzias <zouz...@gmail.com> wrote:
>>>
>>>> Just to note that in cluster mode the Spark driver might run on any
>>>> node of the cluster, so you need to make sure that the file exists on
>>>> *all* nodes. Push the file to all nodes or use client deploy-mode.
>>>>
>>>> Best,
>>>> Anastasios
>>>>
>>>>
>>>> Am 24.06.2017 23:24 schrieb "Holden Karau" <hol...@pigscanfly.ca>:
>>>>
>>>>> addFile is supposed to not depend on a shared FS unless the semantics
>>>>> have changed recently.
>>>>>
>>>>> On Sat, Jun 24, 2017 at 11:55 AM varma dantuluri <dvsnva...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Sudhir,
>>>>>>
>>>>>> I believe you have to use a shared file system that is accessible by
>>>>>> all nodes.
>>>>>>
>>>>>>
>>>>>> On Jun 24, 2017, at 1:30 PM, sudhir k <k.sudhi...@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> I am new to Spark and I need some guidance on how to fetch files
>>>>>> passed with the --files option of spark-submit.
>>>>>>
>>>>>> I read on some forums that we can fetch the files with
>>>>>> SparkFiles.get(fileName), use them in our code, and all nodes should
>>>>>> be able to read them.
>>>>>>
>>>>>> But I am facing an issue.
>>>>>>
>>>>>> Below is the command I am using:
>>>>>>
>>>>>> spark-submit --deploy-mode cluster --class com.check.Driver --files
>>>>>> /home/sql/first.sql test.jar 20170619
>>>>>>
>>>>>> So when I use SparkFiles.get("first.sql"), I should get the local file
>>>>>> path, but it throws a FileNotFoundException.
>>>>>>
>>>>>> I tried SparkContext.addFile("/home/sql/first.sql") and then
>>>>>> SparkFiles.get("first.sql"), but I still get the same error.
>>>>>>
>>>>>> It works in standalone mode but not in cluster mode. Any help is
>>>>>> appreciated. Using Spark 2.1.0 and Scala 2.11.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Sudhir K
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Sudhir K
>>>>>>
>>>>>>
>>>>>> --
>>>>> Cell : 425-233-8271
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>
>>>>
>>>
>>
>>
>> --
>> -- Anastasios Zouzias
>> <a...@zurich.ibm.com>
>>
>
> --
Sent from Gmail Mobile


Fwd: Can we access files on Cluster mode

2017-06-24 Thread sudhir k
I am new to Spark and I need some guidance on how to fetch files passed
with the --files option of spark-submit.

I read on some forums that we can fetch the files with
SparkFiles.get(fileName), use them in our code, and all nodes should be
able to read them.

But I am facing an issue.

Below is the command I am using:

spark-submit --deploy-mode cluster --class com.check.Driver --files
/home/sql/first.sql test.jar 20170619

So when I use SparkFiles.get("first.sql"), I should get the local file
path, but it throws a FileNotFoundException.

I tried SparkContext.addFile("/home/sql/first.sql") and then
SparkFiles.get("first.sql"), but I still get the same error.

It works in standalone mode but not in cluster mode. Any help is
appreciated. Using Spark 2.1.0 and Scala 2.11.
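
For what it's worth, a sketch of how the driver class might read the
shipped file, reusing the names from the command above (com.check.Driver,
first.sql). SparkFiles.get takes the bare file name as a quoted String, not
the submit-side path; and since YARN localizes --files into each
container's working directory, reading the bare name directly is a common
fallback when the resolved path is missing:

package com.check

import java.io.File
import scala.io.Source
import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

object Driver {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("check").getOrCreate()

    // Pass the file *name*, quoted -- SparkFiles.get(first.sql) without
    // quotes does not even compile.
    val resolved = new File(SparkFiles.get("first.sql"))

    // In yarn-cluster mode, --files are localized into the container's
    // working directory, so fall back to the bare name if needed.
    val sqlFile = if (resolved.exists()) resolved else new File("first.sql")

    val sqlText = Source.fromFile(sqlFile).mkString
    spark.sql(sqlText).show()

    spark.stop()
  }
}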

Thanks.


Regards,
Sudhir K



-- 
Regards,
Sudhir K