What is a URL? SecureByDesign & use of a LOGIN form, not a pop-up box.

2020-05-05 Thread Secure Bydesign
@Sean Owen

As you do not know, being the product of an average American education:

A URL has three main parts.

HEAD: contains the origin and destination addresses.
BODY: contains index.html.
TAIL: contains a checksum, the total count of bytes in the BODY, just in case any
got lost en route.

TOR extracts the origin IP and places its own IP in the URL HEAD.
On the way back, TOR overwrites the IP so the URL can find its way back home.

Facebook has chosen to move its servers inside the TOR network cluster,
thereby exposing its own server IPs for faster connectivity.


This way the consumer is protected, through SecureByDesign.

UNLESS you are dumb enough to use TOR and then log in to a website via a LOGIN BOX.

So when you say you have ways of violating consumer protection...

Do not confuse me with one who was educated in the US or India.

That is the lesson for today on Internet security.

You shouldn't advertise your lack of .


Why did you vote for Trump when he said he is a pussy grabber?
What happens in the USA when, like in Crocodile Dundee, what you grab is not a pussy?

If I come to the USA, is this the new greeting protocol?

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Unsubscribe

2020-05-05 Thread Bibudh Lahiri
Unsubscribe


Unsubscribe

2020-05-05 Thread Zeming Yu
Unsubscribe

Get Outlook for Android



PyArrow Exception in Pandas UDF GROUPEDAGG()

2020-05-05 Thread Gautham Acharya
Hi everyone,

I'm running a job that applies a Pandas UDF to GROUP BY a large matrix.

The GROUP BY runs on a wide dataset. The first column of the dataset contains 
the string labels that are grouped on, and the remaining columns are numeric 
values that are aggregated in the Pandas UDF. The dataset is very wide, with 
50,000 columns and 3 million rows.

| label_col | num_col_0 | num_col_1 | num_col_2 | ... | num_col_5 |
| label_a   | 2.0       | 5.6       | 7.123     | ... | ...       |
| label_b   | 11.0      | 1.4       | 2.345     | ... | ...       |
| label_a   | 3.1       | 6.2       | 5.444     | ... | ...       |



My job runs fine on smaller datasets, with the same number of columns but fewer 
rows. However, when run on a dataset with 3 million rows, I see the following 
exception:

20/05/05 23:36:27 ERROR Executor: Exception in task 66.1 in stage 12.0 (TID 
2358)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File 
"/mnt/yarn/usercache/hadoop/appcache/application_1588710822953_0001/container_1588710822953_0001_01_034974/pyspark.zip/pyspark/worker.py",
 line 377, in main
process()
  File 
"/mnt/yarn/usercache/hadoop/appcache/application_1588710822953_0001/container_1588710822953_0001_01_034974/pyspark.zip/pyspark/worker.py",
 line 372, in process
serializer.dump_stream(func(split_index, iterator), outfile)
  File 
"/mnt/yarn/usercache/hadoop/appcache/application_1588710822953_0001/container_1588710822953_0001_01_034974/pyspark.zip/pyspark/serializers.py",
 line 286, in dump_stream
for series in iterator:
  File 
"/mnt/yarn/usercache/hadoop/appcache/application_1588710822953_0001/container_1588710822953_0001_01_034974/pyspark.zip/pyspark/serializers.py",
 line 303, in load_stream
for batch in reader:
  File "pyarrow/ipc.pxi", line 266, in __iter__
  File "pyarrow/ipc.pxi", line 282, in 
pyarrow.lib._CRecordBatchReader.read_next_batch
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: read length must be positive or -1

Looking at this issue, it 
looks like PyArrow has a 2GB limit for each shard that is sent to the grouping 
function.

I'm currently running this job on 4 nodes with 16cores and 64GB of memory each.

I've attached the full error log here as well. What are some workarounds I can 
use to get this job running? Unfortunately, we are coming up on a production 
release and this is becoming a severe blocker.

Thanks,
Gautham






Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise
Sure, just do case Failure(e) => throw e
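
A minimal spark-shell style sketch of that idea (the CSV path and the wrapper
message below are placeholders, not from this thread): print the message first,
or rethrow with the original exception attached as the cause so its message is
preserved.

import scala.util.{Try, Success, Failure}

val df = Try(spark.read.csv("/tmp/some.csv")) match {
  case Success(df) => df
  case Failure(e) =>
    // print the underlying message before rethrowing
    println(s"Read failed: ${e.getMessage}")
    // or keep the original exception as the cause of the new one
    throw new Exception(s"Failed to create DataFrame: ${e.getMessage}", e)
}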

 

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 6:36 PM
To: Brandon Geise 
Cc: Todd Nist , "user @spark" 
Subject: Re: Exception handling in Spark

 

Hi Brandon.

 

In dealing with 

 

df case Failure(e) => throw new Exception("foo") 

 

Can one print the Exception message?

 

Thanks


Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 

 

 

 

On Tue, 5 May 2020 at 23:15, Mich Talebzadeh  wrote:

OK looking promising thanks

 

scala> import scala.util.{Try, Success, Failure}
import scala.util.{Try, Success, Failure}

scala> val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}
df: org.apache.spark.sql.DataFrame = [brand: string, ocis_party_id: bigint ... 
6 more fields]

 

regards,

 

Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 

 

 

 

On Tue, 5 May 2020 at 23:13, Brandon Geise  wrote:

Match needs to be lower case “match”

 

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 6:13 PM
To: Brandon Geise 
Cc: Todd Nist , "user @spark" 
Subject: Re: Exception handling in Spark

 


scala> import scala.util.{Try, Success, Failure}

import scala.util.{Try, Success, Failure}

scala> val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}
:48: error: value Match is not a member of 
scala.util.Try[org.apache.spark.sql.DataFrame]
   val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}

 

Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 

 

 

 

On Tue, 5 May 2020 at 23:10, Mich Talebzadeh  wrote:

This is what I get

 

scala> val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}
:47: error: not found: value Try
   val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}
^
:47: error: not found: value Success
   val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}


 ^
:47: error: not found: value Failure
   val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}

 

Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The 

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
Hi Brandon.

In dealing with

df case Failure(e) => throw new Exception("foo")

Can one print the Exception message?

Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 5 May 2020 at 23:15, Mich Talebzadeh 
wrote:

> OK looking promising thanks
>
> scala> import scala.util.{Try, Success, Failure}
> import scala.util.{Try, Success, Failure}
>
> scala> val df =
> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
> match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
> df: org.apache.spark.sql.DataFrame = [brand: string, ocis_party_id: bigint
> ... 6 more fields]
>
>
> regards,
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 5 May 2020 at 23:13, Brandon Geise  wrote:
>
>> Match needs to be lower case “match”
>>
>>
>>
>> *From: *Mich Talebzadeh 
>> *Date: *Tuesday, May 5, 2020 at 6:13 PM
>> *To: *Brandon Geise 
>> *Cc: *Todd Nist , "user @spark" <
>> user@spark.apache.org>
>> *Subject: *Re: Exception handling in Spark
>>
>>
>>
>>
>> scala> import scala.util.{Try, Success, Failure}
>>
>> import scala.util.{Try, Success, Failure}
>>
>> scala> val df =
>> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
>> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
>> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
>> :48: error: value Match is not a member of
>> scala.util.Try[org.apache.spark.sql.DataFrame]
>>val df =
>> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
>> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
>> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
>>
>>
>>
>> Mich Talebzadeh
>>
>>
>>
>> LinkedIn  
>> *https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>>
>>
>>
>> On Tue, 5 May 2020 at 23:10, Mich Talebzadeh 
>> wrote:
>>
>> This is what I get
>>
>>
>>
>> scala> val df =
>> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
>> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
>> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
>> :47: error: not found: value Try
>>val df =
>> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
>> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
>> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
>> ^
>> :47: error: not found: value Success
>>val df =
>> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
>> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
>> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
>>
>>
>>^
>> :47: error: not found: value Failure
>>val df =
>> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
>> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
>> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn  
>> *https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> 

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
OK looking promising thanks

scala> import scala.util.{Try, Success, Failure}
import scala.util.{Try, Success, Failure}

scala> val df =
Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
df: org.apache.spark.sql.DataFrame = [brand: string, ocis_party_id: bigint
... 6 more fields]


regards,


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 5 May 2020 at 23:13, Brandon Geise  wrote:

> Match needs to be lower case “match”
>
>
>
> *From: *Mich Talebzadeh 
> *Date: *Tuesday, May 5, 2020 at 6:13 PM
> *To: *Brandon Geise 
> *Cc: *Todd Nist , "user @spark"  >
> *Subject: *Re: Exception handling in Spark
>
>
>
>
> scala> import scala.util.{Try, Success, Failure}
>
> import scala.util.{Try, Success, Failure}
>
> scala> val df =
> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
> :48: error: value Match is not a member of
> scala.util.Try[org.apache.spark.sql.DataFrame]
>val df =
> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
>
>
>
> Mich Talebzadeh
>
>
>
> LinkedIn  
> *https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
>
>
>
> On Tue, 5 May 2020 at 23:10, Mich Talebzadeh 
> wrote:
>
> This is what I get
>
>
>
> scala> val df =
> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
> :47: error: not found: value Try
>val df =
> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
> ^
> :47: error: not found: value Success
>val df =
> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
>
>
>^
> :47: error: not found: value Failure
>val df =
> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn  
> *https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
>
>
>
> On Tue, 5 May 2020 at 23:03, Brandon Geise  wrote:
>
> This is what I had in mind.  Can you give this approach a try?
>
>
>
> val df = Try(spark.read.csv("")) match {
>
>   case Success(df) => df
>
>   case Failure(e) => throw new Exception("foo")
>
>   }
>
>
>
> *From: *Mich Talebzadeh 
> *Date: *Tuesday, May 5, 2020 at 5:17 PM
> *To: *Todd Nist 
> *Cc: *Brandon Geise , "user @spark" <
> user@spark.apache.org>
> *Subject: *Re: Exception handling in Spark
>
>
>
> I am trying this approach
>
>
>
>
> val broadcastValue = "123456789"  // 

Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise
Match needs to be lower case “match”

 

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 6:13 PM
To: Brandon Geise 
Cc: Todd Nist , "user @spark" 
Subject: Re: Exception handling in Spark

 


scala> import scala.util.{Try, Success, Failure}

import scala.util.{Try, Success, Failure}

scala> val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}
:48: error: value Match is not a member of 
scala.util.Try[org.apache.spark.sql.DataFrame]
   val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}

 

Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 

 

 

 

On Tue, 5 May 2020 at 23:10, Mich Talebzadeh  wrote:

This is what I get

 

scala> val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}
:47: error: not found: value Try
   val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}
^
:47: error: not found: value Success
   val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}


 ^
:47: error: not found: value Failure
   val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}

 

Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 

 

 

 

On Tue, 5 May 2020 at 23:03, Brandon Geise  wrote:

This is what I had in mind.  Can you give this approach a try?

 

val df = Try(spark.read.csv("")) match {

  case Success(df) => df

  case Failure(e) => throw new Exception("foo")

  }

 

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 5:17 PM
To: Todd Nist 
Cc: Brandon Geise , "user @spark" 

Subject: Re: Exception handling in Spark

 

I am trying this approach

 


val broadcastValue = "123456789"  // I assume this will be sent as a constant 
for the batch
// Create a DF on top of XML
try {
  val df = spark.read.
format("com.databricks.spark.xml").
option("rootTag", "hierarchy").
option("rowTag", "sms_request").
load("/tmp/broadcast.xml")
  df
}  catch {
case ex: FileNotFoundException => {
println (s"\nFile /tmp/broadcast.xml not found\n")
None
}
   case unknown: Exception => {
 println(s"\n Error encountered $unknown\n")
None
}
}

val newDF = df.withColumn("broadcastid", lit(broadcastValue))

But this does not work

 

scala> try {
 |   val df = spark.read.
 | format("com.databricks.spark.xml").
 | option("rootTag", "hierarchy").
 | option("rowTag", "sms_request").
 | load("/tmp/broadcast.xml")
 |   Some(df)
 | }  catch {
 | case ex: FileNotFoundException => {
 | println (s"\nFile /tmp/broadcast.xml not found\n")
 | None
 | }
 |case unknown: Exception => {
 |  println(s"\n Error encountered $unknown\n")
 | None
 | }
 | }
res6: Option[org.apache.spark.sql.DataFrame] = 

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
scala> import scala.util.{Try, Success, Failure}
import scala.util.{Try, Success, Failure}

scala> val df =
Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
:48: error: value Match is not a member of
scala.util.Try[org.apache.spark.sql.DataFrame]
   val df =
Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}


Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 5 May 2020 at 23:10, Mich Talebzadeh 
wrote:

> This is what I get
>
> scala> val df =
> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
> :47: error: not found: value Try
>val df =
> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
> ^
> :47: error: not found: value Success
>val df =
> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
>
>
>^
> :47: error: not found: value Failure
>val df =
> Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
> "hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
> Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 5 May 2020 at 23:03, Brandon Geise  wrote:
>
>> This is what I had in mind.  Can you give this approach a try?
>>
>>
>>
>> val df = Try(spark.read.csv("")) match {
>>
>>   case Success(df) => df
>>
>>   case Failure(e) => throw new Exception("foo")
>>
>>   }
>>
>>
>>
>> *From: *Mich Talebzadeh 
>> *Date: *Tuesday, May 5, 2020 at 5:17 PM
>> *To: *Todd Nist 
>> *Cc: *Brandon Geise , "user @spark" <
>> user@spark.apache.org>
>> *Subject: *Re: Exception handling in Spark
>>
>>
>>
>> I am trying this approach
>>
>>
>>
>>
>> val broadcastValue = "123456789"  // I assume this will be sent as a
>> constant for the batch
>> // Create a DF on top of XML
>> try {
>>   val df = spark.read.
>> format("com.databricks.spark.xml").
>> option("rootTag", "hierarchy").
>> option("rowTag", "sms_request").
>> load("/tmp/broadcast.xml")
>>   df
>> }  catch {
>> case ex: FileNotFoundException => {
>> println (s"\nFile /tmp/broadcast.xml not found\n")
>> None
>> }
>>case unknown: Exception => {
>>  println(s"\n Error encountered $unknown\n")
>> None
>> }
>> }
>>
>> val newDF = df.withColumn("broadcastid", lit(broadcastValue))
>>
>> But this does not work
>>
>>
>>
>> scala> try {
>>  |   val df = spark.read.
>>  | format("com.databricks.spark.xml").
>>  | option("rootTag", "hierarchy").
>>  | option("rowTag", "sms_request").
>>  | load("/tmp/broadcast.xml")
>>  |   Some(df)
>>  | }  catch {
>>  | case ex: FileNotFoundException => {
>>  | println (s"\nFile /tmp/broadcast.xml not found\n")
>>  | None
>>  | }
>>  |case unknown: Exception => {
>>  |  println(s"\n Error encountered $unknown\n")
>>  | None
>>  | 

Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise
import scala.util.Try

import scala.util.Success

import scala.util.Failure

 

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 6:11 PM
To: Brandon Geise 
Cc: Todd Nist , "user @spark" 
Subject: Re: Exception handling in Spark

 

This is what I get

 

scala> val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}
:47: error: not found: value Try
   val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}
^
:47: error: not found: value Success
   val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}


 ^
:47: error: not found: value Failure
   val df = 
Try(spark.read.format("com.databricks.spark.xml").option("rootTag", 
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml")) Match 
{case Success(df) => df case Failure(e) => throw new Exception("foo")}

 

Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 

 

 

 

On Tue, 5 May 2020 at 23:03, Brandon Geise  wrote:

This is what I had in mind.  Can you give this approach a try?

 

val df = Try(spark.read.csv("")) match {

  case Success(df) => df

  case Failure(e) => throw new Exception("foo")

  }

 

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 5:17 PM
To: Todd Nist 
Cc: Brandon Geise , "user @spark" 

Subject: Re: Exception handling in Spark

 

I am trying this approach

 


val broadcastValue = "123456789"  // I assume this will be sent as a constant 
for the batch
// Create a DF on top of XML
try {
  val df = spark.read.
format("com.databricks.spark.xml").
option("rootTag", "hierarchy").
option("rowTag", "sms_request").
load("/tmp/broadcast.xml")
  df
}  catch {
case ex: FileNotFoundException => {
println (s"\nFile /tmp/broadcast.xml not found\n")
None
}
   case unknown: Exception => {
 println(s"\n Error encountered $unknown\n")
None
}
}

val newDF = df.withColumn("broadcastid", lit(broadcastValue))

But this does not work

 

scala> try {
 |   val df = spark.read.
 | format("com.databricks.spark.xml").
 | option("rootTag", "hierarchy").
 | option("rowTag", "sms_request").
 | load("/tmp/broadcast.xml")
 |   Some(df)
 | }  catch {
 | case ex: FileNotFoundException => {
 | println (s"\nFile /tmp/broadcast.xml not found\n")
 | None
 | }
 |case unknown: Exception => {
 |  println(s"\n Error encountered $unknown\n")
 | None
 | }
 | }
res6: Option[org.apache.spark.sql.DataFrame] = Some([brand: string, 
ocis_party_id: bigint ... 6 more fields])

scala>

scala> df.printSchema
:48: error: not found: value df
   df.printSchema
   
data frame seems to be lost!

 

Thanks,

 

Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 

 

 

 

On Tue, 5 May 2020 at 18:08, Mich Talebzadeh  wrote:

Thanks Todd. This is what I did before creating DF on top of that file

 

var exists = true

exists = xmlDirExists(broadcastStagingConfig.xmlFilePath)

if(!exists) {

  println(s"\n Error: The xml file ${ broadcastStagingConfig.xmlFilePath} does 
not exist, aborting!\n")

 sys.exit(1)

}

.

.

def xmlFileExists(hdfsDirectory: String): Boolean = {

   val hadoopConf = new org.apache.hadoop.conf.Configuration()

   val fs 

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
This is what I get

scala> val df =
Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
:47: error: not found: value Try
   val df =
Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}
^
:47: error: not found: value Success
   val df =
Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}


 ^
:47: error: not found: value Failure
   val df =
Try(spark.read.format("com.databricks.spark.xml").option("rootTag",
"hierarchy").option("rowTag", "sms_request").load("/tmp/broadcast.xml"))
Match {case Success(df) => df case Failure(e) => throw new Exception("foo")}

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 5 May 2020 at 23:03, Brandon Geise  wrote:

> This is what I had in mind.  Can you give this approach a try?
>
>
>
> val df = Try(spark.read.csv("")) match {
>
>   case Success(df) => df
>
>   case Failure(e) => throw new Exception("foo")
>
>   }
>
>
>
> *From: *Mich Talebzadeh 
> *Date: *Tuesday, May 5, 2020 at 5:17 PM
> *To: *Todd Nist 
> *Cc: *Brandon Geise , "user @spark" <
> user@spark.apache.org>
> *Subject: *Re: Exception handling in Spark
>
>
>
> I am trying this approach
>
>
>
>
> val broadcastValue = "123456789"  // I assume this will be sent as a
> constant for the batch
> // Create a DF on top of XML
> try {
>   val df = spark.read.
> format("com.databricks.spark.xml").
> option("rootTag", "hierarchy").
> option("rowTag", "sms_request").
> load("/tmp/broadcast.xml")
>   df
> }  catch {
> case ex: FileNotFoundException => {
> println (s"\nFile /tmp/broadcast.xml not found\n")
> None
> }
>case unknown: Exception => {
>  println(s"\n Error encountered $unknown\n")
> None
> }
> }
>
> val newDF = df.withColumn("broadcastid", lit(broadcastValue))
>
> But this does not work
>
>
>
> scala> try {
>  |   val df = spark.read.
>  | format("com.databricks.spark.xml").
>  | option("rootTag", "hierarchy").
>  | option("rowTag", "sms_request").
>  | load("/tmp/broadcast.xml")
>  |   Some(df)
>  | }  catch {
>  | case ex: FileNotFoundException => {
>  | println (s"\nFile /tmp/broadcast.xml not found\n")
>  | None
>  | }
>  |case unknown: Exception => {
>  |  println(s"\n Error encountered $unknown\n")
>  | None
>  | }
>  | }
> res6: Option[org.apache.spark.sql.DataFrame] = Some([brand: string,
> ocis_party_id: bigint ... 6 more fields])
>
> scala>
>
> scala> df.printSchema
> :48: error: not found: value df
>df.printSchema
>
> data frame seems to be lost!
>
>
>
> Thanks,
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn  
> *https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
>
>
>
> On Tue, 5 May 2020 at 18:08, Mich Talebzadeh 
> wrote:
>
> Thanks Todd. This is what I did before creating DF on top of that file
>
>
>
> var exists = true
>
> exists = xmlDirExists(broadcastStagingConfig.xmlFilePath)
>
> if(!exists) {
>
>   println(s"\n Error: The xml file ${ broadcastStagingConfig.xmlFilePath}
> does not exist, aborting!\n")
>
>  sys.exit(1)
>
> }
>
> .
>
> .
>
> def xmlFileExists(hdfsDirectory: String): Boolean = {
>
>val hadoopConf = new org.apache.hadoop.conf.Configuration()
>
>val fs = 

Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise
This is what I had in mind.  Can you give this approach a try?

 

val df = Try(spark.read.csv("")) match {

  case Success(df) => df

  case Failure(e) => throw new Exception("foo")

  }

 

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 5:17 PM
To: Todd Nist 
Cc: Brandon Geise , "user @spark" 

Subject: Re: Exception handling in Spark

 

I am trying this approach

 


val broadcastValue = "123456789"  // I assume this will be sent as a constant 
for the batch
// Create a DF on top of XML
try {
  val df = spark.read.
format("com.databricks.spark.xml").
option("rootTag", "hierarchy").
option("rowTag", "sms_request").
load("/tmp/broadcast.xml")
  df
}  catch {
case ex: FileNotFoundException => {
println (s"\nFile /tmp/broadcast.xml not found\n")
None
}
   case unknown: Exception => {
 println(s"\n Error encountered $unknown\n")
None
}
}

val newDF = df.withColumn("broadcastid", lit(broadcastValue))

But this does not work

 

scala> try {
 |   val df = spark.read.
 | format("com.databricks.spark.xml").
 | option("rootTag", "hierarchy").
 | option("rowTag", "sms_request").
 | load("/tmp/broadcast.xml")
 |   Some(df)
 | }  catch {
 | case ex: FileNotFoundException => {
 | println (s"\nFile /tmp/broadcast.xml not found\n")
 | None
 | }
 |case unknown: Exception => {
 |  println(s"\n Error encountered $unknown\n")
 | None
 | }
 | }
res6: Option[org.apache.spark.sql.DataFrame] = Some([brand: string, 
ocis_party_id: bigint ... 6 more fields])

scala>

scala> df.printSchema
:48: error: not found: value df
   df.printSchema
   
data frame seems to be lost!

 

Thanks,

 

Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 

 

 

 

On Tue, 5 May 2020 at 18:08, Mich Talebzadeh  wrote:

Thanks Todd. This is what I did before creating DF on top of that file

 

var exists = true

exists = xmlDirExists(broadcastStagingConfig.xmlFilePath)

if(!exists) {

  println(s"\n Error: The xml file ${ broadcastStagingConfig.xmlFilePath} does 
not exist, aborting!\n")

 sys.exit(1)

}

.

.

def xmlFileExists(hdfsDirectory: String): Boolean = {

   val hadoopConf = new org.apache.hadoop.conf.Configuration()

   val fs = org.apache.hadoop.fs.FileSystem.get(hadoopConf)

   fs.exists(new org.apache.hadoop.fs.Path(hdfsDirectory))

 }

 

And checked it. It works.

 

Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 

 

 

 

On Tue, 5 May 2020 at 17:54, Todd Nist  wrote:

Could you do something like this prior to calling the action.

 

// Create FileSystem object from Hadoop Configuration

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// This methods returns Boolean (true - if file exists, false - if file doesn't 
exist

val fileExists = fs.exists(new Path(""))

if (fileExists) println("File exists!")

else println("File doesn't exist!")

 

Not sure that will help you or not, just a thought.

 

-Todd

 

 

 

 

On Tue, May 5, 2020 at 11:45 AM Mich Talebzadeh  
wrote:

Thanks  Brandon!

 

i should have remembered that.

 

basically the code gets out with sys.exit(1)  if it cannot find the file

 

I guess there is no easy way of validating DF except actioning it by show(1,0) 
etc and checking if it works?

 

Regards,


Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 

 

 

 

On Tue, 5 May 2020 at 16:41, Brandon Geise  wrote:

You could use the 

Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
I am trying this approach


val broadcastValue = "123456789"  // I assume this will be sent as a
constant for the batch
// Create a DF on top of XML
try {
  val df = spark.read.
format("com.databricks.spark.xml").
option("rootTag", "hierarchy").
option("rowTag", "sms_request").
load("/tmp/broadcast.xml")
  df
}  catch {
case ex: FileNotFoundException => {
println (s"\nFile /tmp/broadcast.xml not found\n")
None
}
   case unknown: Exception => {
 println(s"\n Error encountered $unknown\n")
None
}
}

val newDF = df.withColumn("broadcastid", lit(broadcastValue))

But this does not work

scala> try {
 |   val df = spark.read.
 | format("com.databricks.spark.xml").
 | option("rootTag", "hierarchy").
 | option("rowTag", "sms_request").
 | load("/tmp/broadcast.xml")
 |   Some(df)
 | }  catch {
 | case ex: FileNotFoundException => {
 | println (s"\nFile /tmp/broadcast.xml not found\n")
 | None
 | }
 |case unknown: Exception => {
 |  println(s"\n Error encountered $unknown\n")
 | None
 | }
 | }
res6: Option[org.apache.spark.sql.DataFrame] = Some([brand: string,
ocis_party_id: bigint ... 6 more fields])

scala>

scala> df.printSchema
:48: error: not found: value df
   df.printSchema

data frame seems to be lost!
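
The inner val df only exists inside the try block, and in the REPL the whole
try/catch expression is bound to res6 rather than to a name, so df is out of
scope afterwards. A minimal sketch of the same idea, binding the expression to
a val and unwrapping the Option before use (spark-xml on the classpath and the
same file path assumed):

import java.io.FileNotFoundException
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

val broadcastValue = "123456789"

// Bind the whole try/catch expression to a name so the result stays in scope
val maybeDF: Option[DataFrame] =
  try {
    Some(spark.read.
      format("com.databricks.spark.xml").
      option("rootTag", "hierarchy").
      option("rowTag", "sms_request").
      load("/tmp/broadcast.xml"))
  } catch {
    case ex: FileNotFoundException =>
      println("\nFile /tmp/broadcast.xml not found\n")
      None
    case unknown: Exception =>
      println(s"\n Error encountered $unknown\n")
      None
  }

// Unwrap the Option before using the DataFrame
maybeDF match {
  case Some(df) => df.withColumn("broadcastid", lit(broadcastValue)).printSchema
  case None => println("No DataFrame was created")
}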

Thanks,


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 5 May 2020 at 18:08, Mich Talebzadeh 
wrote:

> Thanks Todd. This is what I did before creating DF on top of that file
>
> var exists = true
> exists = xmlDirExists(broadcastStagingConfig.xmlFilePath)
> if(!exists) {
>   println(s"\n Error: The xml file ${ broadcastStagingConfig.xmlFilePath}
> does not exist, aborting!\n")
>  sys.exit(1)
> }
> .
> .
> def xmlFileExists(hdfsDirectory: String): Boolean = {
>val hadoopConf = new org.apache.hadoop.conf.Configuration()
>val fs = org.apache.hadoop.fs.FileSystem.get(hadoopConf)
>fs.exists(new org.apache.hadoop.fs.Path(hdfsDirectory))
>  }
>
> And checked it. It works.
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 5 May 2020 at 17:54, Todd Nist  wrote:
>
>> Could you do something like this prior to calling the action.
>>
>> // Create FileSystem object from Hadoop Configuration
>> val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
>> // This methods returns Boolean (true - if file exists, false - if file
>> doesn't exist
>> val fileExists = fs.exists(new Path(""))
>> if (fileExists) println("File exists!")
>> else println("File doesn't exist!")
>>
>> Not sure that will help you or not, just a thought.
>>
>> -Todd
>>
>>
>>
>>
>> On Tue, May 5, 2020 at 11:45 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Thanks  Brandon!
>>>
>>> i should have remembered that.
>>>
>>> basically the code gets out with sys.exit(1)  if it cannot find the file
>>>
>>> I guess there is no easy way of validating DF except actioning it by
>>> show(1,0) etc and checking if it works?
>>>
>>> Regards,
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Tue, 5 May 2020 at 16:41, 

Pyspark and snowflake Column Mapping

2020-05-05 Thread anbutech
Hi Team,

While working on JSON data, we flattened the unstructured data into a structured
format, so we now have Spark data types such as Array> fields and Array data type
columns in the Databricks Delta table.

While loading the data from the Databricks Spark connector to Snowflake, we
noticed that the Array> and Array columns are mapped to the VARIANT type in
Snowflake. We are actually expecting the same array type in Snowflake.

How do we handle this case while loading into Snowflake?

Please share your ideas.



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
Thanks Todd. This is what I did before creating DF on top of that file

var exists = true
exists = xmlFileExists(broadcastStagingConfig.xmlFilePath)
if(!exists) {
  println(s"\n Error: The xml file ${ broadcastStagingConfig.xmlFilePath}
does not exist, aborting!\n")
 sys.exit(1)
}
.
.
def xmlFileExists(hdfsDirectory: String): Boolean = {
   val hadoopConf = new org.apache.hadoop.conf.Configuration()
   val fs = org.apache.hadoop.fs.FileSystem.get(hadoopConf)
   fs.exists(new org.apache.hadoop.fs.Path(hdfsDirectory))
 }

And checked it. It works.

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 5 May 2020 at 17:54, Todd Nist  wrote:

> Could you do something like this prior to calling the action.
>
> // Create FileSystem object from Hadoop Configuration
> val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
> // This methods returns Boolean (true - if file exists, false - if file
> doesn't exist
> val fileExists = fs.exists(new Path(""))
> if (fileExists) println("File exists!")
> else println("File doesn't exist!")
>
> Not sure that will help you or not, just a thought.
>
> -Todd
>
>
>
>
> On Tue, May 5, 2020 at 11:45 AM Mich Talebzadeh 
> wrote:
>
>> Thanks  Brandon!
>>
>> i should have remembered that.
>>
>> basically the code gets out with sys.exit(1)  if it cannot find the file
>>
>> I guess there is no easy way of validating DF except actioning it by
>> show(1,0) etc and checking if it works?
>>
>> Regards,
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Tue, 5 May 2020 at 16:41, Brandon Geise 
>> wrote:
>>
>>> You could use the Hadoop API and check if the file exists.
>>>
>>>
>>>
>>> *From: *Mich Talebzadeh 
>>> *Date: *Tuesday, May 5, 2020 at 11:25 AM
>>> *To: *"user @spark" 
>>> *Subject: *Exception handling in Spark
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> As I understand exception handling in Spark only makes sense if one
>>> attempts an action as opposed to lazy transformations?
>>>
>>>
>>>
>>> Let us assume that I am reading an XML file from the HDFS directory  and
>>> create a dataframe DF on it
>>>
>>>
>>>
>>> val broadcastValue = "123456789"  // I assume this will be sent as a
>>> constant for the batch
>>>
>>> // Create a DF on top of XML
>>> val df = spark.read.
>>> format("com.databricks.spark.xml").
>>> option("rootTag", "hierarchy").
>>> option("rowTag", "sms_request").
>>> load("/tmp/broadcast.xml")
>>>
>>> val newDF = df.withColumn("broadcastid", lit(broadcastValue))
>>>
>>> newDF.createOrReplaceTempView("tmp")
>>>
>>>   // Put data in Hive table
>>>   //
>>>   sqltext = """
>>>   INSERT INTO TABLE michtest.BroadcastStaging PARTITION
>>> (broadcastid="123456", brand)
>>>   SELECT
>>>   ocis_party_id AS partyId
>>> , target_mobile_no AS phoneNumber
>>> , brand
>>> , broadcastid
>>>   FROM tmp
>>>   """
>>> //
>>>
>>> // Here I am performing a collection
>>>
>>> try  {
>>>
>>>  spark.sql(sqltext)
>>>
>>> } catch {
>>>
>>> case e: SQLException => e.printStackTrace
>>>
>>> sys.exit()
>>>
>>> }
>>>
>>>
>>>
>>> Now the issue I have is that what if the xml file  /tmp/broadcast.xml
>>> does not exist or deleted? I won't be able to catch the error until the
>>> hive table is populated. Of course I can write a shell script to check if
>>> the file exist before running the job or put small collection like
>>> df.show(1,0). Are there more general alternatives?
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn  
>>> *https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or 

Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise
Read is an action, so you could wrap it in a Try (or whatever you want)

scala> val df = Try(spark.read.csv("test"))

df: scala.util.Try[org.apache.spark.sql.DataFrame] = 
Failure(org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/test;)

 

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 12:45 PM
To: Brandon Geise 
Cc: "user @spark" 
Subject: Re: Exception handling in Spark

 

Thanks  Brandon!

 

i should have remembered that.

 

basically the code gets out with sys.exit(1)  if it cannot find the file

 

I guess there is no easy way of validating DF except actioning it by show(1,0) 
etc and checking if it works?

 

Regards,


Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 

 

 

 

On Tue, 5 May 2020 at 16:41, Brandon Geise  wrote:

You could use the Hadoop API and check if the file exists.

 

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 11:25 AM
To: "user @spark" 
Subject: Exception handling in Spark

 

Hi,

 

As I understand exception handling in Spark only makes sense if one attempts an 
action as opposed to lazy transformations?

 

Let us assume that I am reading an XML file from the HDFS directory  and create 
a dataframe DF on it

 

val broadcastValue = "123456789"  // I assume this will be sent as a constant 
for the batch

// Create a DF on top of XML
val df = spark.read.
format("com.databricks.spark.xml").
option("rootTag", "hierarchy").
option("rowTag", "sms_request").
load("/tmp/broadcast.xml")

val newDF = df.withColumn("broadcastid", lit(broadcastValue))

newDF.createOrReplaceTempView("tmp")

  // Put data in Hive table
  //
  sqltext = """
  INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastid="123456", 
brand)
  SELECT
  ocis_party_id AS partyId
, target_mobile_no AS phoneNumber
, brand
, broadcastid
  FROM tmp
  """
//

// Here I am performing a collection 

try  {

 spark.sql(sqltext)

} catch {

case e: SQLException => e.printStackTrace

sys.exit()

}

 

Now the issue I have is that what if the xml file  /tmp/broadcast.xml does not 
exist or deleted? I won't be able to catch the error until the hive table is 
populated. Of course I can write a shell script to check if the file exist 
before running the job or put small collection like df.show(1,0). Are there 
more general alternatives?

 

Thanks

 

Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 

 



Re: Exception handling in Spark

2020-05-05 Thread Todd Nist
Could you do something like this prior to calling the action.

import org.apache.hadoop.fs.{FileSystem, Path}

// Create a FileSystem object from the Hadoop Configuration
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
// This method returns Boolean (true if the file exists, false if it doesn't)
val fileExists = fs.exists(new Path(""))
if (fileExists) println("File exists!")
else println("File doesn't exist!")

Not sure that will help you or not, just a thought.

-Todd




On Tue, May 5, 2020 at 11:45 AM Mich Talebzadeh 
wrote:

> Thanks  Brandon!
>
> i should have remembered that.
>
> basically the code gets out with sys.exit(1)  if it cannot find the file
>
> I guess there is no easy way of validating DF except actioning it by
> show(1,0) etc and checking if it works?
>
> Regards,
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 5 May 2020 at 16:41, Brandon Geise  wrote:
>
>> You could use the Hadoop API and check if the file exists.
>>
>>
>>
>> *From: *Mich Talebzadeh 
>> *Date: *Tuesday, May 5, 2020 at 11:25 AM
>> *To: *"user @spark" 
>> *Subject: *Exception handling in Spark
>>
>>
>>
>> Hi,
>>
>>
>>
>> As I understand exception handling in Spark only makes sense if one
>> attempts an action as opposed to lazy transformations?
>>
>>
>>
>> Let us assume that I am reading an XML file from the HDFS directory  and
>> create a dataframe DF on it
>>
>>
>>
>> val broadcastValue = "123456789"  // I assume this will be sent as a
>> constant for the batch
>>
>> // Create a DF on top of XML
>> val df = spark.read.
>> format("com.databricks.spark.xml").
>> option("rootTag", "hierarchy").
>> option("rowTag", "sms_request").
>> load("/tmp/broadcast.xml")
>>
>> val newDF = df.withColumn("broadcastid", lit(broadcastValue))
>>
>> newDF.createOrReplaceTempView("tmp")
>>
>>   // Put data in Hive table
>>   //
>>   sqltext = """
>>   INSERT INTO TABLE michtest.BroadcastStaging PARTITION
>> (broadcastid="123456", brand)
>>   SELECT
>>   ocis_party_id AS partyId
>> , target_mobile_no AS phoneNumber
>> , brand
>> , broadcastid
>>   FROM tmp
>>   """
>> //
>>
>> // Here I am performing a collection
>>
>> try  {
>>
>>  spark.sql(sqltext)
>>
>> } catch {
>>
>> case e: SQLException => e.printStackTrace
>>
>> sys.exit()
>>
>> }
>>
>>
>>
>> Now the issue I have is: what if the xml file /tmp/broadcast.xml
>> does not exist or has been deleted? I won't be able to catch the error until
>> the hive table is populated. Of course I can write a shell script to check if
>> the file exists before running the job, or run a small action like
>> df.show(1,0). Are there more general alternatives?
>>
>>
>>
>> Thanks
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn  
>> *https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>


Re: Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
Thanks Brandon!

I should have remembered that.

Basically the code exits with sys.exit(1) if it cannot find the file.

I guess there is no easy way of validating the DF except actioning it with
show(1,0) etc. and checking if it works?
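
As a rough illustration of that "action it and check" approach, here is a
minimal sketch. The DataFrame df is the one built in the quoted code further
down; the use of scala.util.Try is my assumption, not something suggested in
the thread:

import scala.util.{Try, Success, Failure}

// Force a cheap action so errors from the lazy pipeline (missing file,
// bad schema, etc.) surface here rather than at INSERT time
Try(df.head(1)) match {
  case Success(_) => println("DF is readable")
  case Failure(e) =>
    println(s"DF validation failed: ${e.getMessage}")
    sys.exit(1)
}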

Regards,

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 5 May 2020 at 16:41, Brandon Geise  wrote:

> You could use the Hadoop API and check if the file exists.
>
>
>
> *From: *Mich Talebzadeh 
> *Date: *Tuesday, May 5, 2020 at 11:25 AM
> *To: *"user @spark" 
> *Subject: *Exception handling in Spark
>
>
>
> Hi,
>
>
>
> As I understand it, exception handling in Spark only makes sense if one
> attempts an action, as opposed to lazy transformations?
>
>
>
> Let us assume that I am reading an XML file from the HDFS directory and
> creating a dataframe DF on it
>
>
>
> val broadcastValue = "123456789"  // I assume this will be sent as a
> constant for the batch
>
> // Create a DF on top of XML
> val df = spark.read.
> format("com.databricks.spark.xml").
> option("rootTag", "hierarchy").
> option("rowTag", "sms_request").
> load("/tmp/broadcast.xml")
>
> val newDF = df.withColumn("broadcastid", lit(broadcastValue))
>
> newDF.createOrReplaceTempView("tmp")
>
>   // Put data in Hive table
>   //
>   sqltext = """
>   INSERT INTO TABLE michtest.BroadcastStaging PARTITION
> (broadcastid="123456", brand)
>   SELECT
>   ocis_party_id AS partyId
> , target_mobile_no AS phoneNumber
> , brand
> , broadcastid
>   FROM tmp
>   """
> //
>
> // Here I am performing a collection
>
> try  {
>
>  spark.sql(sqltext)
>
> } catch {
>
> case e: SQLException => e.printStackTrace
>
> sys.exit()
>
> }
>
>
>
> Now the issue I have is: what if the xml file /tmp/broadcast.xml does
> not exist or has been deleted? I won't be able to catch the error until the
> hive table is populated. Of course I can write a shell script to check if the
> file exists before running the job, or run a small action like
> df.show(1,0). Are there more general alternatives?
>
>
>
> Thanks
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn  
> *https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>


Re: Exception handling in Spark

2020-05-05 Thread Brandon Geise
You could use the Hadoop API and check if the file exists.

 

From: Mich Talebzadeh 
Date: Tuesday, May 5, 2020 at 11:25 AM
To: "user @spark" 
Subject: Exception handling in Spark

 

Hi,

 

As I understand it, exception handling in Spark only makes sense if one attempts an 
action, as opposed to lazy transformations?

 

Let us assume that I am reading an XML file from the HDFS directory and creating 
a dataframe DF on it

 

val broadcastValue = "123456789"  // I assume this will be sent as a constant 
for the batch

// Create a DF on top of XML
val df = spark.read.
format("com.databricks.spark.xml").
option("rootTag", "hierarchy").
option("rowTag", "sms_request").
load("/tmp/broadcast.xml")

val newDF = df.withColumn("broadcastid", lit(broadcastValue))

newDF.createOrReplaceTempView("tmp")

  // Put data in Hive table
  //
  sqltext = """
  INSERT INTO TABLE michtest.BroadcastStaging PARTITION (broadcastid="123456", 
brand)
  SELECT
  ocis_party_id AS partyId
, target_mobile_no AS phoneNumber
, brand
, broadcastid
  FROM tmp
  """
//

// Here I am performing a collection 

try  {

 spark.sql(sqltext)

} catch {

case e: SQLException => e.printStackTrace

sys.exit()

}

 

Now the issue I have is: what if the xml file /tmp/broadcast.xml does not 
exist or has been deleted? I won't be able to catch the error until the hive table is 
populated. Of course I can write a shell script to check if the file exists 
before running the job, or run a small action like df.show(1,0). Are there 
more general alternatives?

 

Thanks

 

Dr Mich Talebzadeh

 

LinkedIn  
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com

 

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 

 



Exception handling in Spark

2020-05-05 Thread Mich Talebzadeh
Hi,

As I understand it, exception handling in Spark only makes sense if one
attempts an action, as opposed to lazy transformations?

Let us assume that I am reading an XML file from the HDFS directory and
creating a dataframe DF on it

val broadcastValue = "123456789"  // I assume this will be sent as a
constant for the batch
// Create a DF on top of XML
val df = spark.read.
format("com.databricks.spark.xml").
option("rootTag", "hierarchy").
option("rowTag", "sms_request").
load("/tmp/broadcast.xml")

val newDF = df.withColumn("broadcastid", lit(broadcastValue))

newDF.createOrReplaceTempView("tmp")

  // Put data in Hive table
  //
  sqltext = """
  INSERT INTO TABLE michtest.BroadcastStaging PARTITION
(broadcastid="123456", brand)
  SELECT
  ocis_party_id AS partyId
, target_mobile_no AS phoneNumber
, brand
, broadcastid
  FROM tmp
  """
//
// Here I am performing a collection
try  {
 spark.sql(sqltext)
} catch {
case e: SQLException => e.printStackTrace
sys.exit()
}

Now the issue I have is: what if the xml file /tmp/broadcast.xml does
not exist or has been deleted? I won't be able to catch the error until the
hive table is populated. Of course I can write a shell script to check if the
file exists before running the job, or run a small action like
df.show(1,0). Are there more general alternatives?

Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: Path style access fs.s3a.path.style.access property is not working in spark code

2020-05-05 Thread Samik Raychaudhuri
I recommend using v2.9.x; there are a lot of optimizations that make life 
much easier when accessing from Spark.

Thanks.
-Samik

On 05-05-2020 01:55 am, Aniruddha P Tekade wrote:

Hello User,

I got the solution to this. If you are writing to a custom S3 URL, 
then use hadoop-aws-2.8.0.jar, as the separate flag to enable 
path-style access was introduced in that version.
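
As a minimal sketch, the build.sbt change implied here would look something
like the following (the exact versions are an assumption; match them to your
cluster's Hadoop release):

// A hadoop-aws version that understands fs.s3a.path.style.access (2.8.0+)
libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.8.0"
// plus the matching aws-java-sdk-s3 artifact pulled in by that Hadoop release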


Best,
Aniruddha
---


On Fri, May 1, 2020 at 5:08 PM Aniruddha P Tekade 
<ateka...@binghamton.edu> wrote:


Hello Users,

I am using on-premises object storage and am able to perform
operations on different buckets using the aws-cli.
However, when I try to use the same path from my Spark code,
it fails. Here are the details -

Added dependencies in build.sbt -

  * hadoop-aws-2.7.4.jar
  * aws-java-sdk-1.7.4.jar

Spark Hadoop Configuration setup as -


spark.sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", ENDPOINT)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", ACCESS_KEY)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", SECRET_KEY)
spark.sparkContext.hadoopConfiguration.set("fs.s3a.path.style.access", "true")
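
A minimal alternative sketch: the same settings can also be supplied when the
session is built, using Spark's "spark.hadoop." prefix so they are copied into
the Hadoop configuration on the executors as well (ENDPOINT, ACCESS_KEY and
SECRET_KEY are placeholders as above):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("s3a-path-style-access")
  // Keys prefixed with "spark.hadoop." end up in the Hadoop Configuration
  .config("spark.hadoop.fs.s3a.endpoint", ENDPOINT)
  .config("spark.hadoop.fs.s3a.access.key", ACCESS_KEY)
  .config("spark.hadoop.fs.s3a.secret.key", SECRET_KEY)
  .config("spark.hadoop.fs.s3a.path.style.access", "true")
  .getOrCreate()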

And now I try to write data into my custom s3 endpoint as follows -


val dataStreamWriter: DataStreamWriter[Row] = PM25quality
  .select(
    dayofmonth(current_date()) as "day",
    month(current_date()) as "month",
    year(current_date()) as "year",
    column("time"), column("quality"), column("PM25"))
  .writeStream
  .partitionBy("year", "month", "day")
  .format("csv")
  .outputMode("append")
  .option("path", "s3a://test-bucket/")

val streamingQuery: StreamingQuery = dataStreamWriter.start()


However, I am getting an error that AmazonHttpClient is not able to
execute the HTTP request, and it is also prepending the bucket name to
the URL. It seems the Hadoop configuration is not being picked up here -


20/05/01 16:51:37 INFO AmazonHttpClient: Unable to execute
HTTP request: test-bucket.s3-region0.cloudian.com

java.net.UnknownHostException:
test-bucket.s3-region0.cloudian.com



Is there anything that I am missing in the configuration?
It seems that even after setting path-style access to true,
it's not working.

--
Aniruddha
---



--
Samik Raychaudhuri, Ph.D.
http://in.linkedin.com/in/samikr/


Alternative for spark-redshift on scala 2.12

2020-05-05 Thread Jun Zhu
Hello Users,

Is there any alternative for https://github.com/databricks/spark-redshift on
Scala 2.12.x?

Thanks


-- 
*Jun Zhu*
Sr. Engineer I, Data
+86 18565739171

Units 3801, 3804, 38F, C Block, Beijing Yintai Center, Beijing, China