[ 
https://issues.apache.org/jira/browse/SPARK-24273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483908#comment-16483908
 ] 

Jami Malikzade commented on SPARK-24273:
----------------------------------------

[~kiszk]

I dug deeper and found more.

This version works and creates an empty RDD, because the filter returns 0 rows:

val df = spark.read
  .option("header", "true")
  .option("sep", ",")
  .schema(testschema)
  .csv("s3a://phub-1526909295-81/salary.csv")
  .filter('salary > 300)
  .withColumn("month", when('name === "Smith", "6").otherwise("3"))
df.checkpoint()

df.show()

 

This version fails on df.show(), and a non-empty file is created (even though 
the filter returns 0 rows):

val df = spark.read
  .option("header", "true")
  .option("sep", ",")
  .schema(testschema)
  .csv("s3a://phub-1526909295-81/salary.csv")
  .filter('salary > 300)
  .withColumn("month", when('name === "Smith", "6").otherwise("3"))
  .checkpoint()

df.show()
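
One difference between the two snippets may be relevant: Dataset.checkpoint() returns a new, checkpointed Dataset and leaves the original unchanged. In the first snippet the return value is discarded, so df.show() runs against the original plan and never reads the checkpoint files. In the second, df is the checkpointed Dataset, so df.show() reads them back. The sketch below only illustrates that difference under stated assumptions: a local SparkSession, a temporary local checkpoint directory, and a small in-memory table (all hypothetical) stand in for the S3 setup above, so it is not expected to reproduce the 416 error.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.when

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode instead of the cluster used in the report.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("checkpoint-sketch")
      .getOrCreate()
    import spark.implicits._

    // Assumption: a local temp directory replaces the S3-backed checkpoint location.
    spark.sparkContext.setCheckpointDir(
      Files.createTempDirectory("checkpoint-sketch").toString)

    // Hypothetical stand-in for salary.csv; the filter leaves 0 rows, as in the report.
    val base = Seq(("Smith", 100), ("Jones", 200)).toDF("name", "salary")
      .filter($"salary" > 300)
      .withColumn("month", when($"name" === "Smith", "6").otherwise("3"))

    // Form 1: the checkpointed Dataset is returned but discarded.
    // base itself is unchanged, so show() recomputes from the source.
    base.checkpoint()
    base.show()

    // Form 2: the checkpointed Dataset is kept.
    // show() now reads the files written under the checkpoint directory.
    val checkpointed = base.checkpoint()
    checkpointed.show()

    spark.stop()
  }
}
```

If this reading is right, the first snippet succeeds only because the checkpointed data is never read back, not because the checkpoint itself behaves differently.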

> Failure while using .checkpoint method
> --------------------------------------
>
>                 Key: SPARK-24273
>                 URL: https://issues.apache.org/jira/browse/SPARK-24273
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>    Affects Versions: 2.3.0
>            Reporter: Jami Malikzade
>            Priority: Major
>
> We are getting the following error:
> com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS 
> Service: Amazon S3, AWS Request ID: 
> tx000000000000000014126-005ae9bfd9-9ed9ac2-default, AWS Error Code: 
> InvalidRange, AWS Error Message: null, S3 Extended Request ID: 
> 9ed9ac2-default-default
> when we use the checkpoint method as below:
> val streamBucketDF = streamPacketDeltaDF
>  .filter('timeDelta > maxGap && 'timeDelta <= 30000)
>  .withColumn("bucket", when('timeDelta <= mediumGap, "medium")
>  .otherwise("large")
>  )
>  .checkpoint()
> Do you have any idea how to prevent an invalid range from being sent in the 
> header, or how this can be worked around or fixed?
> Thanks.



