[
https://issues.apache.org/jira/browse/SPARK-24273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482533#comment-16482533
]
Jami Malikzade commented on SPARK-24273:
----------------------------------------
[~kiszk]
I was able to reproduce it with a small piece of code by filtering a column so
that 0 records are returned, as below: 'salary > 300' matches no records, so
the checkpoint should write 0 records.
The steps are:
Prepare a salary.csv file in the bucket like:
|name,year,salary,month|
|John,2017,12.33,4|
|John,2018,55.114,5|
|Smith,2017,36.339,3|
|Smith,2018,45.36,6|
Then execute the following Spark code:
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.when
import spark.implicits._

sc.setCheckpointDir("s3a://testbucket/tmp")

val testschema = StructType(Array(
  StructField("name", StringType, true),
  StructField("year", IntegerType, true),
  StructField("salary", FloatType, true),
  StructField("month", IntegerType, true)
))

val df = spark.read
  .option("header", "true")
  .option("sep", ",")
  .schema(testschema)
  .csv("s3a://testbucket/salary.csv")
  .filter('salary > 300)
  .withColumn("month", when('name === "Smith", "6").otherwise("3"))
  .checkpoint()

df.show()
It will throw:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
stage 11.0 failed 16 times, most recent failure: Lost task 0.15 in stage 11.0
(TID 41, 172.16.0.15, executor 5):
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS
Service: Amazon S3, AWS Request ID:
tx00000000000000018c023-005b02d1e7-513c871-default, AWS Error Code:
InvalidRange, AWS Error Message: null, S3 Extended Request ID:
513c871-default-default
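A possible workaround until this is fixed might be to guard the checkpoint with an emptiness check, so the zero-record result is never checkpointed to S3. This is only a sketch under the assumption that the 416 InvalidRange comes from reading back an empty checkpoint file; take(1) is used instead of Dataset.isEmpty, which is not available in Spark 2.3:

```scala
// Hypothetical workaround sketch (not a confirmed fix): skip checkpointing
// when the filter leaves no rows, so no empty checkpoint file is read back
// from S3 with an invalid byte range.
val filtered = spark.read
  .option("header", "true")
  .schema(testschema)
  .csv("s3a://testbucket/salary.csv")
  .filter('salary > 300)
  .withColumn("month", when('name === "Smith", "6").otherwise("3"))

// take(1) runs a small job to probe for at least one row instead of a full count
val result =
  if (filtered.take(1).isEmpty) filtered
  else filtered.checkpoint()

result.show()
```

This trades one extra small job (the take(1) probe) for avoiding the failing checkpoint path when the DataFrame is empty.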
> Failure while using .checkpoint method
> --------------------------------------
>
> Key: SPARK-24273
> URL: https://issues.apache.org/jira/browse/SPARK-24273
> Project: Spark
> Issue Type: Bug
> Components: Spark Shell
> Affects Versions: 2.3.0
> Reporter: Jami Malikzade
> Priority: Major
>
> We are getting the following error:
> com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 416, AWS
> Service: Amazon S3, AWS Request ID:
> tx000000000000000014126-005ae9bfd9-9ed9ac2-default, AWS Error Code:
> InvalidRange, AWS Error Message: null, S3 Extended Request ID:
> 9ed9ac2-default-default
> when we use the checkpoint method as below.
> val streamBucketDF = streamPacketDeltaDF
> .filter('timeDelta > maxGap && 'timeDelta <= 30000)
> .withColumn("bucket", when('timeDelta <= mediumGap, "medium")
> .otherwise("large")
> )
> .checkpoint()
> Do you have an idea how to prevent the invalid range header from being sent,
> or how it can be worked around or fixed?
> Thanks.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]