robertnagy1 opened a new issue, #876:
URL: https://github.com/apache/sedona/issues/876
## Expected behavior
I have read in a shapefile as an RDD (approximately 4 million rows and about 5 columns).
I am trying to set the Kryo buffer max to a higher number, but no matter how high it is, I still get a buffer overflow error when calling rdd.countWithoutDuplicates(). This makes me curious: does config("spark.kryoserializer.buffer.max", "50g") have any effect?
```python
from pyspark.sql import SparkSession
from sedona.utils import KryoSerializer, SedonaKryoRegistrator

spark = SparkSession.\
    builder.\
    master("local[*]").\
    appName("Sedona App").\
    config("spark.serializer", KryoSerializer.getName).\
    config("spark.kryo.registrator", SedonaKryoRegistrator.getName).\
    config("spark.kryoserializer.buffer", "50g").\
    config("spark.kryoserializer.buffer.max", "50g").\
    config("spark.executor.memory", "2g").\
    config("spark.driver.memory", "3g").\
    config("spark.jars.packages",
           "org.apache.sedona:sedona-spark-shaded-3.0_2.12:1.4.0,"
           "org.datasyslab:geotools-wrapper:1.4.0-28.2").\
    getOrCreate()
```
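As a sanity check, here is a minimal sketch (assuming the session above was created successfully) that reads the effective serializer settings back from the SparkContext configuration, to see whether the `config(...)` calls took effect at all:

```python
# Read back the effective Kryo settings from the running session
# (diagnostic only; this does not change any configuration).
conf = spark.sparkContext.getConf()
print(conf.get("spark.serializer"))
print(conf.get("spark.kryoserializer.buffer.max"))
```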
## Actual behavior
No matter how high spark.kryoserializer.buffer.max is set, the error is the same:
```
Py4JJavaError: An error occurred while calling o4019.countWithoutDuplicates.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 4 times, most recent failure: Lost task 0.3 in stage 27.0 (TID 35) (vm-78d56181 executor 2): org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 6, required: 8
```
## Steps to reproduce the problem
Download a large shapefile, create a Spark session as described above, and run countWithoutDuplicates() on the resulting RDD (a minimal sketch follows below).
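A minimal reproduction sketch, assuming the Sedona Python ShapefileReader API and a placeholder shapefile path (substitute any sufficiently large shapefile directory):

```python
from sedona.core.formatMapper.shapefileParser import ShapefileReader

# Hypothetical location; any large shapefile directory will do.
shapefile_location = "/data/large_shapefile/"

# Read the shapefile into a SpatialRDD, then trigger the failing action.
spatial_rdd = ShapefileReader.readToGeometryRDD(spark.sparkContext, shapefile_location)
print(spatial_rdd.countWithoutDuplicates())
```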
## Settings
Sedona version = 1.4.1
Apache Spark version = 3.3.1.5.2-92314920
API type = Python
Scala version = 2.12
Python version = 3.10
Environment = Azure Synapse spark pool