Re: Query timed out after PT2M

2022-02-08 Thread Joe Obernberger
Update - the answer was spark.cassandra.input.split.sizeInMB. The default value is 512 MB.  Setting this to 50 resulted in a lot more splits, and the job ran in under 11 minutes with no timeout errors.  In this case the job was a simple count: 10 minutes 48 seconds for over 8.2 billion rows.
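A minimal sketch of where that setting goes, reusing the SparkSession setup Joe posts further down this thread (the appName and connection host are from his snippet; the 50 MB value is the one reported above):

```java
import org.apache.spark.sql.SparkSession;

public class SplitSizeExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("SparkCassandraApp")
            .config("spark.cassandra.connection.host", "chaos")
            // Smaller splits mean more Spark partitions, so each
            // token-range scan finishes well inside the read timeout.
            .config("spark.cassandra.input.split.sizeInMB", "50")
            .getOrCreate();
        System.out.println(spark.version());
        spark.stop();
    }
}
```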

Re: Query timed out after PT2M

2022-02-08 Thread Joe Obernberger
Update - I believe that for large tables, spark.cassandra.read.timeoutMS needs to be very long - on the order of 4 hours or more.  The job now runs much longer, but still doesn't complete.  I'm now facing this all-too-familiar error: com.datastax.oss.driver.api.core.servererrors.ReadTimeoutException:
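A sketch of raising that timeout alongside the other connector options; the 4-hour figure (14400000 ms) is just the value floated above, and the session setup is borrowed from Joe's snippet below:

```java
import org.apache.spark.sql.SparkSession;

public class ReadTimeoutExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("SparkCassandraApp")
            .config("spark.cassandra.connection.host", "chaos")
            // 4 hours = 4 * 60 * 60 * 1000 = 14400000 ms.
            .config("spark.cassandra.read.timeoutMS", "14400000")
            .getOrCreate();
        spark.stop();
    }
}
```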

Re: Query timed out after PT2M

2022-02-07 Thread Joe Obernberger
Some more info.  Tried different GC strategies - no luck.  It only happens on large tables (more than 1 billion rows); it works fine on a 300 million row table.  There is very high CPU usage during the run.  I've tried setting spark.dse.continuousPagingEnabled to false and I've tried setting

Re: Query timed out after PT2M

2022-02-04 Thread Joe Obernberger
I've tried several different GC settings, but am still getting timeouts.  Using OpenJDK 11 with: -XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=500 -XX:InitiatingHeapOccupancyPercent=70 -XX:ParallelGCThreads=24 -XX:ConcGCThreads=24.  Machine has 40

Re: Query timed out after PT2M

2022-02-04 Thread Joe Obernberger
Still no go.  Oddly, I can do a count OK with Trino, but with Spark I get the timeouts.  I don't believe tombstones are an issue.  nodetool cfstats doc.doc shows:

Total number of tables: 82

Keyspace : doc
    Read Count: 1514288521
    Read Latency: 0.5080819034089475 ms

Re: Query timed out after PT2M

2022-02-03 Thread manish khandelwal
It may be the case that you have lots of tombstones in this table, which makes reads slow and causes timeouts during bulk reads. On Fri, Feb 4, 2022, 03:23 Joe Obernberger wrote: > So it turns out that number after PT is increments of 60 seconds. I > changed the timeout to 960000, and now I get PT16M

Re: Query timed out after PT2M

2022-02-03 Thread Joe Obernberger
So it turns out the number after PT is in increments of 60 seconds.  I changed the timeout to 960000, and now I get PT16M (960000/60000 = 16).  Since I'm still getting timeouts, something else must be wrong. Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage
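The value after "PT" is an ISO-8601 duration as rendered by java.time.Duration (Scott's reply below confirms the string is Duration-parseable): "PT2M" is 2 minutes and "PT16M" is 16 minutes, i.e. the minute count is the millisecond timeout divided by 60000. A quick demonstration:

```java
import java.time.Duration;

public class DurationDemo {
    public static void main(String[] args) {
        // The connector default of 120000 ms renders as PT2M (2 minutes).
        System.out.println(Duration.ofMillis(120000));  // PT2M
        // A 960000 ms timeout renders as PT16M (16 minutes).
        System.out.println(Duration.ofMillis(960000));  // PT16M
        // Parsing goes the other way round.
        System.out.println(Duration.parse("PT2M").toMillis());  // 120000
    }
}
```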

Re: Query timed out after PT2M

2022-02-03 Thread Joe Obernberger
I did find this: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/reference.md and "spark.cassandra.read.timeoutMS" is set to 120000.  Running a test now, and I think that is it.  Thank you Scott. -Joe On 2/3/2022 3:19 PM, Joe Obernberger wrote: Thank you Scott! I am

Re: Query timed out after PT2M

2022-02-03 Thread Joe Obernberger
Thank you Scott! I am using the Spark Cassandra Connector.  Code:

SparkSession spark = SparkSession
    .builder()
    .appName("SparkCassandraApp")
    .config("spark.cassandra.connection.host", "chaos")
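Completing that truncated snippet, a self-contained sketch of what such a count job can look like with the connector's DataFrame API; the doc.doc keyspace/table is taken from the nodetool output above, and the read options and final .getOrCreate() are assumptions, not Joe's exact code:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkCassandraCount {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("SparkCassandraApp")
            .config("spark.cassandra.connection.host", "chaos")
            .getOrCreate();

        // Read the table through the connector's DataSource and count it.
        Dataset<Row> docs = spark.read()
            .format("org.apache.spark.sql.cassandra")
            .option("keyspace", "doc")
            .option("table", "doc")
            .load();

        System.out.println("rows: " + docs.count());
        spark.stop();
    }
}
```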

Re: Query timed out after PT2M

2022-02-03 Thread C. Scott Andreas
Hi Joe, it looks like "PT2M" may refer to a timeout value that could be set by your Spark job's initialization of the client. I don't see a string matching this in the Cassandra codebase itself, but I do see that it is parseable as a Duration:

```jshell
jshell> java.time.Duration.parse("PT2M")
$1 ==> PT2M
```