[ https://issues.apache.org/jira/browse/SPARK-24476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
bharath kumar avusherla updated SPARK-24476:
--------------------------------------------
Description:

We are working on a Spark streaming application that uses Spark Structured Streaming with checkpointing in S3. When we start the application, it runs fine for some time and then crashes with the error below. How long it runs successfully varies: sometimes it runs for 2 days without any issues before crashing, sometimes it crashes after 4 or 24 hours.

Our streaming application joins (left and inner) multiple sources: Kafka topics, S3, and an Aurora database.

Can you please let us know how to solve this problem? Is it possible to increase the timeout period?

A few lines of the exception log are pasted below; the complete exception is attached to the issue.

Exception:
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:150)
    at java.net.SocketInputStream.read(SocketInputStream.java:121)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.read(InputRecord.java:503)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:954)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1343)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1371)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1355)
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:553)
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:412)

was:
We are working on a Spark streaming application using Spark Structured Streaming with checkpointing in S3. When we start the application, it runs fine for some time and then crashes with the error below.
How long it runs successfully varies: sometimes it runs for 2 days without any issues before crashing, sometimes it crashes after 4 or 24 hours. Our streaming application joins (left and inner) multiple sources: Kafka topics, S3, and an Aurora database. Can you please let us know how to solve this problem? Is it possible to increase the timeout period? The complete exception log is pasted below.

Exception:
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:150)
    at java.net.SocketInputStream.read(SocketInputStream.java:121)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.read(InputRecord.java:503)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:954)
    at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1343)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1371)
    at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1355)
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:553)
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:412)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:179)
    at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:144)
    at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:134)
    at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:612)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:447)
    at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:884)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:334)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:281)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestHead(RestStorageService.java:942)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2148)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectDetailsImpl(RestStorageService.java:2075)
    at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:1093)
    at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:548)
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:174)
    at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
    at org.apache.hadoop.fs.s3native.$Proxy18.retrieveMetadata(Unknown Source)
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:493)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1437)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog$FileSystemManager.exists(HDFSMetadataLog.scala:446)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.get(HDFSMetadataLog.scala:195)
    at org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:110)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply$mcV$sp(MicroBatchExecution.scala:339)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:338)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:338)
    at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
    at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch(MicroBatchExecution.scala:338)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:128)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:121)
    at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
    at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:121)
    at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
    at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:117)
    at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)

> java.net.SocketTimeoutException: Read timed out Exception while running the
> Spark Structured Streaming in 2.3.0
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-24476
>                 URL: https://issues.apache.org/jira/browse/SPARK-24476
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0
>            Reporter: bharath kumar avusherla
>            Priority: Major
>         Attachments: socket-timeout-exception
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
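On the question of increasing the timeout: the stack trace shows the checkpoint metadata check going through NativeS3FileSystem, i.e. the older JetS3t-backed s3n:// connector, whose HTTP client timeouts are typically tuned via a jets3t.properties file on the classpath (keys such as httpclient.socket-timeout-ms). An often simpler alternative is to checkpoint through the S3A connector, whose timeouts are plain Hadoop configuration keys that Spark forwards via the spark.hadoop.* prefix. Below is a minimal sketch, assuming Spark with the hadoop-aws module available; the specific values are illustrative, not recommendations:

```python
# Hedged sketch: raising S3A client timeouts for a Structured Streaming job.
# Assumes the hadoop-aws module (S3A connector) is on the classpath; the
# fs.s3a.* keys are standard Hadoop S3A options, not Spark-specific settings.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("streaming-with-s3-checkpoint")
    # Socket read timeout for S3 requests, in milliseconds.
    .config("spark.hadoop.fs.s3a.connection.timeout", "200000")
    # TCP connection-establishment timeout, in milliseconds.
    .config("spark.hadoop.fs.s3a.connection.establish.timeout", "50000")
    # How many times the S3A client retries a failed request before giving up.
    .config("spark.hadoop.fs.s3a.attempts.maximum", "20")
    .getOrCreate()
)

# Point the checkpoint at an s3a:// path so the S3A connector (and the
# settings above) are used instead of the s3n:// JetS3t path in the trace:
#
# query = (df.writeStream
#            .option("checkpointLocation", "s3a://my-bucket/checkpoints/app")
#            .start())
```

Note that even with larger timeouts, S3's eventual consistency at the time made it a fragile target for streaming checkpoint metadata; checkpointing to HDFS (or another consistent store) is the commonly suggested workaround for this class of failure.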