[ https://issues.apache.org/jira/browse/HADOOP-17954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sudarshan updated HADOOP-17954:
-------------------------------
Description:
I am trying to run a Spark (1.6.0) job that reads rows from HBase, applies some
transformations, and finally writes the results to S3.
Occasionally the job fails with a timeout error.
The tasks are able to write their output to S3, but the job fails at the last
stage, when the output stream is closed and the file is uploaded.
Here are the error details:
{code:java}
Job aborted due to stage failure: Task 1074 in stage 1.0 failed 4 times, most recent failure: Lost task 1074.3 in stage 1.0 (TID 2162, abcd.ecom.bigdata.int.abcd.com, executor 18): org.apache.spark.SparkException: Task failed while writing rows
    at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:417)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:148)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:148)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.fs.s3a.AWSS3IOException: saving output on common/hbaseHistory/metadataSept100621/_temporary/_attempt_202110060911_0001_m_001074_3/year=2021/month=09/submitDate=2021-09-08T04%3a58%3a47Z/part-r-01074-205c8b21-7840-4985-bb0e-65ed787c337d.snappy.parquet: com.cloudera.com.amazonaws.services.s3.model.AmazonS3Exception: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: 5J85XRNF10W16ZJS), S3 Extended Request ID: 4g08KHEDbFs5jueJqt9Snw7Xlmw5VeS1eCtJyAzp0fzHGinMhBntwIEhddJP7LLaS0teR3EAuOI=: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: 5J85XRNF10W16ZJS)
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:143)
    at org.apache.hadoop.fs.s3a.S3AOutputStream.close(S3AOutputStream.java:123)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
    at parquet.hadoop.ParquetFileWriter.end(ParquetFileWriter.java:470)
    at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
    at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:112)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetRelation.scala:101)
    at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer$$anonfun$writeRows$4.apply$mcV$sp(WriterContainer.scala:387)
    at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer$$anonfun$writeRows$4.apply(WriterContainer.scala:343)
    at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer$$anonfun$writeRows$4.apply(WriterContainer.scala:343)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1278)
    at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:409)
    ... 8 more
    Suppressed: java.lang.NullPointerException
        at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:152)
        at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:111)
        at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:112)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetRelation.scala:101)
        at org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer$$anonfun$writeRows$5.apply$mcV$sp(WriterContainer.scala:411)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1287)
        ... 9 more
Caused by: com.cloudera.com.amazonaws.services.s3.model.AmazonS3Exception: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: 5J85XRNF10W16ZJS), S3 Extended Request ID: 4g08KHEDbFs5jueJqt9Snw7Xlmw5VeS1eCtJyAzp0fzHGinMhBntwIEhddJP7LLaS0teR3EAuOI=
    at com.cloudera.com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1182)
    at com.cloudera.com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:770)
    at com.cloudera.com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
    at com.cloudera.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
    at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785)
    at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1472)
    at com.cloudera.com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInOneChunk(UploadCallable.java:131)
    at com.cloudera.com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:123)
    at com.cloudera.com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:139)
    at com.cloudera.com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:47)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Driver stacktrace:
{code}
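Note that the last "Caused by" shows the whole Parquet part file being sent as a single putObject() from UploadCallable.uploadInOneChunk() when the output stream is closed, so large part files keep the connection open long enough to hit S3's idle-connection limit. As a possible workaround, here is a minimal sketch of the kind of S3A client tuning sometimes used for this RequestTimeout; it is not verified on this cluster, the values are illustrative assumptions, and fs.s3a.fast.upload requires a Hadoop release that ships it (2.7+):
{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical helper, not part of the original report.
public class S3ATimeoutTuning {
  public static Configuration tunedConf() {
    Configuration conf = new Configuration();

    // Raise the AWS client socket timeout (milliseconds) so a slow,
    // single-shot upload is less likely to be treated as idle.
    conf.set("fs.s3a.connection.timeout", "200000");

    // Allow a few more retries of transient S3 errors.
    conf.set("fs.s3a.attempts.maximum", "10");

    // Where available (Hadoop 2.7+), buffer and upload incrementally in
    // multipart blocks instead of one putObject() at close(), which is
    // the call that fails in the stack trace above.
    conf.setBoolean("fs.s3a.fast.upload", true);
    conf.setLong("fs.s3a.multipart.size", 64L * 1024 * 1024);        // 64 MB parts
    conf.setLong("fs.s3a.multipart.threshold", 128L * 1024 * 1024);  // multipart above 128 MB
    return conf;
  }
}
{code}
On a Spark 1.6 cluster these settings would typically be supplied as --conf spark.hadoop.fs.s3a.* options or in core-site.xml rather than in job code.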
> org.apache.spark.SparkException: Task failed while writing rows S3
> ------------------------------------------------------------------
>
> Key: HADOOP-17954
> URL: https://issues.apache.org/jira/browse/HADOOP-17954
> Project: Hadoop Common
> Issue Type: Bug
> Components: hadoop-thirdparty
> Affects Versions: 2.6.0
> Reporter: sudarshan
> Priority: Major