[
https://issues.apache.org/jira/browse/HUDI-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sivabalan narayanan updated HUDI-3724:
--------------------------------------
Description:
We run integ tests against hudi and recently our spark long running tests are
failing for COW table with "too many open files". May be we have some leaks and
need to chase them and close it out.
{code:java}
... 6 more
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 6808.0 failed 1 times, most recent failure: Lost task 0.0 in
stage 6808.0 (TID 109960) (ip-10-0-40-161.us-west-1.compute.internal executor
driver): java.io.FileNotFoundException:
/tmp/blockmgr-96dd9c25-86c7-4d00-a20a-d6515eef9a37/39/temp_shuffle_9149fce7-e9b0-4fee-bb21-1eba16dd89a3
(Too many open files)
at java.io.FileOutputStream.open0(Native Method)
at java.io.FileOutputStream.open(FileOutputStream.java:270)
at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
at
org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:133)
at
org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:152)
at
org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:279)
at
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:171)
at
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
{code}
was:
We run integ tests against hudi and recently our spark long running tests are
failing for COW table with "too many open files". May be we have some leaks and
need to chase them and close it out.
> Too many open files w/ COW spark long running tests
> ---------------------------------------------------
>
> Key: HUDI-3724
> URL: https://issues.apache.org/jira/browse/HUDI-3724
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Fix For: 0.11.0
>
>
> We run integ tests against hudi and recently our spark long running tests are
> failing for COW table with "too many open files". May be we have some leaks
> and need to chase them and close it out.
> {code:java}
> ... 6 more
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 0 in stage 6808.0 failed 1 times, most recent failure: Lost task 0.0 in
> stage 6808.0 (TID 109960) (ip-10-0-40-161.us-west-1.compute.internal executor
> driver): java.io.FileNotFoundException:
> /tmp/blockmgr-96dd9c25-86c7-4d00-a20a-d6515eef9a37/39/temp_shuffle_9149fce7-e9b0-4fee-bb21-1eba16dd89a3
> (Too many open files)
> at java.io.FileOutputStream.open0(Native Method)
> at java.io.FileOutputStream.open(FileOutputStream.java:270)
> at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> at
> org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:133)
> at
> org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:152)
> at
> org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:279)
> at
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:171)
> at
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
> at org.apache.spark.scheduler.Task.run(Task.scala:131)
> at
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)