ahmedabu98 commented on issue #32746:
URL: https://github.com/apache/beam/issues/32746#issuecomment-2407212040
Ahh, I'm seeing the error now:
```
java.lang.IllegalArgumentException: Expected all data writers to be closed, but found 1 data writer(s) still open
    at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:191)
    at org.apache.beam.sdk.io.iceberg.RecordWriterManager.close(RecordWriterManager.java:295)
    at org.apache.beam.sdk.io.iceberg.WriteGroupedRowsToFiles$WriteGroupedRowsToFilesDoFn.processElement(WriteGroupedRowsToFiles.java:109)
```
And I'm also seeing the same number of these errors:
<details>
<summary>Full stacktrace</summary>

```
java.io.UncheckedIOException: Failed to create Parquet file
    at org.apache.iceberg.parquet.ParquetWriter.ensureWriterInitialized(ParquetWriter.java:121)
    at org.apache.iceberg.parquet.ParquetWriter.flushRowGroup(ParquetWriter.java:210)
    at org.apache.iceberg.parquet.ParquetWriter.close(ParquetWriter.java:254)
    at org.apache.iceberg.io.DataWriter.close(DataWriter.java:82)
    at org.apache.beam.sdk.io.iceberg.RecordWriter.close(RecordWriter.java:112)
    at org.apache.beam.sdk.io.iceberg.RecordWriterManager$DestinationState.lambda$new$0(RecordWriterManager.java:114)
    at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.cache.LocalCache.processPendingNotifications(LocalCache.java:1850)
    at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.cache.LocalCache$Segment.runUnlockedCleanup(LocalCache.java:3503)
    at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.cache.LocalCache$Segment.postWriteCleanup(LocalCache.java:3479)
    at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.cache.LocalCache$Segment.remove(LocalCache.java:3108)
    at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.cache.LocalCache.remove(LocalCache.java:4305)
    at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.cache.LocalCache$LocalManualCache.invalidate(LocalCache.java:4950)
    at org.apache.beam.sdk.io.iceberg.RecordWriterManager$DestinationState.fetchWriterForPartition(RecordWriterManager.java:156)
    at org.apache.beam.sdk.io.iceberg.RecordWriterManager$DestinationState.write(RecordWriterManager.java:141)
    at org.apache.beam.sdk.io.iceberg.RecordWriterManager.write(RecordWriterManager.java:270)
    at org.apache.beam.sdk.io.iceberg.WriteGroupedRowsToFiles$WriteGroupedRowsToFilesDoFn.processElement(WriteGroupedRowsToFiles.java:107)
    at org.apache.beam.sdk.io.iceberg.WriteGroupedRowsToFiles$WriteGroupedRowsToFilesDoFn$DoFnInvoker.invokeProcessElement(Unknown Source)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:212)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:186)
    at org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:340)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:44)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:54)
    at org.apache.beam.runners.dataflow.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:285)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.outputWindowedValue(SimpleDoFnRunner.java:276)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.access$900(SimpleDoFnRunner.java:86)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$OnTimerArgumentProvider.outputWindowedValue(SimpleDoFnRunner.java:890)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$OnTimerArgumentProvider.outputWithTimestamp(SimpleDoFnRunner.java:878)
    at org.apache.beam.sdk.transforms.DoFnOutputReceivers$WindowedContextOutputReceiver.outputWithTimestamp(DoFnOutputReceivers.java:98)
    at org.apache.beam.sdk.transforms.GroupIntoBatches$GroupIntoBatchesDoFn.flushBatch(GroupIntoBatches.java:660)
    at org.apache.beam.sdk.transforms.GroupIntoBatches$GroupIntoBatchesDoFn.onBufferingTimer(GroupIntoBatches.java:601)
    at org.apache.beam.sdk.transforms.GroupIntoBatches$GroupIntoBatchesDoFn$OnTimerInvoker$tsendOfBuffering$dHMtZW5kT2ZCdWZmZXJpbmc.invokeOnTimer(Unknown Source)
    at org.apache.beam.sdk.transforms.reflect.ByteBuddyDoFnInvokerFactory$DoFnInvokerBase.invokeOnTimer(ByteBuddyDoFnInvokerFactory.java:242)
    at org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.onTimer(SimpleDoFnRunner.java:206)
    at org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processUserTimer(SimpleParDoFn.java:366)
    at org.apache.beam.runners.dataflow.worker.SimpleParDoFn.access$600(SimpleParDoFn.java:79)
    at org.apache.beam.runners.dataflow.worker.SimpleParDoFn$TimerType$1.processTimer(SimpleParDoFn.java:454)
    at org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:483)
    at org.apache.beam.runners.dataflow.worker.SimpleParDoFn.processTimers(SimpleParDoFn.java:358)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.ParDoOperation.finish(ParDoOperation.java:52)
    at org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:94)
    at org.apache.beam.runners.dataflow.worker.streaming.ComputationWorkExecutor.executeWork(ComputationWorkExecutor.java:78)
    at org.apache.beam.runners.dataflow.worker.windmill.work.processing.StreamingWorkScheduler.executeWork(StreamingWorkScheduler.java:382)
    at org.apache.beam.runners.dataflow.worker.windmill.work.processing.StreamingWorkScheduler.processWork(StreamingWorkScheduler.java:255)
    at org.apache.beam.runners.dataflow.worker.windmill.work.processing.StreamingWorkScheduler.lambda$scheduleWork$2(StreamingWorkScheduler.java:214)
    at org.apache.beam.runners.dataflow.worker.streaming.ExecutableWork.run(ExecutableWork.java:38)
    at org.apache.beam.runners.dataflow.worker.util.BoundedQueueExecutor.lambda$executeMonitorHeld$0(BoundedQueueExecutor.java:234)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.IOException: Failed to get result: java.io.IOException: Failed to get result: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFromFuture(GoogleCloudStorageFileSystem.java:911)
    at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.create(GoogleCloudStorageFileSystem.java:293)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.createChannel(GoogleHadoopOutputStream.java:80)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.<init>(GoogleHadoopOutputStream.java:71)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.lambda$create$3(GoogleHadoopFileSystemBase.java:650)
    at com.google.cloud.hadoop.fs.gcs.GhfsStorageStatistics.trackDuration(GhfsStorageStatistics.java:77)
    at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.create(GoogleHadoopFileSystemBase.java:624)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1073)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1054)
    at org.apache.parquet.hadoop.util.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:81)
    at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:345)
    at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:323)
    at org.apache.iceberg.parquet.ParquetWriter.ensureWriterInitialized(ParquetWriter.java:111)
    ... 48 more
Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Failed to get result: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFromFuture(GoogleCloudStorageFileSystem.java:905)
    ... 60 more
Caused by: java.io.IOException: Failed to get result: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFromFuture(GoogleCloudStorageFileSystem.java:911)
    at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFileInfoInternal(GoogleCloudStorageFileSystem.java:1115)
    at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.lambda$create$1(GoogleCloudStorageFileSystem.java:287)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    ... 3 more
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.getFromFuture(GoogleCloudStorageFileSystem.java:905)
    ... 6 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
```
</details>
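
Putting the two traces together: the worker hits an OOM while creating a Parquet file, the writer's `close()` then throws `UncheckedIOException` from inside the cache's removal listener (Guava logs and swallows removal-listener exceptions, which would be why this surfaces as a separate error), and the open-writer bookkeeping presumably never gets decremented, so `RecordWriterManager.close` later trips the `checkArgument` in the first trace. A rough sketch of that shape, with hypothetical names and simplified logic rather than the actual `RecordWriterManager` code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

import com.google.common.base.Preconditions;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.RemovalNotification;

// Hypothetical sketch of the failure mode the two traces suggest; not the
// actual Beam source. Writers are closed from the cache's removal listener;
// if close() throws (here: Parquet file creation failed after the OOM),
// the open-writer counter is never decremented.
class WriterCacheSketch {
  interface Writer { void close() throws IOException; }

  private final AtomicInteger openWriters = new AtomicInteger();

  private final Cache<String, Writer> writers =
      CacheBuilder.newBuilder()
          .removalListener(
              (RemovalNotification<String, Writer> n) -> {
                try {
                  n.getValue().close();          // throws after the OOM...
                  openWriters.decrementAndGet(); // ...so this never runs
                } catch (IOException e) {
                  // Guava logs and swallows this, producing the second trace.
                  throw new UncheckedIOException(e);
                }
              })
          .build();

  void close() {
    writers.invalidateAll(); // close whatever is still cached
    // With one writer left "open" by the failed close above, this is the
    // "Expected all data writers to be closed" failure in the first trace.
    Preconditions.checkArgument(
        openWriters.get() == 0,
        "Expected all data writers to be closed, but found %s data writer(s) still open",
        openWriters.get());
  }
}
```

If that reading is right, each swallowed removal-listener failure leaves exactly one writer "open", which would explain why the two errors show up in equal numbers.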