krisdas opened a new issue #4168:
URL: https://github.com/apache/iceberg/issues/4168
Flink version: 1.13.2
Iceberg version: 0.12
We were trying S3FileIO instead of the default Hadoop FileIO to ingest data
into Iceberg on S3.
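For context, this is roughly how we understand S3FileIO gets switched on, i.e. via the catalog's `io-impl` property; the warehouse location below is just a placeholder, not our exact setup:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.iceberg.CatalogProperties;

public class S3FileIOCatalogProps {
  // Placeholder catalog properties: point "io-impl" at S3FileIO instead of the
  // default HadoopFileIO. The warehouse path is made up for illustration only.
  public static Map<String, String> build() {
    Map<String, String> props = new HashMap<>();
    props.put(CatalogProperties.FILE_IO_IMPL, "org.apache.iceberg.aws.s3.S3FileIO");
    props.put(CatalogProperties.WAREHOUSE_LOCATION, "s3://bucket/warehouse");
    return props;
  }
}
```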
While committing/saving a checkpoint, we see the call stack below while uploading a data
file. So it looks like the upload of the data file (for example:
s3/bucket/.../iceberg_db.db/iceberg_table/data/file.parquet) fails.
```
exception: {
  exception_class: java.util.concurrent.CompletionException
  exception_message: java.io.UncheckedIOException: java.nio.file.NoSuchFileException: /tmp/s3fileio-224113294985193160.tmp
  stacktrace: java.util.concurrent.CompletionException: java.io.UncheckedIOException: java.nio.file.NoSuchFileException: /tmp/s3fileio-224113294985193160.tmp
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314)
    at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1702)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
  Caused by: java.io.UncheckedIOException: java.nio.file.NoSuchFileException: /tmp/s3fileio-224113294985193160.tmp
    at software.amazon.awssdk.utils.FunctionalUtils.asRuntimeException(FunctionalUtils.java:180)
    at software.amazon.awssdk.utils.FunctionalUtils.lambda$safeSupplier$4(FunctionalUtils.java:110)
    at software.amazon.awssdk.utils.FunctionalUtils.invokeSafely(FunctionalUtils.java:136)
    at software.amazon.awssdk.core.sync.RequestBody.fromFile(RequestBody.java:88)
    at software.amazon.awssdk.core.sync.RequestBody.fromFile(RequestBody.java:99)
    at org.apache.iceberg.aws.s3.S3OutputStream.lambda$uploadParts$1(S3OutputStream.java:237)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
    ... 3 more
  Caused by: java.nio.file.NoSuchFileException: /tmp/s3fileio-224113294985193160.tmp
    at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
    at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
    at java.base/sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
    at java.base/sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:149)
    at java.base/sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
    at java.base/java.nio.file.Files.readAttributes(Files.java:1764)
    at java.base/java.nio.file.Files.size(Files.java:2381)
    at software.amazon.awssdk.core.sync.RequestBody.lambda$fromFile$0(RequestBody.java:88)
    at software.amazon.awssdk.utils.FunctionalUtils.lambda$safeSupplier$4(FunctionalUtils.java:108)
    ... 8 more
```
But in the Iceberg manifest file, that data file
(s3/bucket/.../iceberg_db.db/iceberg_table/data/file.parquet) is still
referenced, so Iceberg thinks the data file is valid and exists.
Now running another query that needs to read that data file throws an
exception.
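The kind of check we have in mind to confirm this is a minimal sketch like the following, assuming `table` is an `org.apache.iceberg.Table` already loaded from a catalog (the helper class is our own, not something Iceberg ships):

```java
import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.CloseableIterable;

public class MissingDataFileCheck {
  // Walks the data files referenced by the current snapshot and reports any
  // whose underlying S3 object no longer exists. The table is assumed to be
  // loaded elsewhere (e.g. from a Hive or Hadoop catalog).
  public static void report(Table table) throws java.io.IOException {
    try (CloseableIterable<FileScanTask> tasks = table.newScan().planFiles()) {
      for (FileScanTask task : tasks) {
        String path = task.file().path().toString();
        if (!table.io().newInputFile(path).exists()) {
          System.out.println("Referenced in manifest but missing: " + path);
        }
      }
    }
  }
}
```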
Has anybody seen this scenario? We are trying to reproduce it for
further investigation.
Code link:
https://github.com/apache/iceberg/blob/master/aws/src/main/java/org/apache/iceberg/aws/s3/S3OutputStream.java#L291
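From the stack trace, the failure is in the multipart upload path: the part data is staged in a local temp file (/tmp/s3fileio-*.tmp) and RequestBody.fromFile is called on it from inside an async task. Below is a simplified sketch of that pattern, not the actual S3OutputStream code (the class and method names are made up), just to show where a vanished temp file would blow up:

```java
import java.io.File;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.UploadPartRequest;

public class PartUploadSketch {
  // Simplified version of the pattern seen in the stack trace: the part data
  // sits in a local staging file, and RequestBody.fromFile() (which reads the
  // file size via Files.size()) is only invoked once the async task runs.
  static CompletableFuture<Void> uploadPart(
      S3Client s3, ExecutorService executor, File stagingFile,
      String bucket, String key, String uploadId, int partNumber) {
    return CompletableFuture.runAsync(() -> {
      UploadPartRequest request = UploadPartRequest.builder()
          .bucket(bucket)
          .key(key)
          .uploadId(uploadId)
          .partNumber(partNumber)
          .build();
      // Fails with NoSuchFileException (wrapped in UncheckedIOException /
      // CompletionException) if stagingFile no longer exists on local disk.
      s3.uploadPart(request, RequestBody.fromFile(stagingFile));
    }, executor);
  }
}
```

If the staging file is removed before the async task runs (for example by /tmp cleanup or an early close/abort), the part is never uploaded, which would be consistent with the object missing from S3 even though the manifest still references the data file path.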