[
https://issues.apache.org/jira/browse/HADOOP-17847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405150#comment-17405150
]
Steve Loughran commented on HADOOP-17847:
-----------------------------------------
bq. I think flink 1.8 is using hadoop 2.4.1.
something newer than that surely.
bq. we're streaming avro from Kafka into parquet files on S3. I've verified
that we have a small proportion of avro messages on Kafka that have no
corresponding parquet rows on S3. This might imply it's more than just an
instrumentation thing.
ok, that's more serious.
# can you see this if you update the jars to 3.3.0/3.3.1?
# turn logging on in S3ABlockOutputStream to debug and see what it says.
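For the second step, a minimal sketch of the logging change. It assumes the process reads a log4j 1.x properties file (the format Hadoop 3.2.1 and Spark 3.0.2 bundle) and that you add the line to whichever log4j.properties your deployment actually loads. The logger name is the fully qualified class name.
{code}
# assumption: log4j 1.x properties syntax, added to the log4j.properties already in use
log4j.logger.org.apache.hadoop.fs.s3a.S3ABlockOutputStream=DEBUG
{code}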
looking at the original patch, I see this is happening *during shutdown*. This
may be a sign that the stream hasn't finished uploading before the FS is shut
down (and its HTTP connections closed/pool destroyed).
> S3AInstrumentation Closing output stream statistics while data is still
> marked as pending upload in OutputStreamStatistics
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-17847
> URL: https://issues.apache.org/jira/browse/HADOOP-17847
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.2.1
> Environment: hadoop: 3.2.1
> spark: 3.0.2
> k8s server version: 1.18
> aws.java.sdk.bundle.version:1.11.1033
> Reporter: Li Rong
> Priority: Minor
> Attachments: logs.txt
>
>
> When using hadoop s3a file upload for spark event Logs, the logs were queued
> up and not uploaded before the process is shut down:
> {code:java}
> // 21/08/13 12:22:39 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client
> has been closed (this is expected if the application is shutting down.)
> 21/08/13 12:22:39 WARN S3AInstrumentation: Closing output stream statistics
> while data is still marked as pending upload in
> OutputStreamStatistics{blocksSubmitted=1, blocksInQueue=1, blocksActive=0,
> blockUploadsCompleted=0, blockUploadsFailed=0, bytesPendingUpload=106716,
> bytesUploaded=0, blocksAllocated=1, blocksReleased=1,
> blocksActivelyAllocated=0, exceptionsInMultipartFinalize=0,
> transferDuration=0 ms, queueDuration=0 ms, averageQueueTime=0 ms,
> totalUploadDuration=0 ms, effectiveBandwidth=0.0 bytes/s}{code}
> For details, see the attached logs.