[ 
https://issues.apache.org/jira/browse/HADOOP-17847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405150#comment-17405150
 ] 

Steve Loughran commented on HADOOP-17847:
-----------------------------------------

bq. I think flink 1.8 is using hadoop 2.4.1. 

Something newer than that, surely.

bq. we're streaming avro from Kafka into parquet files on S3. I've verified 
that we have a small proportion of avro messages on Kafka that have no 
corresponding parquet rows on S3.  This might imply it's more than just an 
instrumentation thing.


OK, that's more serious.

# Can you still see this if you update the Hadoop jars to 3.3.0/3.3.1?
# Set the log level of S3ABlockOutputStream to DEBUG and see what it says.
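
Debug logging for that class could be enabled with something like the line 
below. This is a minimal sketch, assuming the deployment is configured through 
a log4j 1.x log4j.properties file; a different logging backend will need the 
equivalent setting in its own syntax.

{code}
# Assumption: log4j 1.x properties format
log4j.logger.org.apache.hadoop.fs.s3a.S3ABlockOutputStream=DEBUG
{code}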

Looking at the original patch, I see this is happening *during shutdown*. This 
may be a sign that the stream hasn't finished uploading before the FS is shut 
down (and its HTTP connections closed and connection pool destroyed).



> S3AInstrumentation Closing output stream statistics while data is still 
> marked as pending upload in OutputStreamStatistics
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-17847
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17847
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.2.1
>         Environment: hadoop: 3.2.1
> spark: 3.0.2
> k8s server version: 1.18
> aws.java.sdk.bundle.version:1.11.1033
>            Reporter: Li Rong
>            Priority: Minor
>         Attachments: logs.txt
>
>
> When using hadoop s3a file upload for spark event logs, the logs were queued 
> up and not uploaded before the process was shut down:
> {code:java}
> // 21/08/13 12:22:39 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client 
> has been closed (this is expected if the application is shutting down.)
> 21/08/13 12:22:39 WARN S3AInstrumentation: Closing output stream statistics 
> while data is still marked as pending upload in 
> OutputStreamStatistics{blocksSubmitted=1, blocksInQueue=1, blocksActive=0, 
> blockUploadsCompleted=0, blockUploadsFailed=0, bytesPendingUpload=106716, 
> bytesUploaded=0, blocksAllocated=1, blocksReleased=1, 
> blocksActivelyAllocated=0, exceptionsInMultipartFinalize=0, 
> transferDuration=0 ms, queueDuration=0 ms, averageQueueTime=0 ms, 
> totalUploadDuration=0 ms, effectiveBandwidth=0.0 bytes/s}{code}
> For details, see the attached logs.



