[ https://issues.apache.org/jira/browse/HADOOP-19557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17948510#comment-17948510 ]

ASF GitHub Bot commented on HADOOP-19557:
-----------------------------------------

steveloughran commented on code in PR #7662:
URL: https://github.com/apache/hadoop/pull/7662#discussion_r2068923434


##########
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3ABlockOutputStream.java:
##########
@@ -829,7 +829,8 @@ public boolean hasCapability(String capability) {
   @Override
   public void hflush() throws IOException {
     statistics.hflushInvoked();
-    handleSyncableInvocation();
+    // do not reject these, but downgrade to a no-op
+    LOG.debug("Hflush invoked");

Review Comment:
   Look at the fs spec: we say "don't use the API" and highlight the
inconsistent outcomes.
   
   The semantics of hflush() say "visible to all" but promise no persistence,
so this change isn't altering any durability semantics. Are we changing the
visibility semantics? We're certainly not meeting them.
   
   I remember having a long talk with others about hflush(), as in "what does
it do?" The answer is "nothing you can rely on".
   
   When exceptions are downgraded (the default), all that happens is that the
warning log message goes away, reducing confusion.
   
   When exceptions are rejected, the hflush() failure goes away. The call I
want to keep failing here is hsync(), and that still holds. AFAIK nobody runs
with rejection enabled except for some of our test setups.
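
For context, here is a rough, hypothetical sketch of the dispatch described
above: hflush() downgraded to a debug log as in this PR, and hsync() either
warned about or rejected depending on fs.s3a.downgrade.syncable.exceptions.
Only the config key and the names handleSyncableInvocation(), hflush() and
hsync() come from the diff and the issue text; the class name, field and
messages below are illustrative, not the actual S3ABlockOutputStream code.

    import java.io.IOException;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    /** Hypothetical stand-in for the Syncable handling discussed above. */
    public class SyncableDispatchSketch {

      private static final Logger LOG =
          LoggerFactory.getLogger(SyncableDispatchSketch.class);

      /** Assumed mirror of fs.s3a.downgrade.syncable.exceptions (default: true). */
      private final boolean downgradeSyncableExceptions;

      public SyncableDispatchSketch(boolean downgradeSyncableExceptions) {
        this.downgradeSyncableExceptions = downgradeSyncableExceptions;
      }

      /** hflush(): after this PR, always a debug-level no-op. */
      public void hflush() throws IOException {
        LOG.debug("Hflush invoked; downgraded to a no-op");
      }

      /** hsync(): the call whose durability semantics S3A cannot meet. */
      public void hsync() throws IOException {
        handleSyncableInvocation();
      }

      /** Either warn (downgrade, the default) or reject with an exception. */
      private void handleSyncableInvocation() {
        if (downgradeSyncableExceptions) {
          LOG.warn("Syncable API invoked on a stream writing to S3;"
              + " the semantics are not supported");
        } else {
          throw new UnsupportedOperationException(
              "Syncable.hsync() is not supported on S3A output streams");
        }
      }
    }

With the option left at its default of true, hsync() only warns; flipping it
to false surfaces the exception that the comment above wants kept for hsync().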
   
   --

> S3A: S3ABlockOutputStream to never log/reject hflush() calls
> -------------------------------------------------------------
>
>                 Key: HADOOP-19557
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19557
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.4.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Critical
>              Labels: pull-request-available
>
> Parquet's GH-3204 patch uses hflush() just before close().
> This is needless and hurts write performance on HDFS.
> For S3A it will trigger a warning log (Syncable is not supported) or an
> actual failure if fs.s3a.downgrade.syncable.exceptions is false (see the
> configuration sketch below).
> Proposed: hflush() to log at debug only; log/reject on hsync(), which is the
> real place where the semantics cannot be met.
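
As a minimal, hedged illustration of the option mentioned in the description,
the snippet below toggles the real fs.s3a.downgrade.syncable.exceptions key
programmatically. The class name and printed output are made up for the
example; the default of true (downgrade) is taken from the review comment
above.

    import org.apache.hadoop.conf.Configuration;

    public class SyncableOptionExample {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Downgrading is the default (true), per the review comment above.
        // Setting false makes unsupported Syncable calls fail; with this
        // change, that should only affect hsync(), since hflush() just logs
        // at debug level.
        conf.setBoolean("fs.s3a.downgrade.syncable.exceptions", false);
        System.out.println("fs.s3a.downgrade.syncable.exceptions = "
            + conf.getBoolean("fs.s3a.downgrade.syncable.exceptions", true));
      }
    }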



