[ https://issues.apache.org/jira/browse/HADOOP-17833?focusedWorklogId=779621&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779621 ]
ASF GitHub Bot logged work on HADOOP-17833:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 08/Jun/22 18:50
Start Date: 08/Jun/22 18:50
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on PR #3289:
URL: https://github.com/apache/hadoop/pull/3289#issuecomment-1150274341
:broken_heart: **-1 overall**
| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 0s | | Docker mode activated. |
| -1 :x: | patch | 0m 24s | | https://github.com/apache/hadoop/pull/3289 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help. |
| Subsystem | Report/Notes |
|----------:|:-------------|
| GITHUB PR | https://github.com/apache/hadoop/pull/3289 |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3289/30/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
This message was automatically generated.
Issue Time Tracking
-------------------
Worklog Id: (was: 779621)
Time Spent: 11h (was: 10h 50m)
> Improve Magic Committer Performance
> -----------------------------------
>
> Key: HADOOP-17833
> URL: https://issues.apache.org/jira/browse/HADOOP-17833
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Affects Versions: 3.3.1
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 11h
> Remaining Estimate: 0h
>
> Magic committer tasks can be slow because every file created with
> overwrite=false triggers a HEAD (to verify there is no file) and a LIST (to
> verify there is no directory). And because manifestation of the output is
> delayed, these checks may not behave as expected anyway.
> ParquetOutputFormat is one example of a library which does this.
> We could fix Parquet to use overwrite=true, but (a) there may be surprises in
> other uses, (b) it would still leave the LIST, and (c) it would do nothing for
> other formats.
> Proposed: createFile() under a magic path skips all probes for a file or
> directory at the end of the path.
> Only a single task attempt will be writing to that directory, and it should
> know what it is doing. If there are conflicting file names and parts across
> tasks, that won't even get picked up at this point. Oh, and none of the
> committers ever check for this: you'll get the last file manifested (s3a) or
> renamed (file).
> If we skip the checks we will save 2 HTTP requests per file.
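
The request arithmetic in the description above can be sketched as a small model. This is not the S3A code: the class name, method names, and the `skipMagicProbes` switch are hypothetical, and the `__magic` path element is assumed to be the marker that identifies a magic path. It only counts the probes the description mentions (one HEAD for the file check when overwrite=false, one LIST for the directory check) and shows what skipping them under a magic path would save.

```java
// Illustrative model only; it does not call S3 or any Hadoop API.
final class MagicCreateSketch {

    // Assumption: a path is "magic" when one of its elements is "__magic".
    static final String MAGIC = "__magic";

    private MagicCreateSketch() {
    }

    /** Returns true if any element of the path equals the magic marker. */
    static boolean isUnderMagicPath(String path) {
        for (String element : path.split("/")) {
            if (element.equals(MAGIC)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Counts the probe requests issued before a file is created.
     *
     * @param path            destination path of the new file
     * @param overwrite       the overwrite flag passed by the caller
     * @param skipMagicProbes hypothetical switch modelling the proposal
     */
    static int probesBeforeCreate(String path, boolean overwrite,
                                  boolean skipMagicProbes) {
        if (skipMagicProbes && isUnderMagicPath(path)) {
            // Proposed behaviour: no HEAD and no LIST under a magic path.
            return 0;
        }
        int probes = 1;        // LIST: check there is no directory at the path
        if (!overwrite) {
            probes++;          // HEAD: check there is no file at the path
        }
        return probes;
    }
}
```

In this model, `probesBeforeCreate("bucket/out/__magic/job/task/part-0000.parquet", false, false)` counts both probes, while the same call with `skipMagicProbes` set to true counts none, which is the saving of 2 HTTP requests per file stated in the description. Paths outside a magic directory are unaffected by the switch.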
--
This message was sent by Atlassian Jira
(v8.20.7#820007)