[ https://issues.apache.org/jira/browse/HADOOP-19863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080781#comment-18080781 ]
ASF GitHub Bot commented on HADOOP-19863:
-----------------------------------------
hadoop-yetus commented on PR #8496:
URL: https://github.com/apache/hadoop/pull/8496#issuecomment-4445570417
:confetti_ball: **+1 overall**
| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 1m 1s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
|||| _ branch-3.5 Compile Tests _ |
| +1 :green_heart: | mvninstall | 42m 58s | | branch-3.5 passed |
| +1 :green_heart: | compile | 15m 55s | | branch-3.5 passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 :green_heart: | compile | 16m 14s | | branch-3.5 passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 :green_heart: | checkstyle | 1m 30s | | branch-3.5 passed |
| +1 :green_heart: | mvnsite | 1m 56s | | branch-3.5 passed |
| +1 :green_heart: | javadoc | 1m 30s | | branch-3.5 passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 :green_heart: | javadoc | 1m 26s | | branch-3.5 passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 :green_heart: | spotbugs | 3m 10s | | branch-3.5 passed |
| +1 :green_heart: | shadedclient | 30m 49s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 11s | | the patch passed |
| +1 :green_heart: | compile | 15m 19s | | the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 :green_heart: | javac | 15m 19s | | the patch passed |
| +1 :green_heart: | compile | 16m 25s | | the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 :green_heart: | javac | 16m 25s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 26s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 55s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 28s | | the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 :green_heart: | javadoc | 1m 28s | | the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 :green_heart: | spotbugs | 3m 22s | | the patch passed |
| +1 :green_heart: | shadedclient | 30m 51s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 22m 58s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 1m 16s | | The patch does not generate ASF License warnings. |
| | | 215m 24s | | |
| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8496/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/8496 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 775a870acedd 5.15.0-173-generic #183-Ubuntu SMP Fri Mar 6 13:29:34 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.5 / aa7c7f73a70306d1a52a1cce4142921992dc0758 |
| Default Java | Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| Multi-JDK versions | /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8496/1/testReport/ |
| Max. process+thread count | 1285 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8496/1/console |
| versions | git=2.43.0 maven=3.9.15 spotbugs=4.9.7 |
| Powered by | Apache Yetus 0.14.1 https://yetus.apache.org |
This message was automatically generated.
> Incorrect Vectored IO metrics from Local Filesystem
> ---------------------------------------------------
>
> Key: HADOOP-19863
> URL: https://issues.apache.org/jira/browse/HADOOP-19863
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 3.5.0
> Reporter: Peter Toth
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.6.0
>
> Attachments: Screenshot 2026-04-16 at 19.02.30.png, Screenshot 2026-04-16 at 19.03.51.png
>
>
> As discussed in
> [https://github.com/apache/parquet-java/issues/2703#issuecomment-4260121705],
> we noticed that when vectored IO is enabled, the {{BytesRead}} metrics of
> Spark tasks are not correct.
> Spark fetches that metric via {{FileSystem.getAllStatistics}}; see
> - [https://github.com/apache/spark/blob/5d491f62748b4b9c34bc3b5bd7390f7b5ca75053/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala#L98-L109]
> and
> - [https://github.com/apache/spark/blob/5d491f62748b4b9c34bc3b5bd7390f7b5ca75053/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L164-L170]
> Repro with latest Spark 4.2.0-SNAPSHOT using Hadoop 3.5.0:
> Vectored IO is enabled by default:
> {code:java}
> ➜ bin/spark-shell
> scala> spark.createDataFrame((0 until 5000).map(i => (i, s"left_$i"))).repartition(1).write.parquet("/tmp/t2")
> scala> spark.read.parquet("/tmp/t2").createOrReplaceTempView("t2")
> scala> sql("SELECT * FROM t2").collect()
> {code}
> !Screenshot 2026-04-16 at 19.02.30.png|width=85%!
> Vectored IO is disabled explicitly:
> {code:java}
> ➜ bin/spark-shell --conf spark.hadoop.parquet.hadoop.vectored.io.enabled=false
> scala> spark.read.parquet("/tmp/t2").createOrReplaceTempView("t2")
> scala> sql("SELECT * FROM t2").collect()
> {code}
> !Screenshot 2026-04-16 at 19.03.51.png|width=85%!
> In my case the generated test file size was ~45KB:
> {code:java}
> ➜ ls -ll /tmp/t2
> total 88
> -rw-r--r--@ 1 ptoth wheel 0 Apr 16 18:57 _SUCCESS
> -rw-r--r--@ 1 ptoth wheel 44944 Apr 16 18:57 part-00000-cf825cf6-2fa5-46a2-b897-dbb9dc9828a7-c000.snappy.parquet
> {code}
> I believe reading the Parquet footers doesn't go through vectored IO, so the
> decreased 1680B probably belongs to that.
> There is no data pruning in the query, so the metric value should be around
> the file size.
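
The hypothesis in the description (footer reads are counted, vectored range reads are not) can be illustrated with a toy model. This is only a sketch under stated assumptions, not Hadoop's implementation: `Stats`, `ToyStream`, `readVectored` and the `countVectoredReads` flag are hypothetical names, the 1680-byte footer size is an assumed reading of the figure quoted above, and the data is assumed to be fetched as two equal ranges covering the rest of the file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Toy model only: none of these types are Hadoop classes. */
public class VectoredMetricsSketch {

    /** Hypothetical stand-in for a per-filesystem statistics object. */
    static final class Stats {
        private final AtomicLong bytesRead = new AtomicLong();
        void incrementBytesRead(long n) { bytesRead.addAndGet(n); }
        long getBytesRead() { return bytesRead.get(); }
    }

    /** Hypothetical stream with a classic read path and a vectored read path. */
    static final class ToyStream {
        private final Stats stats;
        private final boolean countVectoredReads;

        ToyStream(Stats stats, boolean countVectoredReads) {
            this.stats = stats;
            this.countVectoredReads = countVectoredReads;
        }

        /** Classic positioned read: always updates the counter. */
        void read(long offset, long length) {
            stats.incrementBytesRead(length);
        }

        /** Vectored read over {offset, length} pairs: counted only if the flag is set. */
        void readVectored(List<long[]> ranges) {
            for (long[] range : ranges) {
                if (countVectoredReads) {
                    stats.incrementBytesRead(range[1]);
                }
            }
        }
    }

    /** Returns the bytes-read metric after reading the whole toy file once. */
    static long bytesReadMetric(boolean countVectoredReads) {
        final long fileSize = 44_944L;   // file size from the report
        final long footerBytes = 1_680L; // assumption: the 1680B figure is the footer read
        Stats stats = new Stats();
        ToyStream in = new ToyStream(stats, countVectoredReads);

        // Footer goes through the classic path.
        in.read(fileSize - footerBytes, footerBytes);

        // Remaining data is fetched as two vectored ranges (assumed split).
        long dataBytes = fileSize - footerBytes;
        long half = dataBytes / 2;
        in.readVectored(Arrays.asList(
                new long[] {0L, half},
                new long[] {half, dataBytes - half}));

        return stats.getBytesRead();
    }

    public static void main(String[] args) {
        System.out.println("vectored reads not counted: " + bytesReadMetric(false)); // 1680
        System.out.println("vectored reads counted:     " + bytesReadMetric(true));  // 44944
    }
}
```

In this model the metric only reaches the file size when the vectored path also updates the counter, which matches the expectation that, with no pruning, the value should be close to the size of the file.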