[ https://issues.apache.org/jira/browse/PARQUET-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697862#comment-17697862 ]
ASF GitHub Bot commented on PARQUET-2202: ----------------------------------------- wgtmac commented on code in PR #1035: URL: https://github.com/apache/parquet-mr/pull/1035#discussion_r1129335329 ########## parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java: ########## @@ -477,15 +477,15 @@ public Builder withMaxBloomFilterBytes(int maxBloomFilterBytes) { * @return this builder for method chaining */ public Builder withBloomFilterNDV(String columnPath, long ndv) { - Preconditions.checkArgument(ndv > 0, "Invalid NDV for column \"%s\": %d", columnPath, ndv); + Preconditions.checkArgument(ndv > 0, "Invalid NDV for column \"%s\": %s", columnPath, ndv); this.bloomFilterNDVs.withValue(columnPath, ndv); // Setting an NDV for a column implies writing a bloom filter this.bloomFilterEnabled.withValue(columnPath, true); return this; } public Builder withBloomFilterFPP(String columnPath, double fpp) { - Preconditions.checkArgument(fpp > 0.0 && fpp < 1.0, "Invalid FPP for column \"%s\": %d", columnPath, fpp); + Preconditions.checkArgument(fpp > 0.0 && fpp < 1.0, "Invalid FPP for column \"%s\": %s", columnPath, fpp); Review Comment: Thanks for the explanation. SGTM. > Redundant String allocation on the hot path in > CapacityByteArrayOutputStream.setByte > ------------------------------------------------------------------------------------ > > Key: PARQUET-2202 > URL: https://issues.apache.org/jira/browse/PARQUET-2202 > Project: Parquet > Issue Type: Bug > Components: parquet-mr > Affects Versions: 1.12.3 > Reporter: Andrei Pangin > Priority: Major > Labels: performance > Attachments: profile-alloc.png, profile-cpu.png > > > Profiling of a Spark application revealed a performance issue in production: > {{CapacityByteArrayOutputStream.setByte}} consumed 2.2% of total CPU time and > made up 4.6% of total allocations. However, in normal case, this method > should allocate nothing at all. > Here is an excerpt from async-profiler report. > CPU profile: > !profile-cpu.png|width=560! > Allocation profile: > !profile-alloc.png|width=560! > The reason is a {{checkArgument()}} call with an unconditionally constructed > dynamic String: > [https://github.com/apache/parquet-mr/blob/62b774cd0f0c60cfbe540bbfa60bee15929af5d4/parquet-common/src/main/java/org/apache/parquet/bytes/CapacityByteArrayOutputStream.java#L303] > The suggested fix is to move String construction under the condition: > {code:java} > if (index >= bytesUsed) { > throw new IllegalArgumentException("Index: " + index + > " is >= the current size of: " + bytesUsed); > }{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)