[ https://issues.apache.org/jira/browse/SPARK-35640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
DB Tsai resolved SPARK-35640. ----------------------------- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32777 [https://github.com/apache/spark/pull/32777] > Refactor Parquet vectorized reader to remove duplicated code paths > ------------------------------------------------------------------ > > Key: SPARK-35640 > URL: https://issues.apache.org/jira/browse/SPARK-35640 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.2.0 > Reporter: Chao Sun > Priority: Major > Fix For: 3.2.0 > > > Currently in Parquet vectorized code path, there are many code duplications > such as the following: > {code:java} > public void readIntegers( > int total, > WritableColumnVector c, > int rowId, > int level, > VectorizedValuesReader data) throws IOException { > int left = total; > while (left > 0) { > if (this.currentCount == 0) this.readNextGroup(); > int n = Math.min(left, this.currentCount); > switch (mode) { > case RLE: > if (currentValue == level) { > data.readIntegers(n, c, rowId); > } else { > c.putNulls(rowId, n); > } > break; > case PACKED: > for (int i = 0; i < n; ++i) { > if (currentBuffer[currentBufferIdx++] == level) { > c.putInt(rowId + i, data.readInteger()); > } else { > c.putNull(rowId + i); > } > } > break; > } > rowId += n; > left -= n; > currentCount -= n; > } > } > {code} > This makes it hard to maintain as any change on this will need to be > replicated in 20+ places. The issue becomes more serious when we are going to > implement column index and complex type support for the vectorized path. > The original intention is for performance. However now days JIT compilers > tend to be smart on this and will inline virtual calls as much as possible. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org