Repository: spark
Updated Branches:
  refs/heads/branch-2.3 a1c56b669 -> 5bcb7bdcc
[SPARK-23963][SQL] Properly handle large number of columns in query on text-based Hive table

## What changes were proposed in this pull request?

TableReader would get disproportionately slower as the number of columns in the query increased.

I fixed the way TableReader was looking up metadata for each column in the row. Previously, it had been looking up this data in linked lists, accessing each linked list by an index (column number), which costs time proportional to the column's position. Now it looks up this data in arrays, where indexing by column number is a constant-time operation.

## How was this patch tested?

Manual testing
All sbt unit tests
python sql tests

Author: Bruce Robbins <bersprock...@gmail.com>

Closes #21043 from bersprockets/tabreadfix.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5bcb7bdc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5bcb7bdc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5bcb7bdc

Branch: refs/heads/branch-2.3
Commit: 5bcb7bdccf967ff5ad3d8c76f4ad8c9c4031e7c2
Parents: a1c56b6
Author: Bruce Robbins <bersprock...@gmail.com>
Authored: Fri Apr 13 14:05:04 2018 -0700
Committer: gatorsmile <gatorsm...@gmail.com>
Committed: Wed Apr 18 09:48:49 2018 -0700

----------------------------------------------------------------------
 .../src/main/scala/org/apache/spark/sql/hive/TableReader.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/5bcb7bdc/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
----------------------------------------------------------------------
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
index cc8907a..b5444a4 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
@@ -381,7 +381,7 @@ private[hive] object HadoopTableReader extends HiveInspectors with Logging {
 
     val (fieldRefs, fieldOrdinals) = nonPartitionKeyAttrs.map { case (attr, ordinal) =>
       soi.getStructFieldRef(attr.name) -> ordinal
-    }.unzip
+    }.toArray.unzip
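The one-line change is worth unpacking. Per the commit message, `fieldRefs` and `fieldOrdinals` were previously linked lists that the per-row unwrapping code indexed into once per column, and indexing a Scala `List` at position i walks i nodes, so unwrapping a row with n columns degrades to O(n^2) work. Calling `.toArray` before `.unzip` yields arrays, where each lookup is O(1). The following is a minimal, self-contained sketch of that difference, not Spark code; the object name, column count, and timing helper are made up for illustration:

```scala
// Sketch (hypothetical, not from Spark): why indexed lookups into a
// linked list get disproportionately slow as the number of columns grows.
object IndexedLookupSketch {

  // Crude wall-clock timer for illustration only.
  private def time(label: String)(body: => Long): Unit = {
    val start = System.nanoTime()
    val result = body
    println(f"$label%-40s ${(System.nanoTime() - start) / 1e6}%.1f ms (checksum=$result)")
  }

  def main(args: Array[String]): Unit = {
    val numColumns = 20000 // stand-in for a "large number of columns"
    val asList: List[Int] = List.tabulate(numColumns)(identity)
    val asArray: Array[Int] = asList.toArray // analogous to the fix: convert once, up front

    // Simulates unwrapping one wide row: one indexed lookup per column.
    time("List (linked list, O(i) per lookup)") {
      var i = 0; var sum = 0L
      while (i < numColumns) { sum += asList(i); i += 1 }
      sum
    }
    time("Array (O(1) per lookup)") {
      var i = 0; var sum = 0L
      while (i < numColumns) { sum += asArray(i); i += 1 }
      sum
    }
  }
}
```

Run with enough columns, the `List` pass grows roughly quadratically with the column count while the `Array` pass grows linearly, which matches the "disproportionately slower" behavior described above. Converting once per split and paying O(1) per column per row is why a two-token diff fixes the regression.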