Yahui Liu created SPARK-35500:
---------------------------------

             Summary: GenerateSafeProjection.generate generates a 
SpecificSafeProjection class, but if a column is of array or map type, the 
generated code cannot be reused, which hurts query performance
                 Key: SPARK-35500
                 URL: https://issues.apache.org/jira/browse/SPARK-35500
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.0
            Reporter: Yahui Liu


Steps to reproduce:
 # Create a new table with an array-typed column: create table test_code_gen(a array<int>);
 # Add 
log4j.logger.org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator = 
DEBUG to log4j.properties;
 # Enter spark-shell and run a query: spark.sql("select * from 
test_code_gen").collect
 # Every time Dataset.collect is called, a SpecificSafeProjection class is 
generated, but the code for the class cannot be reused because the ids of two 
variables in the generated class change on every call: MapObjects_loopValue and 
MapObjects_loopIsNull. So even though the previously generated class has been 
cached, the new code does not match the cache key and has to be compiled again, 
which costs time.
!image-2021-05-24-16-15-18-359.png!!image-2021-05-24-16-05-34-334.png!
 # The compilation time grows with the number of columns; for a wide table it 
can exceed 2s. !image-2021-05-24-16-11-20-841.png!
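
The cache miss described in the steps above can be illustrated with a small, self-contained sketch. It is not Spark code: CodegenCacheSketch, compile, sourceWithGlobalId and sourceWithLocalId are hypothetical names, and the sketch assumes the cache is keyed by the generated source text, which is one reading of "cannot match the cache key". It only shows why a variable name that embeds an ever-increasing id defeats such a cache.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of a codegen cache keyed by source text.
// None of these names are Spark APIs; they only illustrate the reported effect.
object CodegenCacheSketch {
  // Process-wide counter standing in for whatever numbers variables
  // such as MapObjects_loopValue (an assumption for illustration).
  private var nextId = 0
  private def freshId(): Int = { val id = nextId; nextId += 1; id }

  // Counts how many cache misses forced a (simulated) compilation.
  var compilations = 0
  private val cache = mutable.Map.empty[String, String]

  // Returns the cached "class" for this source, compiling only on a miss.
  def compile(source: String): String =
    cache.getOrElseUpdate(source, { compilations += 1; s"compiled#$compilations" })

  // Variable name embeds a globally fresh id, so the text differs on every call.
  def sourceWithGlobalId(): String =
    s"int MapObjects_loopValue${freshId()} = 0;"

  // Variable name uses an id that restarts at 0 for every generation,
  // so the text is identical across calls and the cache can hit.
  def sourceWithLocalId(): String = {
    val localId = 0
    s"int MapObjects_loopValue$localId = 0;"
  }
}
```

In this sketch, calling compile(sourceWithGlobalId()) twice triggers two compilations, while calling compile(sourceWithLocalId()) twice triggers only one. Whether the actual fix for this issue takes that form is not stated in the report.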



