GitHub user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1057
  
    Thanks @ilooner for doing this fix!
    
    A bit of background. A user provided a "short, fat" query that had, as I 
recall, only 6000 rows but something like 200 columns. The numbers I posted 
measured the cost of generating the rather large copy code (200 columns to be 
copied), amortized over a very short query. As @ilooner said, in this special 
case there is no advantage to grinding through the code generator, the cache 
lookup mechanism, the compiler and the byte-code merge mechanism when a simple 
function call does the same job.
    
    One could argue that for the same query with 1 TB of data, the results may 
be different. I didn't run that test. But even a cursory look at the two code 
paths shows that, on per-row cost alone, the copier path is still more 
efficient than the generated code.
    
    The reason is simple: the generated code to copy a value is a degenerate 
special case of code designed to fetch values, compute results, and create new 
columns, perhaps doing some casts along the way. The generated copy code is the 
fetch step and the save step with the compute part omitted.
    
    For this reason, custom-built copier code will be more efficient than 
generic load-compute-save code.
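
    To make the structural difference concrete, here is a minimal sketch of the 
two shapes. Every name in it (CopierSketch, loadComputeSave, directCopy) is 
made up for illustration and is not Drill's actual generated code or copier 
API; plain int arrays stand in for value vectors. It only shows the shape of 
each path and is not a benchmark.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Hypothetical illustration only: these are not Drill classes.
final class CopierSketch {

    // Generic shape: fetch a value, run a compute step, save the result.
    // For a plain copy the compute step is the identity function, so the
    // loop degenerates to fetch + save, but it keeps the generic structure.
    static void loadComputeSave(int[] in, int[] out, IntUnaryOperator compute) {
        for (int row = 0; row < in.length; row++) {
            int value = in[row];                     // fetch
            int result = compute.applyAsInt(value);  // compute (identity for a copy)
            out[row] = result;                       // save
        }
    }

    // Purpose-built shape: one bulk copy with no per-value compute hook.
    static void directCopy(int[] in, int[] out) {
        System.arraycopy(in, 0, out, 0, in.length);
    }

    public static void main(String[] args) {
        int[] column = {3, 1, 4, 1, 5};
        int[] viaGeneric = new int[column.length];
        int[] viaCopier = new int[column.length];

        loadComputeSave(column, viaGeneric, v -> v);
        directCopy(column, viaCopier);

        // Both paths produce the same column; they differ only in how.
        System.out.println(Arrays.equals(viaGeneric, viaCopier));
    }
}
```

    Both methods yield identical output. The difference is that the generic 
path carries a compute step that a pure copy never needs.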
    
    Still, it would be great for someone to run a test at scale to verify this 
reasoning.

