Github user paul-rogers commented on the issue: https://github.com/apache/drill/pull/1057

Thanks @ilooner for doing this fix!

A bit of background. A user provided a "short, fat" query that had, as I recall, only 6000 rows but something like 200 columns. The numbers I posted were the cost of generating code for that rather large operator (200 columns to be copied), amortized over a very short query. As @ilooner said, in this special case there is no advantage to grinding through the code generator, the cache lookup mechanism, the compiler and the byte-code merge mechanism when a simple function call does the same job.

One could argue that for the same query over 1 TB of data, the results may be different. I didn't run that test. But even a cursory look at the two code paths shows that the copier path is still more efficient than the generated code, just looking at per-row costs. The reason is simple: the generated code to copy a value is a degenerate special case of code designed to fetch values, compute results, and create new columns, perhaps doing some casts along the way. The generated copy code is the fetch step and the save step with the compute step omitted. For this reason, custom-built copier code will be more efficient than generic load-compute-save code.

Still, it would be great for someone to run a test at scale to verify this reasoning.
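To make the "degenerate load-compute-save" argument concrete, here is a minimal sketch under simplifying assumptions: each column is modeled as a plain `int[]`, and the class and method names (`CopierSketch`, `genericProject`, `copyColumns`) are hypothetical, not Drill APIs. The generic path mimics the shape of generated projection code (fetch, apply an expression, save, for every value), where a pure copy just plugs in the identity expression. The copier path skips the per-value expression entirely and bulk-copies each column. The sketch only checks that both produce the same output; it makes no timing claims.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Hypothetical, simplified model: each column is a plain int[] "vector".
// None of these types are Drill APIs; they only illustrate the argument.
public class CopierSketch {

    // Generic "load-compute-save" path, shaped like generated projection code:
    // for every row and every column it fetches a value, applies an expression,
    // and writes the result. A pure copy is the degenerate case where the
    // expression is the identity function.
    static void genericProject(int[][] in, int[][] out, int rowCount,
                               IntUnaryOperator[] exprs) {
        for (int row = 0; row < rowCount; row++) {
            for (int col = 0; col < in.length; col++) {
                int value = in[col][row];                   // fetch
                int result = exprs[col].applyAsInt(value);  // compute (identity for a copy)
                out[col][row] = result;                     // save
            }
        }
    }

    // Purpose-built copier: no per-value expression at all, just a bulk
    // copy of the first rowCount values of each column.
    static void copyColumns(int[][] in, int[][] out, int rowCount) {
        for (int col = 0; col < in.length; col++) {
            System.arraycopy(in[col], 0, out[col], 0, rowCount);
        }
    }

    public static void main(String[] args) {
        int columns = 200, rows = 6000;  // the shape of the "short, fat" query
        int[][] in = new int[columns][rows];
        for (int c = 0; c < columns; c++) {
            for (int r = 0; r < rows; r++) {
                in[c][r] = c * rows + r;
            }
        }
        IntUnaryOperator[] identity = new IntUnaryOperator[columns];
        Arrays.fill(identity, IntUnaryOperator.identity());

        int[][] viaGeneric = new int[columns][rows];
        int[][] viaCopier = new int[columns][rows];
        genericProject(in, viaGeneric, rows, identity);
        copyColumns(in, viaCopier, rows);

        // Both paths must produce identical columns; only the per-value work differs.
        System.out.println(Arrays.deepEquals(viaGeneric, viaCopier)
                ? "outputs match" : "outputs differ");
    }
}
```

The design point is visible in the inner loops: the generic path pays an expression call plus a fetch and a save for every one of the rows times columns values, while the copier reduces each column to a single bulk copy. Measuring whether that difference holds up at scale is exactly the test suggested above.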