[GitHub] drill issue #1060: DRILL-5846: Improve parquet performance for Flat Data Typ...

parthchandra Fri, 30 Mar 2018 02:03:07 -0700

Github user parthchandra commented on the issue:

    https://github.com/apache/drill/pull/1060
  
    I feel putting this PR in without finalizing DRILL-6301 is putting the cart 
before the horse. (BTW, it would help the discussion if the benchmarks were 
published!) My observation, based on profiling I did some time back, is that the 
performance gains seen here are roughly in line with removing bounds checks. 
Paul has seen similar gains in the batch sizing project.
    Which takes us back to the question, raised by Paul in his first comment, 
of how we want to reconcile batch sizing and vectorizing of scans; a question 
we have deferred. If removing bounds checks gets us the same performance gains, 
then why not put our efforts into implementing batch sizing with the 
accompanying elimination of bounds checking? 
    I'm mostly not in favor of having MemoryUtils unless you make a compelling 
argument that it is the only way to save the planet (i.e., get the performance 
you want). I feel operators should not establish the pattern of accessing 
memory directly. So far, I'm -0 on this as my arguments are mostly high level 
(and somewhat philosophical). 
    Minor nitpick: the prefix VL is not as informative as, say, VarLen or 
VariableLength.
