While it is true that the required type adds code complexity, what would we
be trading off? Some important considerations:
  - We don't currently have null count statistics; these would need to be
    implemented for the various data sources.
  - Primary keys in RDBMS sources (or row keys in HBase) are always
    non-null. Although today we may not be doing optimizations that leverage
    this, one could easily add a rule that converts WHERE primary_key IS NULL
    to a FALSE filter.
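
To make the second point concrete, here is a minimal, language-neutral sketch of such a rewrite rule. It is not Drill or Calcite code: the expression classes (IsNull, Literal, And) and the simplify function are hypothetical names made up for illustration, and the set of non-null columns is assumed to come from source metadata such as a primary key constraint.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical, simplified filter expression nodes (not Drill's real classes).
@dataclass(frozen=True)
class IsNull:
    column: str


@dataclass(frozen=True)
class Literal:
    value: bool


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


Expr = Union[IsNull, Literal, And]


def simplify(expr, non_null_columns):
    """Rewrite `col IS NULL` to FALSE when col is known to be non-null.

    `non_null_columns` is assumed to be supplied by the data source,
    e.g. the primary key columns of an RDBMS table.
    """
    if isinstance(expr, IsNull):
        if expr.column in non_null_columns:
            return Literal(False)
        return expr
    if isinstance(expr, And):
        left = simplify(expr.left, non_null_columns)
        right = simplify(expr.right, non_null_columns)
        # x AND FALSE is FALSE even under SQL three-valued logic.
        if Literal(False) in (left, right):
            return Literal(False)
        return And(left, right)
    return expr
```

With this sketch, simplify(IsNull("primary_key"), {"primary_key"}) yields Literal(False), while a predicate on a column without a non-null guarantee is left untouched. Note that the rule depends on knowing the column is non-null, which is exactly the information the required type carries today.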


On Tue, Mar 22, 2016 at 7:31 AM, Dave Oshinsky <[email protected]>
wrote:

> Hi Jacques,
> Marginally related to this, I made a small change in PR-372 (DRILL-4184)
> to support variable widths for decimal quantities in Parquet.  I found the
> (decimal) vectoring code to be very difficult to understand (probably
> because it's overly complex, but also because I'm new to Drill code in
> general), so I made a small, surgical change in my pull request to support
> keeping track of variable widths (lengths) and null booleans within the
> existing fixed width decimal vectoring scheme.  Can my changes be
> reviewed/accepted, and then we discuss how to fix properly long-term?
>
> Thanks,
> Dave Oshinsky
>
> -----Original Message-----
> From: Jacques Nadeau [mailto:[email protected]]
> Sent: Monday, March 21, 2016 11:43 PM
> To: dev
> Subject: Re: [DISCUSS] Remove required type
>
> Definitely in support of this. The required type is a huge maintenance and
> code complexity nightmare that provides little to no benefit. As you point
> out, we can do better performance optimizations through null count
> observation since most sources are nullable anyway.
> On Mar 21, 2016 7:41 PM, "Steven Phillips" <[email protected]> wrote:
>
> > I have been thinking about this for a while now, and I feel it would
> > be a good idea to remove the Required vector types from Drill, and
> > only use the Nullable version of vectors. I think this will greatly
> > simplify the code.
> > It will also simplify the creation of UDFs. As is, if a function has
> > custom null handling (i.e. INTERNAL), the function has to be
> > separately implemented for each permutation of nullability of the
> > inputs. But if drill data types are always nullable, this wouldn't be a
> > problem.
> >
> > I don't think there would be much impact on performance. In practice,
> > I think the required type is used very rarely. And there are other
> > ways we can optimize for when a column is known to have no nulls.
> >
> > Thoughts?
> >