I was thinking about it more after sending the previous concerns. Agreed, this is an execution-side change, but some details need to be worked out. If the planner indicates to the executor that a column is non-nullable (e.g., a primary key), the run-time generated code is more efficient since it does not have to check the null bit. Are you thinking we would use the existing nullable vector and add some additional metadata (at a record-batch level rather than record level) to indicate non-nullability?
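To make the question concrete, here is a rough sketch of what I have in mind. It is not Drill's actual vector code: `NullabilitySketch`, `Column`, `isSet`, and `noNulls` are all hypothetical names, and I am assuming the flag is derived from a null count when the batch is built (it could just as well be set from planner information). The point is only that a single batch-level flag lets the consumer pick a loop that never touches the null bits.

```java
import java.util.BitSet;

// Hypothetical sketch; none of these names are Drill's real classes.
final class NullabilitySketch {

    // Stand-in for a nullable int vector plus one piece of batch-level metadata.
    static final class Column {
        final int[] values;     // value slots (slot content is unspecified when null)
        final BitSet isSet;     // bit i set => value i is non-null
        final boolean noNulls;  // assumed batch-level flag: true when no slot is null

        private Column(int[] values, BitSet isSet, boolean noNulls) {
            this.values = values;
            this.isSet = isSet;
            this.noNulls = noNulls;
        }

        // Builds a column and derives the flag from the observed null count.
        static Column of(Integer... data) {
            int[] values = new int[data.length];
            BitSet isSet = new BitSet(data.length);
            for (int i = 0; i < data.length; i++) {
                if (data[i] != null) {
                    values[i] = data[i];
                    isSet.set(i);
                }
            }
            boolean noNulls = isSet.cardinality() == data.length;
            return new Column(values, isSet, noNulls);
        }
    }

    // Sums the non-null values, choosing the loop once per batch.
    static long sum(Column col) {
        long total = 0;
        if (col.noNulls) {
            // Fast path: the batch metadata says no nulls, so skip the null bits.
            for (int v : col.values) {
                total += v;
            }
        } else {
            // General path: check the null bit for every record.
            for (int i = 0; i < col.values.length; i++) {
                if (col.isSet.get(i)) {
                    total += col.values[i];
                }
            }
        }
        return total;
    }
}
```

With something like this the check happens once per batch instead of once per record, but the generated code would need both loop variants, which is part of what needs to be worked out.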

On Tue, Mar 22, 2016 at 12:27 PM, Jacques Nadeau <jacq...@dremio.com> wrote:

> Hey Aman, I believe both Steven and I were suggesting removal only
> from execution, not planning. It seems like your concerns are all related
> to planning. It seems like the real tradeoffs in execution are nominal.
>
> On Mar 22, 2016 9:03 AM, "Aman Sinha" <amansi...@apache.org> wrote:
>
> > While it is true that there is code complexity due to the required type,
> > what would we be trading off? Some important considerations:
> >  - We don't currently have null count statistics, which would need to be
> >    implemented for various data sources
> >  - Primary keys in the RDBMS sources (or rowkeys in hbase) are always
> >    non-null, and although today we may not be doing optimizations to
> >    leverage that, one could easily add a rule that converts
> >    WHERE primary_key IS NULL to a FALSE filter.
> >
> > On Tue, Mar 22, 2016 at 7:31 AM, Dave Oshinsky <doshin...@commvault.com>
> > wrote:
> >
> > > Hi Jacques,
> > > Marginally related to this, I made a small change in PR-372 (DRILL-4184)
> > > to support variable widths for decimal quantities in Parquet. I found the
> > > (decimal) vectoring code to be very difficult to understand (probably
> > > because it's overly complex, but also because I'm new to Drill code in
> > > general), so I made a small, surgical change in my pull request to support
> > > keeping track of variable widths (lengths) and null booleans within the
> > > existing fixed width decimal vectoring scheme. Can my changes be
> > > reviewed/accepted, and then we discuss how to fix properly long-term?
> > >
> > > Thanks,
> > > Dave Oshinsky
> > >
> > > -----Original Message-----
> > > From: Jacques Nadeau [mailto:jacq...@dremio.com]
> > > Sent: Monday, March 21, 2016 11:43 PM
> > > To: dev
> > > Subject: Re: [DISCUSS] Remove required type
> > >
> > > Definitely in support of this. The required type is a huge maintenance and
> > > code complexity nightmare that provides little to no benefit. As you point
> > > out, we can do better performance optimizations through null count
> > > observation since most sources are nullable anyway.
> > >
> > > On Mar 21, 2016 7:41 PM, "Steven Phillips" <ste...@dremio.com> wrote:
> > >
> > > > I have been thinking about this for a while now, and I feel it would
> > > > be a good idea to remove the Required vector types from Drill, and
> > > > only use the Nullable version of vectors. I think this will greatly
> > > > simplify the code.
> > > > It will also simplify the creation of UDFs. As is, if a function has
> > > > custom null handling (i.e. INTERNAL), the function has to be
> > > > separately implemented for each permutation of nullability of the
> > > > inputs. But if drill data types are always nullable, this wouldn't be a
> > > > problem.
> > > >
> > > > I don't think there would be much impact on performance. In practice,
> > > > I think the required type is used very rarely. And there are other
> > > > ways we can optimize for when a column is known to have no nulls.
> > > >
> > > > Thoughts?