Hsuan was kind enough to put together a provocative discussion on the mailing list about skipping records. I've started a way-too-long thread in the comments, but I would like to get feedback from others in the community. My main point of contention is that the big goal of this design is to provide "data import"-like capabilities for Drill. In that context, I suggested a scan-based approach to schema enforcement (and bad-record capture/storage). I think it is a simpler approach and solves the vast majority of user needs. Hsuan's initial proposal was much broader in reach: it supports an arbitrary number of expression types within project and filter (assuming they are proximate to the scan).
I would love to get others' feedback and thoughts on the doc as to what the MVP for this feature really is.

https://docs.google.com/document/d/1jCeYW924_SFwf-nOqtXrO68eixmAitM-tLngezzXw3Y/edit

--
Jacques Nadeau
CTO and Co-Founder, Dremio
