Hi James,
I do think it might be time to start a wiki page of breaking changes for a Drill 2.0. I also agree that piling up config options that don't really add value is a poor use of config options, since it creates a lot of technical debt. I'll start a wiki page and put this on there.
In the meantime, I may submit a PR that changes the default value of extractHeaders for CSV to true. I don't really see that as a breaking change in that a user can simply change that flag and the previous behavior is restored.
Best,
-- C

> On Nov 18, 2021, at 2:34 AM, James Turton <[email protected]> wrote:
>
> Definitely a +1 for this friendlier default behaviour and another +1 for the
> prospect of increased consistency across format plugins.
>
> My follow-up questions to the community.
> Since these are examples of user-breaking changes, and not just in niche
> areas, are we approaching a point when we want to start working on Drill 2.x?
> Do we have other user-breaking or significant refactoring ideas that we've
> been keeping stashed away in our heads, that would get their chance at life
> from the fact that a 2.x Drill can defensibly exhibit some incompatibilities
> with Drill 1.x?
> Should we make a "Drill v2 Parking Lot" page in the Dev Wiki where we record
> such ideas?
> Would we be fine in terms of dev resources with supporting both bug fix
> releases to a 1.x series and also pushing forward in a 2.x series?
> My own feeling is that to get the most value from a good proposal such as the
> below, we don't want to conceal everything behind default-false options in
> order to avoid breaking Drill 1.x users, we want to embrace the breakage
> which (to me) points to Drill 2.x.
>
> On 2021/11/18 02:30, Charles Givre wrote:
>> Hello Drill Community,
>> I would like to put forward some thoughts I've had relating to the CSV
>> reader in Drill. I would like to propose a few changes which could actually
>> be breaking changes, so I wanted to see if there are any strongly held
>> opinions in the community. Here goes:
>>
>> The Problems:
>> 1. The default behavior for Drill is to leave the extractColumnHeaders
>> option as false. When a user queries a CSV file this way, the results are
>> returned in a list of columns called columns. Thus if a user wants the
>> first column, they would project columns[0]. I have never been a fan of
>> this behavior. Even though Drill ships with the csvh file extension which
>> enables the header extraction, this is not a commonly used file format.
>> Furthermore, the returned results (the column list) does not work well with
>> BI tools.
>>
>> 2. The CSV reader does not attempt to do any kind of data type discovery.
>>
>> Proposed Changes:
>> The overall goal is to make it easier to query CSV data and also to make the
>> behavior more consistent across format plugins.
>> 1. Change the default behavior and set the extractHeaders to true.
>> 2. Other formats, like the excel reader, read tables directly into columns.
>> If the header is not known, Drill assigns a name of field_n. I would
>> propose replacing the `columns` array with a model similar to the Excel
>> reader.
>> 3. Implement schema discovery (data types) with an allTextMode option
>> similar to the JSON reader. When the allTextMode is disabled, the CSV
>> reader would attempt to infer data types.
>>
>> Since there are some breaking changes here, I'd like to ask if people have
>> any strong feelings on this topic or suggestions.
>> Thanks!,
>> -- C
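
To make the three proposed changes concrete, here is a rough Python sketch of the reader behaviour being discussed. It is not Drill code: the names `read_csv`, `extract_headers`, `all_text_mode` and `infer_value` are hypothetical stand-ins for the options mentioned in this thread, the field_n numbering is assumed to start at 0, and types are inferred per value rather than per column to keep the example short.

```python
import csv
import io


def infer_value(text):
    """Try int, then float; otherwise keep the original string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def read_csv(source, extract_headers=True, all_text_mode=False):
    """Return rows as dicts keyed by column name (illustrative only)."""
    rows = list(csv.reader(io.StringIO(source)))
    if not rows:
        return []
    if extract_headers:
        # Proposed default: the first row supplies the column names.
        names, data = rows[0], rows[1:]
    else:
        # No header known: fall back to field_n names (0-based here by
        # assumption) instead of a single `columns` array.
        names = [f"field_{i}" for i in range(len(rows[0]))]
        data = rows
    # With all_text_mode on, every value stays a string.
    convert = (lambda v: v) if all_text_mode else infer_value
    return [{n: convert(v) for n, v in zip(names, row)} for row in data]


sample = "id,name,price\n1,apple,0.5\n2,pear,1.25\n"
typed = read_csv(sample)
as_text = read_csv(sample, all_text_mode=True)
no_header = read_csv(sample, extract_headers=False)
```

With the defaults, `typed` carries named and typed columns. Setting `all_text_mode=True` keeps every value as text, and `extract_headers=False` yields `field_0`, `field_1`, ... in place of a `columns` array.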
