Please exclude me from this conversation.

On Thu, 18 Nov 2021 at 13:30, Charles Givre <[email protected]> wrote:
> Hi James,
> I do think it might be time to start considering creating a wiki of
> breaking changes for a Drill 2.0. I'd also concur that having tons of
> config options that don't really add value is not a good idea, as it
> leads to a lot of technical debt. I'll start a wiki page and put this
> on there.
>
> In the meantime, I may submit a PR that changes the default value of
> extractHeader for CSV to true. I don't really see that as a breaking
> change, in that a user can simply change that flag and the previous
> behavior is restored.
> Best,
> -- C
>
> > On Nov 18, 2021, at 2:34 AM, James Turton <[email protected]> wrote:
> >
> > Definitely a +1 for this friendlier default behaviour and another +1
> > for the prospect of increased consistency across format plugins.
> >
> > My follow-up questions to the community:
> > Since these are examples of user-breaking changes, and not just in
> > niche areas, are we approaching a point when we want to start working
> > on Drill 2.x?
> > Do we have other user-breaking or significant refactoring ideas that
> > we've been keeping stashed away in our heads, that would get their
> > chance at life from the fact that a 2.x Drill can defensibly exhibit
> > some incompatibilities with Drill 1.x?
> > Should we make a "Drill v2 Parking Lot" page in the Dev Wiki where we
> > record such ideas?
> > Would we be fine in terms of dev resources with supporting both bug
> > fix releases to a 1.x series and also pushing forward in a 2.x series?
> > My own feeling is that to get the most value from a good proposal
> > such as the below, we don't want to conceal everything behind
> > default-false options in order to avoid breaking Drill 1.x users; we
> > want to embrace the breakage, which (to me) points to Drill 2.x.
> >
> > On 2021/11/18 02:30, Charles Givre wrote:
> >> Hello Drill Community,
> >> I would like to put forward some thoughts I've had relating to the
> >> CSV reader in Drill. I would like to propose a few changes which
> >> could actually be breaking changes, so I wanted to see if there are
> >> any strongly held opinions in the community. Here goes:
> >>
> >> The Problems:
> >> 1. The default behavior for Drill is to leave the extractHeader
> >> option as false. When a user queries a CSV file this way, the
> >> results are returned in a list of columns called columns. Thus if a
> >> user wants the first column, they would project columns[0]. I have
> >> never been a fan of this behavior. Even though Drill ships with the
> >> csvh file extension, which enables the header extraction, this is
> >> not a commonly used file format. Furthermore, the returned results
> >> (the column list) do not work well with BI tools.
> >>
> >> 2. The CSV reader does not attempt to do any kind of data type
> >> discovery.
> >>
> >> Proposed Changes:
> >> The overall goal is to make it easier to query CSV data and also to
> >> make the behavior more consistent across format plugins.
> >> 1. Change the default behavior and set extractHeader to true.
> >> 2. Other formats, like the Excel reader, read tables directly into
> >> columns. If the header is not known, Drill assigns a name of
> >> field_n. I would propose replacing the `columns` array with a model
> >> similar to the Excel reader.
> >> 3. Implement schema discovery (data types) with an allTextMode
> >> option similar to the JSON reader. When allTextMode is disabled,
> >> the CSV reader would attempt to infer data types.
> >>
> >> Since there are some breaking changes here, I'd like to ask if
> >> people have any strong feelings on this topic or suggestions.
> >> Thanks!
> >> -- C
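
For anyone trying to picture the three proposed changes together, here is a rough sketch in plain Python. It is not Drill code and does not use any Drill API: the function names (`read_csv`, `infer_value`) and the parameters (`extract_header`, `all_text_mode`) are hypothetical stand-ins for the options discussed in the thread, and numbering the fallback names from `field_0` is an assumption, since the thread only says `field_n`. Inference here is done per value for brevity; a real reader would more likely settle on one type per column.

```python
import csv
import io


def infer_value(text):
    """Try int, then float; fall back to the original string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def read_csv(source, extract_header=True, all_text_mode=True):
    """Return rows as dicts keyed by column name (illustrative only).

    extract_header=True  -> column names come from the first row
    extract_header=False -> names are field_0, field_1, ... (assumed numbering)
    all_text_mode=False  -> each value is passed through infer_value
    """
    rows = list(csv.reader(io.StringIO(source)))
    if not rows:
        return []
    if extract_header:
        names, data = rows[0], rows[1:]
    else:
        names = [f"field_{i}" for i in range(len(rows[0]))]
        data = rows
    convert = (lambda v: v) if all_text_mode else infer_value
    return [{n: convert(v) for n, v in zip(names, row)} for row in data]


sample = "id,name,price\n1,apple,0.5\n2,pear,0.75\n"

# Proposed defaults: named columns, with type inference switched on.
typed = read_csv(sample, extract_header=True, all_text_mode=False)

# No header extraction: generated names instead of a single columns array.
generated = read_csv(sample, extract_header=False, all_text_mode=True)
```

With header extraction on, a query tool sees `id`, `name` and `price` as ordinary columns, which is the BI-friendliness point raised in the original proposal; with it off, the header row is just another data row under generated names.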
