I think there is reasonable consensus, so I am going to incorporate the proposed change into my JSON changes that improve the error reporting (a better error message on a scalar-to-complex-type schema change, and directing users to use all_text_mode for any other schema change, e.g. number->string, bool->number, etc.). I do agree that the default should change once we support embedded types, but for now I think this will provide a much better user experience for a small amount of dev effort.

-Jason

On Mon, Jan 26, 2015 at 3:14 PM, Ted Dunning <[email protected]> wrote:

> I think that reading all as doubles is fine as an interim step. This will
> work for very large numbers, but has the traditional problems with very
> large financial values, but I think that we aren't worried much yet about
> people talking about amounts > $10^17.
>
> On Mon, Jan 26, 2015 at 5:17 PM, Jacques Nadeau <[email protected]> wrote:
>
> > Writing zero int to a float column should be allowed. Basically, if we
> > found a float previously and then we run across a zero, that should be
> > accepted. This doesn't fix the situation where the first value was zero
> > but definitely fixes many situations. I'm up for a second option to treat
> > all numbers as doubles but I'm not in support of it for the default as
> > once we finish embedded types, this would be our desired behavior.
> >
> > On Mon, Jan 26, 2015 at 1:36 PM, Jason Altekruse <[email protected]> wrote:
> >
> > > Hello Drillers,
> > >
> > > I am currently working on improving the error reporting in the JSON
> > > reader to help users with files that Drill cannot read using the
> > > default configuration today.
> > >
> > > As a part of this change I think it may be useful to change the default
> > > behavior for reading numbers in JSON documents. Currently we fail on
> > > the simple case of reading numbers with decimal points and then hitting
> > > a value of 0 (or any number without a decimal point) in a later record.
> > > The reason for the current behavior is to allow better precision in the
> > > case of files with only integers. The issue however is that we
> > > currently fail on the basic case with a mix of integers and decimal
> > > numbers. See [1] for more discussion on this.
> > >
> > > I propose that we switch the JSON reader to read all numbers as doubles
> > > by default. The reader already contains a workaround that allows
> > > lossless casting to integers and decimal types with some extra
> > > computational overhead using all_text_mode, see more info below. [2]
> > >
> > > Please share your thoughts on this change.
> > >
> > > [1] https://issues.apache.org/jira/browse/DRILL-1460
> > > [2] https://issues.apache.org/jira/browse/DRILL-2071
> > >
> > > -Jason
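
For anyone following the thread, here is a minimal sketch of the trade-off being discussed. It is only an analogy built on Python's standard json module, not Drill's reader code: the helper names read_numbers_as_doubles and read_numbers_as_text and the sample records are hypothetical. It shows that a file mixing 1.5 and 0 yields two different types by default, that reading every number as a double gives one consistent type, and that doubles stop being exact for integers above 2**53 while a text-then-cast approach stays lossless.

```python
import json
from decimal import Decimal


def read_numbers_as_doubles(line):
    # Hypothetical helper: map every JSON number to a float (IEEE 754 double).
    return json.loads(line, parse_int=float)


def read_numbers_as_text(line):
    # Hypothetical analogue of an all-text mode: keep each number's digits as a string.
    return json.loads(line, parse_int=str, parse_float=str)


# A decimal value followed by a bare 0 in a later record, as in the thread.
records = ['{"price": 1.5}', '{"price": 0}']

default_types = [type(json.loads(r)["price"]).__name__ for r in records]
double_types = [type(read_numbers_as_doubles(r)["price"]).__name__ for r in records]
print(default_types)  # ['float', 'int']   (the type changes between records)
print(double_types)   # ['float', 'float'] (one consistent type)

# Doubles represent every integer exactly only up to 2**53 (about 9 * 10**15).
big = 2**53 + 1
as_double = read_numbers_as_doubles(str(big))
print(int(as_double) == big)  # False: the value was rounded to 2**53

# Reading as text and casting afterwards keeps every digit.
as_text = read_numbers_as_text(str(big))
print(Decimal(as_text) == big)  # True
```

The text-based path costs an extra cast per value, which matches the "extra computational overhead" caveat in the original proposal.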
