I support opening JIRAs to enumerate the steps necessary to support these types. However, I think it would be good to get a fix into master for the functional bug that is in the code today. That fix is easy, and the only overhead is that the data takes a little more space after it has been read into Drill.
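To make the "a little more space" trade-off concrete, here is a minimal Java sketch. It is not Drill's reader code. The class and method names (`WidenToInt`, `widenTinyInt`, `widenSmallInt`, `extraBytes`) are hypothetical, and it assumes plain fixed-width storage while ignoring any null-tracking overhead. Hive TINYINT and SMALLINT are signed 1-byte and 2-byte integers, matching Java's `byte` and `short`. Widening them to a 32-bit int is lossless, so the only cost is the extra bytes per value.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Drill code): widen Hive TINYINT (1 byte) and
 * SMALLINT (2 bytes) values into a regular 32-bit INT column.
 */
public class WidenToInt {

  // byte -> int is a widening conversion; sign extension preserves negatives.
  static int[] widenTinyInt(byte[] values) {
    int[] out = new int[values.length];
    for (int i = 0; i < values.length; i++) {
      out[i] = values[i];
    }
    return out;
  }

  // short -> int is likewise lossless.
  static int[] widenSmallInt(short[] values) {
    int[] out = new int[values.length];
    for (int i = 0; i < values.length; i++) {
      out[i] = values[i];
    }
    return out;
  }

  // Extra fixed-width bytes for n values widened from originalWidthBytes to 4 bytes.
  static long extraBytes(long n, int originalWidthBytes) {
    return n * (Integer.BYTES - originalWidthBytes);
  }

  public static void main(String[] args) {
    byte[] tiny = {Byte.MIN_VALUE, -1, 0, 42, Byte.MAX_VALUE};
    short[] small = {Short.MIN_VALUE, -1, 0, 1000, Short.MAX_VALUE};

    System.out.println(Arrays.toString(widenTinyInt(tiny)));   // [-128, -1, 0, 42, 127]
    System.out.println(Arrays.toString(widenSmallInt(small))); // [-32768, -1, 0, 1000, 32767]
    // One million TINYINT values: 1,000,000 bytes become 4,000,000 bytes.
    System.out.println(extraBytes(1_000_000L, Byte.BYTES));    // 3000000
  }
}
```

The design choice being weighed is simple: every value round-trips exactly, and the data costs up to 4x the space for TINYINT columns and 2x for SMALLINT columns.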
As we are trying to keep up with our near-monthly release schedule, I'm uncertain that we can have these types implemented and well tested by the next release. However, I think we could very realistically start testing Hive more thoroughly after this small fix.

On Mon, Jun 8, 2015 at 12:29 PM, Jacques Nadeau <[email protected]> wrote:
> I think it would be worthwhile to first open up a set of JIRAs associated
> with finishing support for these datatypes. I'm guessing the scale of
> effort is less than one might initially guess. Once those are opened, it
> would be easier to give feedback on the relative merit of that work versus
> the alternative solution you suggested.
>
> On Mon, Jun 8, 2015 at 11:12 AM, Jason Altekruse <[email protected]> wrote:
>
>> Hello Drillers,
>>
>> I have been working on DRILL-3209, which aims to speed up reading from Hive
>> tables by re-planning them as native Drill reads in the case where the
>> tables are backed by files that have available native readers. This will
>> begin with Parquet and delimited text files.
>>
>> To provide the same behavior as reading through the SerDe interface, I must
>> insert a cast above the read operation to provide the same types that the
>> Hive scan otherwise would.
>>
>> The issue I am seeing is that Hive appears to be reading into both the
>> tinyint and smallint types, which I believe are not fully supported
>> (currently my new injected project is failing to find a function to cast to
>> tinyint). See the unsupported note in the docs here [1] for smallint;
>> tinyint is not even listed.
>>
>> I can simply add the function to provide the same type as we currently read
>> out of the scan, but I believe we will have other issues with trying to
>> support this right now, as we have not thoroughly tested these other integer
>> types.
>>
>> I would like to instead propose that we change the behavior of Hive to read
>> data of these types into regular integer columns for now and try to
>> remove any outstanding references to tinyint and smallint until we can
>> commit to fully supporting them.
>>
>> [1] http://drill.apache.org/docs/supported-data-types/
