I was planning to change them on the schema side as well. As I am currently implementing the feature as a rewrite rule, I have to match the schema of the relational tree I am replacing. To make it work in execution I have to cast to an integer (or add the tinyint cast). If I choose the former, planning will fail on the type mismatch between the tinyint expected from the Hive scan and the integer coming out of the cast.
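
To make the mismatch concrete, here is a toy Python model of the check. It is not Drill or Calcite code: `Field`, `row_types_match`, and `widen_small_ints` are names made up for illustration, and the only assumption is that a rewrite is accepted when the replacement produces exactly the same row type as the tree it replaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for one column of a planner row type."""
    name: str
    sql_type: str  # e.g. "TINYINT", "SMALLINT", "INTEGER"


def row_types_match(original, replacement):
    """Assumed planner rule: both row types must be identical, field by field."""
    return list(original) == list(replacement)


def widen_small_ints(fields):
    """Model a schema-side change: report TINYINT/SMALLINT columns as INTEGER."""
    small = {"TINYINT", "SMALLINT"}
    return [Field(f.name, "INTEGER") if f.sql_type in small else f for f in fields]


# Row type the Hive scan advertises today (illustrative columns).
hive_scan = [Field("flag", "TINYINT"), Field("name", "VARCHAR")]

# Row type of the injected project when it casts to INTEGER instead of TINYINT.
project_with_int_cast = [Field("flag", "INTEGER"), Field("name", "VARCHAR")]

# Execution-side cast only: the rewrite no longer matches what it replaces.
mismatch_ok = row_types_match(hive_scan, project_with_int_cast)

# Schema side changed as well: both sides now agree on INTEGER.
widened_ok = row_types_match(widen_small_ints(hive_scan), project_with_int_cast)
```

In this model the first comparison fails and the second succeeds, which is why the change would have to land on the schema side and the execution side together.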
On Mon, Jun 8, 2015 at 12:43 PM, Jacques Nadeau <[email protected]> wrote:

> The only concern I have around changing the types in execution is that
> it may cause strange behaviors. Are you planning on changing them on the
> schema side as well? That way Calcite wouldn't insert weird expression
> patterns that would cause other problems if you change the execution
> side.
>
> On Mon, Jun 8, 2015 at 12:41 PM, Jason Altekruse <[email protected]> wrote:
>
> > I am in support of opening JIRAs to enumerate the steps necessary to
> > support these types. However, I think it would be good to get a fix
> > into master for the functional bug that is in the code today. That fix
> > is easy and the only overhead is taking a little more space for the
> > data after it has been read into Drill.
> >
> > As we are looking to keep up with our near-monthly release schedule,
> > I'm uncertain that we can have these types implemented and well tested
> > by the next release, but I think we very realistically could start
> > testing Hive more thoroughly after this small fix.
> >
> > On Mon, Jun 8, 2015 at 12:29 PM, Jacques Nadeau <[email protected]> wrote:
> >
> > > I think it would be worthwhile to first open up a set of JIRAs
> > > associated with finishing support for these datatypes. I'm guessing
> > > the scale of effort is less than one might initially guess. Once
> > > those are opened, it would be easier to give feedback on the
> > > relative merit of that work versus the alternative solution you
> > > suggested.
> > >
> > > On Mon, Jun 8, 2015 at 11:12 AM, Jason Altekruse <[email protected]> wrote:
> > >
> > > > Hello Drillers,
> > > >
> > > > I have been working on DRILL-3209, which aims to speed up reading
> > > > from Hive tables by re-planning them as native Drill reads in the
> > > > case where the tables are backed by files that have available
> > > > native readers. This will begin with Parquet and delimited text
> > > > files.
> > > >
> > > > To provide the same behavior as reading through the SerDe
> > > > interface, I must insert a cast above the read operation to
> > > > provide the same types that the Hive scan otherwise would.
> > > >
> > > > The issue I am seeing is that Hive appears to be reading into both
> > > > the tinyint and smallint types, which I believe are not fully
> > > > supported (currently my new injected project is failing to find a
> > > > function to cast to tinyint). See the unsupported note in the docs
> > > > here [1] for smallint; tinyint is not even listed.
> > > >
> > > > I can simply add the function to provide the same type as we
> > > > currently read out of the scan, but I believe we will have other
> > > > issues with trying to support this right now as we have not
> > > > thoroughly tested these other integer types.
> > > >
> > > > I would like to instead propose that we change the behavior of
> > > > Hive to read data of these types into regular integer columns for
> > > > now and try to remove any outstanding references to tinyint and
> > > > smallint until we can commit to fully supporting them.
> > > >
> > > > [1] http://drill.apache.org/docs/supported-data-types/
