Re: [Discuss] Hive - Smallint and Tinyint

Jacques Nadeau Mon, 08 Jun 2015 12:44:14 -0700

The only concern I have around changing the types in execution is that it
may cause strange behaviors.  Are you planning on changing them on the
schema side as well?  That way Calcite wouldn't insert weird expression
patterns that would cause other problems if you change the execution side.


On Mon, Jun 8, 2015 at 12:41 PM, Jason Altekruse <[email protected]>
wrote:

> I am in support of opening JIRAs to enumerate the steps necessary to support
> these types. However, I think it would be
> good to get a fix into master for the functional bug that is in the code
> today. That fix is easy and the only overhead is taking a little more space
> for the data after it has been read into Drill.
>
> As we are looking to keep up with our near-monthly release schedule, I'm
> uncertain that we can have these types implemented and well tested by the
> next release, but I think we very realistically could start testing Hive
> more thoroughly after this small fix.
>
> On Mon, Jun 8, 2015 at 12:29 PM, Jacques Nadeau <[email protected]>
> wrote:
>
> > I think it would be worthwhile to first open up a set of JIRAs associated
> > with finishing support for these datatypes.  I'm guessing the scale of
> > effort is less than one might initially guess.  Once those are opened, it
> > would be easier to give feedback on the relative merit of that work versus
> > the alternative solution you suggested.
> >
> > On Mon, Jun 8, 2015 at 11:12 AM, Jason Altekruse <[email protected]>
> > wrote:
> >
> > > Hello Drillers,
> > >
> > > I have been working on DRILL-3209, which aims to speed up reading from Hive
> > > tables by re-planning them as native Drill reads in the case where the
> > > tables are backed by files that have available native readers. This will
> > > begin with parquet and delimited text files.
> > >
> > > To provide the same behavior as reading through the Serde interface, I must
> > > insert a cast above the read operation to provide the same types that the
> > > Hive scan otherwise would.
> > >
> > > The issue I am seeing is that Hive appears to be reading into both the
> > > tinyint and smallint types, which I believe are not fully supported
> > > (currently my new injected project is failing to find a function to cast to
> > > tinyint). See the unsupported note in the docs here [1] for smallint;
> > > tinyint is not even listed.
> > >
> > > I can simply add the function to provide the same type as we currently read
> > > out of the scan, but I believe we will have other issues with trying to
> > > support this right now as we have not thoroughly tested these other integer
> > > types.
> > >
> > > I would like to instead propose that we change the behavior of Hive to read
> > > data of these types into regular integer columns for now and try to
> > > remove any outstanding references to tinyint and smallint until we can
> > > commit to fully supporting them.
> > >
> > > [1] http://drill.apache.org/docs/supported-data-types/
> > >
> >
>
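
For readers following the thread, the widening proposed in the original message (read tinyint and smallint data as regular integers) can be sketched roughly as below. This is only an illustration under stated assumptions: the class, enum, and method names are hypothetical and are not taken from Drill or Hive, and the thread does not say where in the Hive reader such a mapping would live. The sketch assumes the standard 1-byte and 2-byte signed ranges for tinyint and smallint, so every value fits in a 4-byte int without loss.

```java
import java.util.EnumMap;
import java.util.Map;

// Illustrative only: these names are hypothetical, not Drill or Hive APIs.
final class HiveTypeWidening {

    // A small stand-in for the integer types under discussion.
    enum HiveType { TINYINT, SMALLINT, INT, BIGINT }

    // Narrow integer types and the type they would be read as instead.
    private static final Map<HiveType, HiveType> WIDENED =
            new EnumMap<>(HiveType.class);
    static {
        WIDENED.put(HiveType.TINYINT, HiveType.INT);
        WIDENED.put(HiveType.SMALLINT, HiveType.INT);
    }

    // Types without an entry in the map are returned unchanged.
    static HiveType widen(HiveType declared) {
        return WIDENED.getOrDefault(declared, declared);
    }

    // Widening byte or short to int is lossless in Java, sign included.
    static int widenValue(byte v) { return v; }
    static int widenValue(short v) { return v; }

    public static void main(String[] args) {
        for (HiveType t : HiveType.values()) {
            System.out.println(t + " -> " + widen(t));
        }
        System.out.println(widenValue(Byte.MIN_VALUE));
        System.out.println(widenValue(Short.MAX_VALUE));
    }
}
```

The only cost of this approach is the one noted in the thread: values take more space once read. Whether the same mapping should also be applied on the schema side, as asked in the top reply, is left open here.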
