Re: [Discuss] Hive - Smallint and Tinyint

Jason Altekruse Mon, 08 Jun 2015 12:42:06 -0700

I am in support of opening JIRAs to enumerate the steps necessary to
support these types. However, I think it would be good to get a fix into
master for the functional bug that is in the code today. That fix is easy,
and the only overhead is a little more space for the data after it has
been read into Drill.
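
To make the space overhead concrete, here is a minimal sketch in plain
Java. It is an illustration only, not Drill's Hive reader code: the class
and method names are hypothetical, and it assumes tinyint and smallint
values arrive as 8-bit and 16-bit signed integers and are widened to
32-bit ints.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Drill or Hive.
final class WidenSketch {

    // Widen 8-bit signed values (assumed tinyint representation) to 32-bit ints.
    static int[] widenTinyInts(byte[] values) {
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i]; // implicit widening conversion, sign is preserved
        }
        return out;
    }

    // Widen 16-bit signed values (assumed smallint representation) to 32-bit ints.
    static int[] widenSmallInts(short[] values) {
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i]; // implicit widening conversion, sign is preserved
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] tiny = {Byte.MIN_VALUE, -1, 0, Byte.MAX_VALUE};
        int[] wide = widenTinyInts(tiny);
        System.out.println(Arrays.toString(wide));
        // Raw value storage before and after widening, ignoring array headers.
        System.out.println(tiny.length * Byte.BYTES + " bytes -> "
                + wide.length * Integer.BYTES + " bytes");
    }
}
```

Every value survives the widening unchanged, so the cost is purely the
extra bytes per value once the data is in memory.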


As we are looking to keep up with our near-monthly release schedule, I'm
uncertain that we can have these types implemented and well tested by the
next release, but I think we very realistically could start testing Hive
more thoroughly after this small fix.

On Mon, Jun 8, 2015 at 12:29 PM, Jacques Nadeau <[email protected]> wrote:

> I think it would be worthwhile to first open up a set of JIRAs associated
> with finishing support for these datatypes.  I'm guessing the scale of
> effort is less than one might initially guess.  Once those are opened, it
> would be easier to give feedback on the relative merit of that work versus
> the alternative solution you suggested.
>
> On Mon, Jun 8, 2015 at 11:12 AM, Jason Altekruse <[email protected]> wrote:
>
> > Hello Drillers,
> >
> > I have been working on DRILL-3209, which aims to speed up reading from
> > Hive tables by re-planning them as native Drill reads in the case where
> > the tables are backed by files that have available native readers. This
> > will begin with Parquet and delimited text files.
> >
> > To provide the same behavior as reading through the Serde interface, I
> > must insert a cast above the read operation to provide the same types
> > that the Hive scan otherwise would.
> >
> > The issue I am seeing is that Hive appears to be reading into both the
> > tinyint and smallint types, which I believe are not fully supported
> > (currently my new injected project is failing to find a function to
> > cast to tinyint). See the unsupported note in the docs here [1] for
> > smallint; tinyint is not even listed.
> >
> > I can simply add the function to provide the same type as we currently
> > read out of the scan, but I believe we will have other issues with
> > trying to support this right now, as we have not thoroughly tested
> > these other integer types.
> >
> > I would like to instead propose that we change the behavior of Hive to
> > read data of these types into regular integer columns for now and try
> > to remove any outstanding references to tinyint and smallint until we
> > can commit to fully supporting them.
> >
> > [1] http://drill.apache.org/docs/supported-data-types/
> >
>
