Re: [Discuss] Hive - Smallint and Tinyint

Daniel Barclay Mon, 08 Jun 2015 13:53:32 -0700

Note DRILL-2470, "Implement SMALLINT and TINYINT [umbrella]".


Jacques Nadeau wrote:

I think it would be worthwhile to first open up a set of JIRAs associated
with finishing support for these datatypes.  I'm guessing the scale of
effort is less than one might initially guess.  Once those are opened, it
would be easier to give feedback on the relative merit of that work versus
the alternative solution you suggested.

On Mon, Jun 8, 2015 at 11:12 AM, Jason Altekruse <[email protected]>
wrote:

Hello Drillers,

I have been working on DRILL-3209, which aims to speed up reading from Hive
tables by re-planning them as native Drill reads in the case where the
tables are backed by files that have available native readers. This will
begin with Parquet and delimited text files.

To provide the same behavior as reading through the SerDe interface, I must
insert a cast above the read operation to provide the same types that the
Hive scan otherwise would.
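
The cast described above can be sketched roughly as follows. This is only an illustration of the idea, not Drill's planner code: the function name `build_cast_projection`, the schema layout, and the backtick quoting are assumptions made for the example.

```python
def build_cast_projection(hive_schema):
    """Return one CAST expression per column, in schema order.

    hive_schema is a hypothetical list of (column_name, hive_type) pairs,
    as they might be taken from the Hive table metadata.
    """
    return [
        f"CAST(`{name}` AS {hive_type.upper()}) AS `{name}`"
        for name, hive_type in hive_schema
    ]


# Hypothetical table with one ordinary column and one tinyint column.
schema = [("id", "int"), ("flag", "tinyint")]
projection = build_cast_projection(schema)
```

The tinyint column is where such a projection would fail if no cast function to that type is available.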

The issue I am seeing is that Hive appears to be reading into both the
tinyint and smallint types, which I believe are not fully supported
(currently my new injected project is failing to find a function to cast to
tinyint). See the unsupported note in the docs here [1] for smallint;
tinyint is not even listed.

I can simply add the function to provide the same type as we currently read
out of the scan, but I believe we will have other issues with trying to
support this right now as we have not thoroughly tested these other integer
types.

I would like to instead propose that we change the behavior of Hive to read
data of these types into regular integer columns for now and try to
remove any outstanding references to tinyint and smallint until we can
commit to fully supporting them.
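
A minimal sketch of the proposed widening, assuming a simple lookup from Hive type name to the type actually read. The table and function names here are hypothetical and are not taken from the Drill code base; the range check only illustrates why reading the narrower types as a regular integer loses no values.

```python
# Hypothetical widening table: narrow Hive integer types are read as INT.
WIDENED_TYPES = {"tinyint": "int", "smallint": "int"}

# Signed two's-complement ranges for each integer width.
INTEGER_RANGES = {
    "tinyint": (-2**7, 2**7 - 1),
    "smallint": (-2**15, 2**15 - 1),
    "int": (-2**31, 2**31 - 1),
}


def widen_hive_type(hive_type):
    """Map a Hive type name to the type to read; other types pass through."""
    normalized = hive_type.lower()
    return WIDENED_TYPES.get(normalized, normalized)


def widening_is_lossless(hive_type):
    """Check that every value of the narrow type fits in the widened type."""
    narrow_low, narrow_high = INTEGER_RANGES[hive_type.lower()]
    wide_low, wide_high = INTEGER_RANGES[widen_hive_type(hive_type)]
    return wide_low <= narrow_low and narrow_high <= wide_high
```

With a mapping like this, the injected cast would target a regular integer instead of tinyint or smallint, so no new cast functions would be needed.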

[1] http://drill.apache.org/docs/supported-data-types/



--
Daniel Barclay
MapR Technologies
