I think it would be worthwhile to first open a set of JIRAs for finishing
support for these datatypes.  I suspect the effort is smaller than one might
initially expect.  Once those are opened, it will be easier to weigh the
merit of that work against the alternative solution you suggested.

On Mon, Jun 8, 2015 at 11:12 AM, Jason Altekruse wrote:

> Hello Drillers,
>
> I have been working on DRILL-3209, which aims to speed up reads from Hive
> tables by re-planning them as native Drill reads when the tables are
> backed by files that have native readers available. Initially this will
> cover Parquet and delimited text files.
>
> To match the behavior of reading through the Hive SerDe interface, I must
> insert a cast above the read operation so that it produces the same types
> the Hive scan otherwise would.
>
> The issue I am seeing is that Hive appears to read data into both the
> tinyint and smallint types, which I believe are not fully supported
> (currently my newly injected project fails to find a function to cast to
> tinyint). See the note marking smallint as unsupported in the docs [1];
> tinyint is not listed at all.
>
> I could simply add the cast function so the native read produces the same
> type the scan currently returns, but I expect we would hit other issues
> trying to support these types right now, since we have not thoroughly
> tested these smaller integer types.
>
> I would instead like to propose that we change the Hive reader to read
> data of these types into regular integer columns for now, and remove any
> outstanding references to tinyint and smallint until we can commit to
> fully supporting them.
>
> [1] http://drill.apache.org/docs/supported-data-types/
>
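The widening proposed in the quoted message (reading Hive tinyint and smallint data into regular integer columns) could, as one possible approach, be kept in a single type-mapping table so that the SerDe-based scan and the re-planned native read agree on output types. The following is a minimal sketch only: `HiveTypeWidening`, `HiveType`, and `DrillType` are hypothetical stand-ins defined here for illustration, not the actual Hive or Drill classes, and the mapping covers just a few primitive types.

```java
import java.util.EnumMap;
import java.util.Map;

/**
 * Hypothetical sketch: map Hive column types to the Drill types a scan
 * should expose, widening TINYINT and SMALLINT to INT so that a plan never
 * needs a cast to the not-yet-supported narrow integer types.
 * The enums are illustrative stand-ins, not real Drill or Hive classes.
 */
public class HiveTypeWidening {

  /** Hypothetical subset of Hive primitive types. */
  public enum HiveType { BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING }

  /** Hypothetical subset of Drill column types. */
  public enum DrillType { BIT, INT, BIGINT, FLOAT4, FLOAT8, VARCHAR }

  private static final Map<HiveType, DrillType> MAPPING = new EnumMap<>(HiveType.class);

  static {
    MAPPING.put(HiveType.BOOLEAN, DrillType.BIT);
    // Narrow integer types are widened to INT until they are fully supported.
    MAPPING.put(HiveType.TINYINT, DrillType.INT);
    MAPPING.put(HiveType.SMALLINT, DrillType.INT);
    MAPPING.put(HiveType.INT, DrillType.INT);
    MAPPING.put(HiveType.BIGINT, DrillType.BIGINT);
    MAPPING.put(HiveType.FLOAT, DrillType.FLOAT4);
    MAPPING.put(HiveType.DOUBLE, DrillType.FLOAT8);
    MAPPING.put(HiveType.STRING, DrillType.VARCHAR);
  }

  /** Returns the Drill type that both the SerDe scan and the native read expose. */
  public static DrillType toDrillType(HiveType hiveType) {
    DrillType drillType = MAPPING.get(hiveType);
    if (drillType == null) {
      throw new IllegalArgumentException("Unmapped Hive type: " + hiveType);
    }
    return drillType;
  }

  /** Widening a tinyint value to int is lossless: every byte fits in an int. */
  public static int widenTinyint(byte value) {
    return value;
  }

  /** Widening a smallint value to int is lossless: every short fits in an int. */
  public static int widenSmallint(short value) {
    return value;
  }

  public static void main(String[] args) {
    for (HiveType t : HiveType.values()) {
      System.out.println(t + " -> " + toDrillType(t));
    }
    System.out.println(widenTinyint(Byte.MIN_VALUE));   // -128
    System.out.println(widenSmallint(Short.MAX_VALUE)); // 32767
  }
}
```

In this sketch the cast injected above a native read would only ever target types from the mapping's value side, so no tinyint or smallint cast function is needed; the trade-off is that widened columns use four bytes per value instead of one or two.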
