Re: [Discuss] Hive - Smallint and Tinyint

Jacques Nadeau Mon, 08 Jun 2015 13:19:56 -0700

Got it.  Should be fine, then.

On Mon, Jun 8, 2015 at 12:46 PM, Jason Altekruse <[email protected]>
wrote:


> I was going to be changing them on the schema side as well. As I am
> currently implementing the feature as a rewrite rule, I have to match the
> schema of the relational tree I am replacing. To make it work in execution
> I have to cast to an integer (or add the tinyint cast). If I choose the
> former, planning will fail on mismatched types: the tinyint expected from
> the Hive scan differs from the integer coming out of the cast.
>
> On Mon, Jun 8, 2015 at 12:43 PM, Jacques Nadeau <[email protected]>
> wrote:
>
> > The only concern I have around changing the types in execution is that it
> > may cause strange behaviors.  Are you planning on changing them on the
> > schema side as well?  That way Calcite wouldn't insert weird expression
> > patterns that would cause other problems if you change the execution
> > side.
> >
> > On Mon, Jun 8, 2015 at 12:41 PM, Jason Altekruse <[email protected]>
> > wrote:
> >
> > > I am in support of opening JIRAs to enumerate the steps necessary to
> > > support these types. However, I think it would be good to get a fix into
> > > master for the functional bug that is in the code today. That fix is easy
> > > and the only overhead is taking a little more space for the data after it
> > > has been read into Drill.
> > >
> > > As we are looking to keep up with our near-monthly release schedule, I'm
> > > uncertain that we can have these types implemented and well tested by the
> > > next release, but I think we very realistically could start testing Hive
> > > more thoroughly after this small fix.
> > >
> > > On Mon, Jun 8, 2015 at 12:29 PM, Jacques Nadeau <[email protected]>
> > > wrote:
> > >
> > > > I think it would be worthwhile to first open up a set of JIRAs associated
> > > > with finishing support for these datatypes.  I'm guessing the scale of
> > > > effort is less than one might initially guess.  Once those are opened, it
> > > > would be easier to give feedback on the relative merit of that work versus
> > > > the alternative solution you suggested.
> > > >
> > > > On Mon, Jun 8, 2015 at 11:12 AM, Jason Altekruse <[email protected]>
> > > > wrote:
> > > >
> > > > > Hello Drillers,
> > > > >
> > > > > I have been working on DRILL-3209, which aims to speed up reading from
> > > > > Hive tables by re-planning them as native Drill reads in the case where
> > > > > the tables are backed by files that have available native readers. This
> > > > > will begin with Parquet and delimited text files.
> > > > >
> > > > > To provide the same behavior as reading through the SerDe interface, I
> > > > > must insert a cast above the read operation to provide the same types
> > > > > that the Hive scan otherwise would.
> > > > >
> > > > > The issue I am seeing is that Hive appears to be reading into both the
> > > > > tinyint and smallint types, which I believe are not fully supported
> > > > > (currently my new injected project is failing to find a function to
> > > > > cast to tinyint). See the unsupported note in the docs here [1] for
> > > > > smallint; tinyint is not even listed.
> > > > >
> > > > > I can simply add the function to provide the same type as we currently
> > > > > read out of the scan, but I believe we will have other issues with
> > > > > trying to support this right now as we have not thoroughly tested these
> > > > > other integer types.
> > > > >
> > > > > I would like to instead propose that we change the behavior of Hive to
> > > > > read data of these types into regular integer columns for now and try
> > > > > to remove any outstanding references to tinyint and smallint until we
> > > > > can commit to fully supporting them.
> > > > >
> > > > > [1] http://drill.apache.org/docs/supported-data-types/
> > > > >
> > > >
> > >
> >
>
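
As a rough illustration of the proposal discussed in this thread (reading
Hive tinyint and smallint as regular integers, and changing the schema side
to match so planning does not see mismatched types), here is a minimal
Python sketch. Everything in it is hypothetical: widen_type, widen_schema,
and schemas_match are not Drill, Hive, or Calcite APIs, and the string type
names are only stand-ins for the real type objects.

```python
# Hypothetical sketch only: none of these names exist in Drill, Hive, or Calcite.

# Narrow integer types that would be read as a regular integer for now.
WIDENED = {"tinyint": "int", "smallint": "int"}


def widen_type(hive_type):
    """Return the type a column would be read as, widening narrow integers."""
    normalized = hive_type.strip().lower()
    return WIDENED.get(normalized, normalized)


def widen_schema(columns):
    """Apply widen_type to every (name, type) pair of a schema."""
    return [(name, widen_type(col_type)) for name, col_type in columns]


def schemas_match(expected, actual):
    """Stand-in for the planner check that a rewritten tree keeps the schema."""
    return list(expected) == list(actual)


# Schema as declared in Hive, and the schema produced after casting to int.
hive_schema = [("id", "tinyint"), ("qty", "smallint"), ("name", "string")]
cast_schema = [("id", "int"), ("qty", "int"), ("name", "string")]

# Changing only the execution side leaves the two schemas in disagreement.
mismatch = not schemas_match(hive_schema, cast_schema)

# Widening on the schema side as well makes them agree.
consistent = schemas_match(widen_schema(hive_schema), cast_schema)
```

The point of the sketch is only that the widening has to be applied in both
places: if the scan still advertises tinyint while the cast produces int, a
schema comparison like the one above fails.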
