Hi all,
I'm working on explicit cast support in Drill. So far, I have prototyped
the implementation for the first 3 categories below, and would like your input
on how to handle buffer allocation when casting from a
fixed-length type to a var-length type.
1. cast from fixed-length type to fixed-length type
eg: float4 --> int
int --> float4
2. cast from var-length type to fixed-length type
eg: varchar --> int
varbinary --> int
(Still need to figure out how to handle overflow when casting.)
3. cast from fixed-length type to var-length type
eg: int --> varchar
bigint --> varbinary
4. cast from var-length type to var-length type
eg: varchar --> varchar
varbinary --> varchar
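
On the overflow question in category 2, one possible approach is to accumulate the parsed digits in a wider type and check the range after each digit. The following is only a sketch under assumptions: the class and method names are hypothetical, it works on a plain byte[] rather than Drill's buffers, and it accepts only ASCII decimal digits with an optional sign.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Drill: parses ASCII decimal digits into an
// int and fails loudly instead of silently wrapping around on overflow.
final class VarCharToIntSketch {
    static int parseInt(byte[] buf, int start, int end) {
        if (start >= end) {
            throw new NumberFormatException("empty input");
        }
        int i = start;
        boolean negative = buf[i] == '-';
        if (negative || buf[i] == '+') {
            i++;
        }
        if (i == end) {
            throw new NumberFormatException("no digits");
        }
        // Magnitude limit: 2^31 for negative values, 2^31 - 1 otherwise.
        long limit = negative ? 1L + Integer.MAX_VALUE : Integer.MAX_VALUE;
        long acc = 0; // wider than int, so the range check itself cannot overflow
        for (; i < end; i++) {
            int digit = buf[i] - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at offset " + i);
            }
            acc = acc * 10 + digit;
            if (acc > limit) {
                throw new ArithmeticException("value out of int range");
            }
        }
        return (int) (negative ? -acc : acc);
    }

    public static void main(String[] args) {
        byte[] bytes = "12345".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseInt(bytes, 0, bytes.length));
    }
}
```

Whether an overflow should raise an error, as here, or produce a null or a saturated value is a separate design decision that the sketch does not settle.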
The 3rd category, i.e. fixed-length to var-length, causes a problem for
the current implementation in terms of buffer allocation.
For a fixed-length type, Drill uses a Java primitive type in the ValueHolder.
For instance, IntHolder.value is an int. For a var-length type, however, Drill
uses a buffer to hold the value. When casting from int to varchar,
the buffer for the VarCharHolder has not been allocated, so we have to figure
out a way to allocate it before the cast.
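
To make the difference concrete, here is a minimal sketch of the two kinds of holder. The classes are simplified stand-ins written for illustration (the names and the use of java.nio.ByteBuffer are assumptions), not Drill's actual holder classes.

```java
import java.nio.ByteBuffer;

// Simplified stand-in: a fixed-length holder carries its value directly.
final class IntHolderSketch {
    int value;
}

// Simplified stand-in: a var-length holder only points into a buffer.
final class VarCharHolderSketch {
    ByteBuffer buffer; // backing memory; null until someone allocates it
    int start;         // offset of the first byte of the value
    int end;           // offset one past the last byte of the value
}

final class HolderAllocationDemo {
    // A cast can only write its result if a large enough buffer is attached.
    static boolean isWritable(VarCharHolderSketch out, int neededBytes) {
        return out.buffer != null && out.buffer.capacity() - out.start >= neededBytes;
    }

    public static void main(String[] args) {
        IntHolderSketch in = new IntHolderSketch();
        in.value = 12345; // usable immediately, no allocation involved

        VarCharHolderSketch out = new VarCharHolderSketch();
        System.out.println(isWritable(out, 5)); // no buffer attached yet

        out.buffer = ByteBuffer.allocate(16);   // someone has to do this first
        System.out.println(isWritable(out, 5));
    }
}
```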
There seem to be 2 options:
Option 1: allocate a buffer in the function template's setup() method. The
buffer is then used in the eval() method.
Problems with this option:
1) It needs two copies: first from the fixed-length input into the buffer
allocated in setup(), then from that buffer into the buffer of the
target vector.
2) It needs a cleanup() method added to the function template to release the
buffer allocated in setup(); no such method currently exists in the code base.
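
A rough sketch of how Option 1 could look, to show where the two copies happen. It is not the prototype itself: the class name, the copyOut() step standing in for the write into the target vector, and the use of java.nio.ByteBuffer are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical function object following the setup()/eval() shape described above.
final class CastIntToVarCharOption1 {
    private ByteBuffer scratch;

    // Allocate the scratch buffer once; "-2147483648" needs at most 11 bytes.
    void setup() {
        scratch = ByteBuffer.allocate(11);
    }

    // Copy 1: the converted value goes into the scratch buffer.
    // Returns the number of bytes written.
    int eval(int in) {
        byte[] digits = Integer.toString(in).getBytes(StandardCharsets.US_ASCII);
        scratch.clear();
        scratch.put(digits);
        return digits.length;
    }

    // Copy 2: the scratch contents are copied into the target buffer.
    void copyOut(ByteBuffer target) {
        scratch.flip();
        target.put(scratch);
    }

    // The extra lifecycle method this option needs, to release the scratch buffer.
    void cleanup() {
        scratch = null;
    }

    public static void main(String[] args) {
        CastIntToVarCharOption1 fn = new CastIntToVarCharOption1();
        fn.setup();
        ByteBuffer target = ByteBuffer.allocate(32);
        int len = fn.eval(-42);
        fn.copyOut(target);
        System.out.println(new String(target.array(), 0, len, StandardCharsets.US_ASCII));
        fn.cleanup();
    }
}
```

With heap buffers, dropping the reference in cleanup() is enough; with directly allocated memory an explicit release would be needed, which is why the missing cleanup() hook matters.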
Option 2: the consumer of the cast function's output is responsible
for pre-allocating buffer space in the target ValueVector for all the
VarCharHolders. The cast function simply does the conversion and copies
the result into the pre-allocated buffer in the target ValueVector.
The advantage of this option is that it requires only 1 copy.
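
A sketch of what Option 2 might look like, under the assumption that the consumer can size the target buffer from a worst-case width per value. The names, the offsets array, and the use of java.nio.ByteBuffer in place of a ValueVector are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class CastIntToVarCharOption2 {
    // Worst-case width of an int rendered as text: "-2147483648".
    static final int MAX_INT_CHARS = 11;

    // The cast writes straight into the caller's buffer at its current
    // position and returns the number of bytes written.
    static int eval(int in, ByteBuffer target) {
        if (target.remaining() < MAX_INT_CHARS) {
            throw new IllegalStateException("consumer did not pre-allocate enough space");
        }
        int start = target.position();
        target.put(Integer.toString(in).getBytes(StandardCharsets.US_ASCII));
        return target.position() - start;
    }

    // Consumer side: casts every value and records where each one ends,
    // so that value i occupies bytes offsets[i] .. offsets[i + 1].
    static int[] castAll(int[] values, ByteBuffer data) {
        int[] offsets = new int[values.length + 1];
        for (int i = 0; i < values.length; i++) {
            eval(values[i], data);
            offsets[i + 1] = data.position();
        }
        return offsets;
    }

    public static void main(String[] args) {
        int[] values = {7, -42, 2147483647};
        // Pre-allocation by the consumer, sized for the worst case.
        ByteBuffer data = ByteBuffer.allocate(values.length * MAX_INT_CHARS);
        int[] offsets = castAll(values, data);
        for (int i = 0; i < values.length; i++) {
            int len = offsets[i + 1] - offsets[i];
            System.out.println(new String(data.array(), offsets[i], len, StandardCharsets.US_ASCII));
        }
    }
}
```

The cost of this approach is that the consumer must know, or be told by the cast function, an upper bound on the output size before eval() is called.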
I have prototyped the 1st option, but have not yet figured out how to implement
the 2nd. I would like to hear suggestions on these 2
options before I proceed.
Thanks!