Hi Jason,

Good question.
Some casts are *binary coercible*, which means no conversion is needed
internally; for instance, char --> varchar, varchar --> varbinary, etc.

Other casts require some transformation, because the binary representation
of the source type differs from the binary representation of the target
type. Take int -> varchar: the target type has to keep each digit of the
integer as a character, while the source type is a 4-byte representation.

I will look into whether it's possible to write into the buffer of the
output value vector directly, without copying into a new buffer.

On Tue, Dec 3, 2013 at 6:29 PM, Jason Altekruse <[email protected]> wrote:

> Hi Jinfeng,
>
> This might be a dumb question, but is there any transformation being
> performed when going from a fixed length type to a variable length type?
> That is, are the bytes in the buffer coming in going to be the same as the
> bytes coming out of the cast?
>
> I understand that for casts like int -> long we need to add extra space
> between each value, but is it possible that we could just hand the buffer
> from one value vector type to the other without copying it into a new
> buffer?
>
> We would still have to create a new buffer with the offsets of the
> "variable length" values, but it would save us some time if we could do
> this.
>
> -Jason Altekruse
>
>
> On Tue, Dec 3, 2013 at 5:35 PM, Jinfeng Ni <[email protected]> wrote:
>
> > Hi all,
> >
> > I'm working on the explicit cast support in Drill. So far, I have
> > prototyped the implementation for the first 3 categories, and would
> > like to seek input from you regarding how to deal with the buffer
> > allocation for casts from fixed-length types into var-length types.
> >
> > 1. cast from fixed-length type to fixed-length type
> >    eg: float4 --> int,
> >        int -> float4,
> >
> > 2. cast from var-length type to fixed-length type
> >    eg: varchar --> int
> >        varbinary --> int
> >    (Still need to figure out how to handle overflow when casting)
> >
> > 3. cast from fixed-length type to var-length type
> >    eg: int -> varchar
> >        bigint -> varbinary
> >
> > 4. cast from var-length type to var-length type
> >    eg: varchar --> varchar
> >        varbinary --> varchar
> >
> > The 3rd one, i.e. from fixed-length to var-length type, causes a
> > problem for the current implementation in terms of buffer allocation.
> >
> > For fixed-length types, Drill uses a Java primitive type in the
> > ValueHolder. For instance, IntHolder.value is an int. But for
> > var-length types, Drill uses a buffer to keep the value. When casting
> > from int into varchar, the buffer for the VarCharHolder is not
> > allocated, and we have to figure out a way to do the allocation before
> > the cast.
> >
> > There seem to be 2 options:
> >
> > Option 1: allocate the buffer in the function template's setup()
> > method. The buffer will be used in the eval() method.
> > Problems with this option:
> > 1) It needs two copies: first from the fixed-type input into the
> > buffer allocated in setup(), then from that buffer into the buffer in
> > the target vector.
> > 2) It needs a cleanup() method added to the function template to
> > release the allocated buffer, which currently is not there in the code
> > base.
> >
> > Option 2: the consumer of the output of the cast function is
> > responsible for pre-allocating the buffer in the target ValueVector
> > for all the VarCharHolder(). The cast function simply does the
> > conversion and copies into the pre-allocated buffer in the target
> > ValueVector.
> > The good thing about this option is that it requires only 1 copy.
> >
> > I have prototyped the 1st option, and have not figured out how to
> > implement the 2nd approach yet. But I would like to seek suggestions
> > regarding those 2 options before I proceed.
> >
> > Thanks!
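To make the int -> varchar case from this thread concrete, here is a rough sketch of what Option 2 might look like: the caller pre-allocates one data buffer, and the cast writes the decimal digits of each int straight into it while recording offsets. It uses plain java.nio.ByteBuffer as a stand-in; the class and method names (IntToVarCharSketch, writeDecimal, castIntsToVarChar, read) are made up for illustration and are not Drill's ValueVector or holder API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Drill code: ByteBuffer stands in for the
// target vector's data buffer, int[] for its offsets.
class IntToVarCharSketch {

    // Writes the decimal digits of value directly into out at its current
    // position (no intermediate buffer) and returns the bytes written.
    public static int writeDecimal(int value, ByteBuffer out) {
        long v = value; // widen so negating Integer.MIN_VALUE cannot overflow
        int start = out.position();
        if (v < 0) {
            out.put((byte) '-');
            v = -v;
        }
        int digits = 1;
        for (long t = v; t >= 10; t /= 10) {
            digits++;
        }
        int end = out.position() + digits;
        // Fill from the least significant digit backwards.
        for (int i = end - 1; i >= end - digits; i--) {
            out.put(i, (byte) ('0' + (v % 10)));
            v /= 10;
        }
        out.position(end);
        return end - start;
    }

    // Casts every int to text in the pre-allocated data buffer.
    // Returns values.length + 1 offsets; entry i spans
    // [offsets[i], offsets[i + 1]).
    public static int[] castIntsToVarChar(int[] values, ByteBuffer data) {
        int[] offsets = new int[values.length + 1];
        for (int i = 0; i < values.length; i++) {
            offsets[i] = data.position();
            writeDecimal(values[i], data);
        }
        offsets[values.length] = data.position();
        return offsets;
    }

    // Reads entry i back as a String (only for inspecting the result).
    public static String read(ByteBuffer data, int[] offsets, int i) {
        byte[] bytes = new byte[offsets[i + 1] - offsets[i]];
        for (int k = 0; k < bytes.length; k++) {
            bytes[k] = data.get(offsets[i] + k);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        int[] values = {7, -42, 2013};
        // "-2147483648" is the longest decimal form of an int: 11 bytes.
        ByteBuffer data = ByteBuffer.allocate(values.length * 11);
        int[] offsets = castIntsToVarChar(values, data);
        for (int i = 0; i < values.length; i++) {
            System.out.println(read(data, offsets, i));
        }
    }
}
```

The sketch sizes the buffer for the worst case of 11 bytes per int, which trades some wasted space for never having to reallocate during the cast. A binary coercible cast such as varchar --> varbinary would need none of this, since the same bytes and offsets could be reused as-is.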
