[ 
https://issues.apache.org/jira/browse/DRILL-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers reassigned DRILL-5598:
----------------------------------

         Assignee:     (was: Paul Rogers)
    Fix Version/s:     (was: 1.11.0)

> AllocationHelper.allocateNew ignores maps, arrays
> -------------------------------------------------
>
>                 Key: DRILL-5598
>                 URL: https://issues.apache.org/jira/browse/DRILL-5598
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>
> The method {{VectorAccessibleUtilities.allocateVectors()}} is used to 
> allocate vectors when the external sort creates a spill batch (and in 
> various other places).
> This method does not allocate space for repeated vectors or vectors contained 
> in maps, so those vectors start life with a very small size. This 
> causes repeated doublings as data is loaded into the vectors:
> {code}
> BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: 
> [32768] -> [65536]
> UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
> [16384] -> [32768]
> UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
> [16384] -> [32768]
> UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: 
> [4096] -> [8192]
> ...
> {code}
> Maps can be handled by iterating over the contained vectors. Arrays and 
> VarChars are harder, as the code needs some hint about data size. We have 
> hard-coded hints available (the assumptions that VarChar columns are 50 
> characters wide and that arrays have 10 elements). A better approach would 
> be to pass in size metadata extracted from batches previously seen by the 
> same operator that allocates the new batch.
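
The approach described above (recurse into maps, and size arrays and VarChars from hints) could be sketched roughly as follows. This is a standalone illustration, not Drill code: the Column, SizeHints and estimateBytes names are hypothetical and do not exist in Drill, and the 4-byte offset entries are an assumption based on the UInt4Vector offsets shown in the log. Only the default hints (50-character VarChars, 10-element arrays) come from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a batch schema; none of these types are Drill classes.
final class AllocationSketch {

    enum Kind { FIXED, VARCHAR, MAP }

    /** Hypothetical column description used only for this sketch. */
    static final class Column {
        final String name;
        final Kind kind;
        final int fixedWidth;     // bytes per value; only used for FIXED
        final boolean repeated;   // true for array (repeated) columns
        final List<Column> children = new ArrayList<>();

        private Column(String name, Kind kind, int fixedWidth, boolean repeated) {
            this.name = name;
            this.kind = kind;
            this.fixedWidth = fixedWidth;
            this.repeated = repeated;
        }

        static Column fixed(String name, int width, boolean repeated) {
            return new Column(name, Kind.FIXED, width, repeated);
        }

        static Column varChar(String name, boolean repeated) {
            return new Column(name, Kind.VARCHAR, 0, repeated);
        }

        static Column map(String name, boolean repeated, Column... members) {
            Column c = new Column(name, Kind.MAP, 0, repeated);
            c.children.addAll(Arrays.asList(members));
            return c;
        }
    }

    /** Size hints; the defaults are the hard-coded values from the description. */
    static final class SizeHints {
        final int varCharWidth;
        final int arraySize;

        SizeHints(int varCharWidth, int arraySize) {
            this.varCharWidth = varCharWidth;
            this.arraySize = arraySize;
        }

        static SizeHints defaults() {
            return new SizeHints(50, 10);
        }
    }

    /** Estimates the bytes to pre-allocate for one column holding rowCount rows. */
    static long estimateBytes(Column col, long rowCount, SizeHints hints) {
        long bytes = 0;
        long valueCount = rowCount;
        if (col.repeated) {
            // Assumed layout: one 4-byte offset per row, plus one terminating entry.
            bytes += 4L * (rowCount + 1);
            valueCount = rowCount * hints.arraySize;
        }
        switch (col.kind) {
            case FIXED:
                bytes += (long) col.fixedWidth * valueCount;
                break;
            case VARCHAR:
                // Offsets for each value, plus the hinted width of data per value.
                bytes += 4L * (valueCount + 1) + (long) hints.varCharWidth * valueCount;
                break;
            case MAP:
                // Maps own no data themselves; recurse into the contained columns.
                for (Column child : col.children) {
                    bytes += estimateBytes(child, valueCount, hints);
                }
                break;
        }
        return bytes;
    }

    public static void main(String[] args) {
        SizeHints hints = SizeHints.defaults();
        Column row = Column.map("row", false,
                Column.fixed("id", 8, false),
                Column.varChar("name", false),
                Column.fixed("scores", 8, true));
        System.out.println(estimateBytes(row, 1000, hints));
    }
}
```

Replacing SizeHints.defaults() with values measured from earlier batches in the same operator would correspond to the metadata-based option mentioned at the end of the description.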



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
