[ 
https://issues.apache.org/jira/browse/DRILL-6530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524488#comment-16524488
 ] 

Sorabh Hamirwasia commented on DRILL-6530:
------------------------------------------

The issue here was in ListWriters.java. When it adds a writer for a repeated type, 
it records the count of vectors currently in the container and, after calling 
*container.addOrGet*, reads the vector count again to see whether a new vector 
was added. If a new vector was added, it allocates memory for the newly created 
vector, since by default the DrillBuf of a vector is empty. 

In cases where the schema changes such that the number of fields stays the same 
but only the type of a field changes, the vector count before and after calling 
*container.addOrGet* is identical even though a new vector for that field is 
created internally. Hence the space for the DrillBuf was not getting allocated. 
Later, when 
[startNewValue|https://github.com/apache/drill/blob/master/exec/vector/src/main/java/org/apache/drill/exec/vector/complex/BaseRepeatedValueVector.java#L270]
 was called as part of 
[setPosition|https://github.com/apache/drill/blob/master/exec/vector/src/main/codegen/templates/ListWriters.java#L117],
 it performed a realloc but initialized only the second half of the offset vector, 
leaving the first half uninitialized, which ultimately resulted in referencing 
memory at an index taken from an uninitialized garbage value.
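
As an illustration, here is a minimal, self-contained sketch of the count-based 
detection described above. The Container and SimpleVector classes are simplified 
stand-ins for Drill's container/ValueVector types, not the actual API; the sketch 
only shows how a type-only schema change slips past the count check.
{code}
// Hypothetical, simplified stand-ins; not Drill's real classes.
import java.util.LinkedHashMap;
import java.util.Map;

class SimpleVector {
  final String name;
  final String type;
  boolean allocated;

  SimpleVector(String name, String type) {
    this.name = name;
    this.type = type;
  }

  void allocateNew() {
    allocated = true;
  }
}

class Container {
  private final Map<String, SimpleVector> vectors = new LinkedHashMap<>();

  int size() {
    return vectors.size();
  }

  // Mimics addOrGet: when the requested type differs from the existing one,
  // the old vector is replaced by a brand-new one, so the count is unchanged.
  SimpleVector addOrGet(String name, String type) {
    SimpleVector existing = vectors.get(name);
    if (existing != null && existing.type.equals(type)) {
      return existing;
    }
    SimpleVector created = new SimpleVector(name, type);
    vectors.put(name, created);
    return created;
  }
}

public class CountBasedDetection {
  public static void main(String[] args) {
    Container container = new Container();
    container.addOrGet("o_comment", "NullableVarChar").allocateNew();

    // Schema change: same field name, different type.
    int before = container.size();
    SimpleVector vector = container.addOrGet("o_comment", "RepeatedVarChar");
    int after = container.size();

    // Buggy check: the count did not change, so allocation is skipped even
    // though 'vector' is a freshly created, empty vector.
    if (after > before) {
      vector.allocateNew();
    }
    System.out.println("allocated = " + vector.allocated); // prints false
  }
}
{code}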

The fix is to detect the creation of a new value vector on such schema changes 
by comparing the old and new ValueVector instances.
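
Correspondingly, a sketch of the fixed check, reusing the Container/SimpleVector 
stand-ins from the sketch above (again just an illustration of the idea, not the 
actual patch):
{code}
public class ReferenceBasedDetection {
  public static void main(String[] args) {
    Container container = new Container();
    SimpleVector oldVector = container.addOrGet("o_comment", "NullableVarChar");
    oldVector.allocateNew();

    // Schema change: addOrGet returns a different instance for the new type.
    SimpleVector newVector = container.addOrGet("o_comment", "RepeatedVarChar");

    // Fixed check: comparing the old and new vector references catches the
    // replacement even though the container's vector count is unchanged.
    if (newVector != oldVector) {
      newVector.allocateNew();
    }
    System.out.println("allocated = " + newVector.allocated); // prints true
  }
}
{code}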

> JVM crash with a query involving multiple json files with one file having a 
> schema change of one column from string to list
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6530
>                 URL: https://issues.apache.org/jira/browse/DRILL-6530
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.14.0
>            Reporter: Kedar Sankar Behera
>            Assignee: Sorabh Hamirwasia
>            Priority: Major
>             Fix For: 1.14.0
>
>         Attachments: 0_0_92.json, 0_0_93.json, drillbit.log, drillbit.out, 
> hs_err_pid32076.log
>
>
> JVM crash with a Lateral Unnest query involving multiple JSON files, with one 
> file having a schema change of one column from string to list.
> Query:
> {code}
> SELECT customer.c_custkey, customer.c_acctbal, orders.o_orderkey,
>        orders.o_totalprice, orders.o_orderdate, orders.o_shippriority,
>        customer.c_address, orders.o_orderpriority, customer.c_comment
> FROM customer,
>      LATERAL (SELECT O.ord.o_orderkey as o_orderkey,
>                      O.ord.o_totalprice as o_totalprice,
>                      O.ord.o_orderdate as o_orderdate,
>                      O.ord.o_shippriority as o_shippriority,
>                      O.ord.o_orderpriority as o_orderpriority
>               FROM UNNEST(customer.c_orders) O(ord)) orders;
> {code}
> The error received was:
> {code}
> o.a.d.e.p.impl.join.LateralJoinBatch - Output batch still has some space left, getting new batches from left and right
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_custkey
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_phone
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_acctbal
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_orders
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_mktsegment
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_address
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_nationkey
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_name
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_comment
> 2018-06-21 15:25:16,316 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.e.v.c.AbstractContainerVector - Field [o_comment] mutated from [NullableVarCharVector] to [RepeatedVarCharVector]
> 2018-06-21 15:25:16,318 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.drill.exec.vector.UInt4Vector - Reallocating vector [[`$offsets$` (UINT4:REQUIRED)]]. # of bytes: [16384] -> [32768]
> {code}
> On further investigation with [~shamirwasia], it was found that the crash only 
> happens when [o_comment] mutates from [NullableVarCharVector] to 
> [RepeatedVarCharVector], not the other way around.
> Please find the logs, stack trace, and data file attached.
>  



