[ https://issues.apache.org/jira/browse/DRILL-6530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530535#comment-16530535 ]

ASF GitHub Bot commented on DRILL-6530:
---------------------------------------

parthchandra closed pull request #1343: DRILL-6530: JVM crash with a query involving multiple json files with…
URL: https://github.com/apache/drill/pull/1343
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/exec/vector/src/main/codegen/templates/ListWriters.java b/exec/vector/src/main/codegen/templates/ListWriters.java
index cab8772a741..4300857b9eb 100644
--- a/exec/vector/src/main/codegen/templates/ListWriters.java
+++ b/exec/vector/src/main/codegen/templates/ListWriters.java
@@ -107,11 +107,13 @@ public void setValueCount(int count){
   public MapWriter map() {
     switch (mode) {
     case INIT:
-      int vectorCount = container.size();
+      final ValueVector oldVector = container.getChild(name);
       final RepeatedMapVector vector = container.addOrGet(name, RepeatedMapVector.TYPE, RepeatedMapVector.class);
       innerVector = vector;
       writer = new RepeatedMapWriter(vector, this);
-      if(vectorCount != container.size()) {
+      // oldVector will be null if it's first batch being created and it might not be same as newly added vector
+      // if new batch has schema change
+      if (oldVector == null || oldVector != vector) {
         writer.allocate();
       }
       writer.setPosition(${index});
@@ -131,11 +133,13 @@ public MapWriter map() {
   public ListWriter list() {
     switch (mode) {
     case INIT:
-      final int vectorCount = container.size();
+      final ValueVector oldVector = container.getChild(name);
       final RepeatedListVector vector = container.addOrGet(name, RepeatedListVector.TYPE, RepeatedListVector.class);
       innerVector = vector;
       writer = new RepeatedListWriter(null, vector, this);
-      if (vectorCount != container.size()) {
+      // oldVector will be null if it's first batch being created and it might not be same as newly added vector
+      // if new batch has schema change
+      if (oldVector == null || oldVector != vector) {
         writer.allocate();
       }
       writer.setPosition(${index});
@@ -176,11 +180,13 @@ public ListWriter list() {
   </#if>
     switch (mode) {
     case INIT:
-      final int vectorCount = container.size();
+      final ValueVector oldVector = container.getChild(name);
       final Repeated${capName}Vector vector = container.addOrGet(name, ${upperName}_TYPE, Repeated${capName}Vector.class);
       innerVector = vector;
       writer = new Repeated${capName}WriterImpl(vector, this);
-      if(vectorCount != container.size()) {
+      // oldVector will be null if it's first batch being created and it might not be same as newly added vector
+      // if new batch has schema change
+      if (oldVector == null || oldVector != vector) {
         writer.allocate();
       }
       writer.setPosition(${index});
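The three hunks make the same change. Before the fix, each INIT branch decided whether to call writer.allocate() by comparing container.size() before and after addOrGet. Going by the "Field [o_comment] mutated from [NullableVarCharVector] to [RepeatedVarCharVector]" log line in the report, addOrGet appears to replace an existing child of a different type under the same name, so the child count does not change and the new repeated vector was presumably written to without ever being allocated. The fix compares the previous child (getChild(name)) with the vector addOrGet returns instead. The sketch below models both checks with hypothetical toy classes (Container, Vector, initBySize, initByIdentity); they are illustrative stand-ins under that assumption, not Drill's actual vector or writer APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for Drill's value vectors and container.
// These are NOT Drill's real classes or method signatures.
public class AllocationCheckSketch {

  /** A toy "vector" that only records its type and whether it was allocated. */
  static final class Vector {
    final String type;
    boolean allocated;
    Vector(String type) { this.type = type; }
  }

  /** A toy container whose addOrGet replaces a child of a different type under the same name. */
  static final class Container {
    private final Map<String, Vector> children = new LinkedHashMap<>();

    int size() { return children.size(); }

    Vector getChild(String name) { return children.get(name); }

    Vector addOrGet(String name, String type) {
      Vector existing = children.get(name);
      if (existing != null && existing.type.equals(type)) {
        return existing;            // same name, same type: reuse the child
      }
      Vector created = new Vector(type);
      children.put(name, created);  // new name adds a child; type change replaces it (size unchanged)
      return created;
    }
  }

  /** Pre-fix style check: allocate only if the number of children changed. */
  static Vector initBySize(Container c, String name, String type) {
    int before = c.size();
    Vector v = c.addOrGet(name, type);
    if (before != c.size()) {
      v.allocated = true;
    }
    return v;
  }

  /** Post-fix style check: allocate if there was no previous child or it was replaced. */
  static Vector initByIdentity(Container c, String name, String type) {
    Vector old = c.getChild(name);
    Vector v = c.addOrGet(name, type);
    if (old == null || old != v) {
      v.allocated = true;
    }
    return v;
  }

  public static void main(String[] args) {
    // Batch 1: o_comment is a nullable VarChar; batch 2: it becomes a repeated VarChar.
    Container a = new Container();
    initBySize(a, "o_comment", "NullableVarChar");
    Vector bySize = initBySize(a, "o_comment", "RepeatedVarChar");

    Container b = new Container();
    initByIdentity(b, "o_comment", "NullableVarChar");
    Vector byIdentity = initByIdentity(b, "o_comment", "RepeatedVarChar");

    System.out.println("size-based check allocated replacement: " + bySize.allocated);
    System.out.println("identity-based check allocated replacement: " + byIdentity.allocated);
  }
}
```

Running main prints false for the size-based check and true for the identity-based check. When the type is unchanged, addOrGet returns the existing child, so both checks skip the allocation, which keeps the common no-schema-change path the same as before.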


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> JVM crash with a query involving multiple json files with one file having a schema change of one column from string to list
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-6530
>                 URL: https://issues.apache.org/jira/browse/DRILL-6530
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.14.0
>            Reporter: Kedar Sankar Behera
>            Assignee: Sorabh Hamirwasia
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.14.0
>
>         Attachments: 0_0_92.json, 0_0_93.json, drillbit.log, drillbit.out, hs_err_pid32076.log
>
>
> JVM crash with a LATERAL UNNEST query involving multiple JSON files, where one file has a schema change of one column from string to list.
> Query:
> {code}
> SELECT customer.c_custkey, customer.c_acctbal, orders.o_orderkey,
>   orders.o_totalprice, orders.o_orderdate, orders.o_shippriority, customer.c_address, orders.o_orderpriority, customer.c_comment
> FROM customer, LATERAL
>   (SELECT O.ord.o_orderkey as o_orderkey, O.ord.o_totalprice as o_totalprice, O.ord.o_orderdate as o_orderdate,
>     O.ord.o_shippriority as o_shippriority, O.ord.o_orderpriority as o_orderpriority
>    FROM UNNEST(customer.c_orders) O(ord)) orders;
> {code}
> The error was:
> {code}
> o.a.d.e.p.impl.join.LateralJoinBatch - Output batch still has some space left, getting new batches from left and right
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_custkey
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_phone
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_acctbal
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_orders
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_mktsegment
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_address
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_nationkey
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_name
> 2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_comment
> 2018-06-21 15:25:16,316 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.e.v.c.AbstractContainerVector - Field [o_comment] mutated from [NullableVarCharVector] to [RepeatedVarCharVector]
> 2018-06-21 15:25:16,318 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.drill.exec.vector.UInt4Vector - Reallocating vector [[`$offsets$` (UINT4:REQUIRED)]]. # of bytes: [16384] -> [32768]
> {code}
> On further investigation with [~shamirwasia], we found that the crash only happens when [o_comment] mutates from [NullableVarCharVector] to [RepeatedVarCharVector], not the other way around.
> Please find the logs, stack trace, and data files attached.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
