David Wayne Birdsall created TRAFODION-3282:
-----------------------------------------------

             Summary: Buffer overrun in ExHdfsScan::work in certain conditions
                 Key: TRAFODION-3282
                 URL: https://issues.apache.org/jira/browse/TRAFODION-3282
             Project: Apache Trafodion
          Issue Type: Bug
          Components: sql-exe
    Affects Versions: 2.4
            Reporter: David Wayne Birdsall
            Assignee: David Wayne Birdsall


If we have a large enough Hive text table with string columns, the string columns have values longer than CQD HIVE_MAX_STRING_LENGTH_IN_BYTES, and there is no external table definition giving longer column sizes, we may core in ExHdfsScan::work with a buffer overrun.
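A possible mitigation (not verified as part of this report) may be to raise HIVE_MAX_STRING_LENGTH_IN_BYTES above the longest actual string value, or to create an external table definition with wide enough columns; the value below is only illustrative:
{quote}-- untested workaround sketch: size the limit above the longest string value in the data
CQD HIVE_MAX_STRING_LENGTH_IN_BYTES '1000000';
{quote}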

The following test case reproduces the behavior.

First, use the following python script, called datagen.py:
{quote}#! /usr/bin/env python
import sys

if len(sys.argv) != 5 or \
   sys.argv[1].lower() == '-h' or \
   sys.argv[1].lower() == '-help':
    print 'Usage: ' + sys.argv[0] + ' <file> <num of rows> <num of varchar columns> <varchar column length>'
    sys.exit()

f = open(sys.argv[1], "w+")

marker = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
# each row: <row number>|<varchar col 1>|...|<varchar col N>|<row number>
for num_rows in range(0, int(sys.argv[2])):
    f.write(str(num_rows) + '|')
    for num_cols in range(0, int(sys.argv[3])):
        # each varchar column is a marker letter followed by repeating digits
        f.write(marker[num_rows % len(marker)])
        for i in range(1, int(sys.argv[4])):
            f.write(str(i % 10))
        f.write('|')
    f.write(str(num_rows))
    f.write('\n')

f.close()
{quote}
Run this script as follows:
{quote}chmod 755 ./datagen.py
./datagen.py ./data_lgvc.10rows_512KB.txt 10 2 524288
{quote}
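For reference, the generated file is roughly 10 MB: 10 rows, each containing two 512 KB string columns (524,288 bytes each) plus the two small integer columns and delimiters.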
Next, perform the following commands in a Hive shell:
{quote}drop table if exists lgvc_base_table;

create table lgvc_base_table(c_int int, c_string1 string, c_string2 string, p_int int) row format delimited fields terminated by '|';
load data local inpath './data_lgvc.10rows_512KB.txt' overwrite into table lgvc_base_table;
{quote}
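Optionally, a quick sanity check from sqlci that the load succeeded (not required to reproduce the problem; it should return a count of 10):
{quote}-- sanity check only
select count(*) from hive.hive.lgvc_base_table;
{quote}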
Finally, do the following in sqlci:
{quote}CQD HDFS_IO_BUFFERSIZE '2048';

prepare s1 from select * from hive.hive.lgvc_base_table where c_int > 10;

execute s1;
{quote}
(The point of the CQD is to reduce the HDFS read buffer size from its default of 65 MB to 2 MB, so the test fails with a smaller input file.)
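For reference, each generated string value is 512 KB and each row is roughly 1 MB, so the 10-row file (about 10 MB) spans several of the 2 MB read buffers, and every string value is far longer than the HIVE_MAX_STRING_LENGTH_IN_BYTES limit in effect.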

When this test case is run, we get a core with the following stack trace:
{quote}(gdb) bt
#0 0x00007ffff5116495 in raise () from /lib64/libc.so.6
#1 0x00007ffff5117c75 in abort () from /lib64/libc.so.6
#2 0x00007ffff6f02935 in ?? ()
 from /usr/lib/jvm/java-1.7.0-openjdk.x86_64/jre/lib/amd64/server/libjvm.so
#3 0x00007ffff707bfdf in ?? ()
 from /usr/lib/jvm/java-1.7.0-openjdk.x86_64/jre/lib/amd64/server/libjvm.so
#4 0x00007ffff6f077c2 in JVM_handle_linux_signal ()
 from /usr/lib/jvm/java-1.7.0-openjdk.x86_64/jre/lib/amd64/server/libjvm.so
#5 <signal handler called>
#6 0x00007ffff516d753 in memcpy () from /lib64/libc.so.6
#7 0x00007ffff35b4dd5 in ExHdfsScanTcb::work (this=0x7ffff7e99148)
 at ../executor/ExHdfsScan.cpp:601
#8 0x00007ffff333d7a1 in ex_tcb::sWork (tcb=0x7ffff7e99148)
 at ../executor/ex_tcb.h:102
#9 0x00007ffff350dba7 in ExSubtask::work (this=0x7ffff7e99ad0)
 at ../executor/ExScheduler.cpp:757
#10 0x00007ffff350cbf1 in ExScheduler::work (this=0x7ffff7e98cb0, prevWaitTime=
 0) at ../executor/ExScheduler.cpp:280
#11 0x00007ffff33a41c7 in ex_root_tcb::execute (this=0x7ffff7e99b78, 
 cliGlobals=0xba5970, glob=0x7ffff7ea5d40, input_desc=0x7ffff7ee1178, 
 diagsArea=@0x7ffffffee020, reExecute=0) at ../executor/ex_root.cpp:928
#12 0x00007ffff4e4c452 in Statement::execute (this=0x7ffff7e84f40, cliGlobals=
 0xba5970, input_desc=0x7ffff7ee1178, diagsArea=..., execute_state=
(gdb)
{quote}
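Frames #6 and #7 show that the overrun occurs in the memcpy called from ExHdfsScanTcb::work at ../executor/ExHdfsScan.cpp:601.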
 


