David Wayne Birdsall created TRAFODION-2465:
-----------------------------------------------
Summary: UPDATE STATS failure on Hive table with long varchars
Key: TRAFODION-2465
URL: https://issues.apache.org/jira/browse/TRAFODION-2465
Project: Apache Trafodion
Issue Type: Bug
Components: sql-cmp
Affects Versions: 2.1-incubating
Environment: Likely all
Reporter: David Wayne Birdsall
Assignee: David Wayne Birdsall
The scenario is the following:
1. Create a Hive text format table that has string columns with values 1048576
bytes long.
2. Create a Trafodion external table with varchar(1048576) columns for these
string columns.
3. Do an UPDATE STATISTICS PERSISTENT on the table. This succeeds.
4. Do an UPDATE STATISTICS REMOVE SAMPLE on the table. This succeeds.
5. Repeat the UPDATE STATISTICS PERSISTENT on the table. This fails with the
following errors:
*** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an
error (8609) from statement Process_Query.
*** ERROR[8609] Waited rollback performed without starting a transaction.
*** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an
error (8609) from statement Process_Query.
*** ERROR[8609] Waited rollback performed without starting a transaction.
*** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an
error (8609) from statement Process_Query.
*** ERROR[8609] Waited rollback performed without starting a transaction.
*** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an
error (8609) from statement Process_Query.
*** ERROR[8609] Waited rollback performed without starting a transaction.
*** ERROR[9214] Object TRAFODION."_HIVESTATS_".TRAF_SAMPLE_93_1485453261_202793
could not be created.
*** ERROR[8609] Waited rollback performed without starting a transaction.
To reproduce the problem, create 3 files datagen.py, hive.sql, mytest.sql with
the following contents and run them in this order:
chmod 755 datagen.py
./datagen.py data.10rows_1MB.txt 10 2 1048576
hive -f hive.sql
sqlci -i mytest.sql
$ cat datagen.py
#! /usr/bin/env python
import sys
if len(sys.argv) != 5 or \
sys.argv[1].lower() == '-h' or \
sys.argv[1].lower() == '-help':
print 'Usage: ' + sys.argv[0] + ' <file> <num of rows> <num of varchar
columns> <varchar column length>'
sys.exit()
f = open(sys.argv[1], "w+")
marker=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
for num_rows in range(0, int(sys.argv[2])):
f.write(str(num_rows) + '|')
for num_cols in range(0, int(sys.argv[3])):
f.write(marker[num_rows%len(marker)])
for i in range (1, int(sys.argv[4])):
f.write(str(i % 10))
f.write('|')
f.write(str(num_rows))
f.write('\n')
f.close()
$ cat hive.sql
drop database if exists mysch cascade;
create database mysch;
use mysch;
create table base_lgvarchar_table(c_int int, c_string1 string, c_string2 string
, p_int int) row format delimited fields terminated by '|';
load data local inpath './data.10rows_10MB.txt' overwrite into table
base_lgvarchar_table;
create table mytable (c_int int, c_string1 string, c_string2 string ,p_int int)
row format delimited fields terminated by '|';
from base_lgvarchar_table insert overwrite table mytable select *;
$ cat mytest.sql
log mytest.log clear;
control query default HIVE_MAX_STRING_LENGTH '1048576';
set schema hive.mysch;
drop external table if exists mytable cascade;
create external table mytable (c_int int, c_string1 varchar(1048576), c_string2
varchar(1048576), p_int int) for hive.mysch.mytable;
showddl mytable;
update statistics for table mytable on every column sample random 90 percent
persistent;
update statistics for table mytable remove sample;
update statistics for table mytable on existing column sample random 90 percent
persistent;
update statistics for table mytable remove sample;
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)