[
https://issues.apache.org/jira/browse/TRAFODION-2465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840175#comment-15840175
]
David Wayne Birdsall commented on TRAFODION-2465:
-------------------------------------------------
I reproduced this again, this time capturing a ULOG. The ULOG shows that on the
first "update statistics persistent" we correctly create the sample table with
a LIMIT COLUMN LENGTH TO 256 clause. But the second time, this is absent. That
causes a 4247 error which is not reported by the UPDATE STATISTICS logic.
So, there are two issues here:
1. We are not reporting an error when we should be.
2. The LIMIT COLUMN LENGTH TO 256 clause is missing the second time.
> UPDATE STATS failure on Hive table with long varchars
> -----------------------------------------------------
>
> Key: TRAFODION-2465
> URL: https://issues.apache.org/jira/browse/TRAFODION-2465
> Project: Apache Trafodion
> Issue Type: Bug
> Components: sql-cmp
> Affects Versions: 2.1-incubating
> Environment: Likely all
> Reporter: David Wayne Birdsall
> Assignee: David Wayne Birdsall
>
> The scenario is the following:
> 1. Create a Hive text format table that has string columns with values
> 1048576 bytes long.
> 2. Create a Trafodion external table with varchar(1048576) columns for these
> string columns.
> 3. Do an UPDATE STATISTICS PERSISTENT on the table. This succeeds.
> 4. Do an UPDATE STATISTICS REMOVE SAMPLE on the table. This succeeds.
> 5. Repeat the UPDATE STATISTICS PERSISTENT on the table. This fails with the
> following errors:
> *** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an
> error (8609) from statement Process_Query.
> *** ERROR[8609] Waited rollback performed without starting a transaction.
> *** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an
> error (8609) from statement Process_Query.
> *** ERROR[8609] Waited rollback performed without starting a transaction.
> *** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an
> error (8609) from statement Process_Query.
> *** ERROR[8609] Waited rollback performed without starting a transaction.
> *** ERROR[9200] UPDATE STATISTICS for table HIVE.MYSCH.MYTABLE encountered an
> error (8609) from statement Process_Query.
> *** ERROR[8609] Waited rollback performed without starting a transaction.
> *** ERROR[9214] Object
> TRAFODION."_HIVESTATS_".TRAF_SAMPLE_93_1485453261_202793 could not be created.
> *** ERROR[8609] Waited rollback performed without starting a transaction.
> To reproduce the problem, create 3 files datagen.py, hive.sql, mytest.sql
> with the following contents and run them in this order:
> chmod 755 datagen.py
> ./datagen.py data.10rows_1MB.txt 10 2 1048576
> hive -f hive.sql
> sqlci -i mytest.sql
> $ cat datagen.py
> #! /usr/bin/env python
> import sys
> if len(sys.argv) != 5 or \
> sys.argv[1].lower() == '-h' or \
> sys.argv[1].lower() == '-help':
> print 'Usage: ' + sys.argv[0] + ' <file> <num of rows> <num of varchar
> columns> <varchar column length>'
> sys.exit()
> f = open(sys.argv[1], "w+")
> marker=list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
> for num_rows in range(0, int(sys.argv[2])):
> f.write(str(num_rows) + '|')
> for num_cols in range(0, int(sys.argv[3])):
> f.write(marker[num_rows%len(marker)])
> for i in range (1, int(sys.argv[4])):
> f.write(str(i % 10))
> f.write('|')
> f.write(str(num_rows))
> f.write('\n')
> f.close()
> $ cat hive.sql
> drop database if exists mysch cascade;
> create database mysch;
> use mysch;
> create table base_lgvarchar_table(c_int int, c_string1 string, c_string2
> string , p_int int) row format delimited fields terminated by '|';
> load data local inpath './data.10rows_10MB.txt' overwrite into table
> base_lgvarchar_table;
> create table mytable (c_int int, c_string1 string, c_string2 string ,p_int
> int) row format delimited fields terminated by '|';
> from base_lgvarchar_table insert overwrite table mytable select *;
> $ cat mytest.sql
> log mytest.log clear;
> control query default HIVE_MAX_STRING_LENGTH '1048576';
> set schema hive.mysch;
> drop external table if exists mytable cascade;
> create external table mytable (c_int int, c_string1 varchar(1048576),
> c_string2 varchar(1048576), p_int int) for hive.mysch.mytable;
> showddl mytable;
> update statistics for table mytable on every column sample random 90 percent
> persistent;
> update statistics for table mytable remove sample;
> update statistics for table mytable on existing column sample random 90
> percent persistent;
> update statistics for table mytable remove sample;
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)