Suresh Subbiah created TRAFODION-2138:
-----------------------------------------
Summary: Hive scan on wide tables can result in lost rows or error
Key: TRAFODION-2138
URL: https://issues.apache.org/jira/browse/TRAFODION-2138
Project: Apache Trafodion
Issue Type: Bug
Components: sql-exe
Affects Versions: 2.0-incubating
Reporter: Suresh Subbiah
Assignee: Suresh Subbiah
Fix For: 2.0-incubating
When a Hive table has wide rows (say, greater than 16 KB) and the scan is
parallelized, one of these two problem situations may arise:
a) An error with sqlcode 8446 is raised:
*** ERROR[8446] An error occurred during hdfs buffer fetch. Error Detail: No
record delimiter found in buffer from hdfsRead.
b) Some rows are lost.
Both situations occur because the scan is broken up into ranges, and the
lookahead from one range into the next is determined by HDFS_IO_RANGE_TAIL,
which defaults to 16 KB. A row that begins near the end of one range and
continues more than 16 KB past the range boundary falls outside the lookahead,
so it is either lost or triggers the error above. This setting is also
hardcoded in a crucial location, so changing the default does not help.
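A minimal sketch of the attempted workaround (assuming HDFS_IO_RANGE_TAIL is
exposed as a CQD, as the description above implies; the 1 MB value is purely
illustrative):
cqd hdfs_io_range_tail '1048576';
prepare s1 from select count(*) from hive.hive.t031hive;
execute s1;
-- can still raise ERROR[8446] or lose rows, because the hardcoded
-- 16 KB lookahead is used in the crucial code path instead of the CQD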
A test case to reproduce the problem:
cqd traf_max_character_col_length '1000000';
drop table if exists t031t1;
create table t031t1 (z int not null primary key, a varchar(1000000), b
char(1000000));
insert into t031t1 values (1, repeat('a', 1000000, 1000000) , 'def');
insert into t031t1 values (2, repeat('a', 1000, 1000) , 'zzz');
insert into t031t1 values (3, repeat('a', 10000, 10000) , 'zzz');
insert into t031t1 values (4, repeat('a', 100000, 100000) , 'zzz');
insert into t031t1 values (5, repeat('a', 100, 100) , 'zzz');
insert into t031t1 values (6, repeat('a', 10, 10) , 'zzz');
insert into t031t1 values (7, repeat('a', 1, 1) , 'zzz');
insert into t031t1 values (8, repeat('a', 1000000, 1000000) , null);
insert into t031t1 values (9, repeat('a', 500000, 500000) , null);
insert into t031t1 values (10, repeat('a', 100, 100) , null);
-- in hive
create table t031hive(z int, a string, b string);
-- in sqlci
cqd hive_max_string_length '1000000';
insert into hive.hive.t031hive select * from t031t1;
create external table t031hive (z int, a varchar(1000000), b varchar(1000000))
for hive.hive.t031hive;
control query shape sort_groupby(exchange(cut));
-- 2 esp plan
cqd hive_min_bytes_per_esp_partition '2000000' ;
prepare s2 from select count(*) from hive.hive.t031hive;
explain options 'f' s2 ;
execute s2 ;
cqd hive_min_bytes_per_esp_partition '1000000' ;
-- 4 esp plan
prepare s2 from select count(*) from hive.hive.t031hive;
explain options 'f' s2 ;
execute s2 ;
The row boundaries and esp scan range boundaries are shown below
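# print every byte as offset:char, then keep only the record delimiters,
# which the %_c format renders as the escape \n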
hexdump -v -e '7/1 "%5_ad:%-5_c" "\n"' t031.txt | grep "\n"
-- 4 esp plan
1000006:\n esp0 end (652820)
1001013:\n
1011020:\n
1111027:\n
1111134:\n
1111151:\n
1111159:\n
2111165:\n esp1 end (1305640) esp2 end (1958460)
2611171:\n
2611278:\n
-- 2 esp plan
1000006:\n
1001013:\n
1011020:\n
1111027:\n
1111134:\n
1111151:\n
1111159:\n
2111165:\n esp0 end (1305640)
2611171:\n
2611278:\n
Number of ranges to scan: 4
Number of esps to scan: 4
  Esp# Range#  StartOffset    BytesRead FileName
====== ====== ============ ============ ==============================
     0      0            0       652820 t031hive/T031HIVE-0-20160716231120-1 (split_e)
     1      1       652820       652820 t031hive/T031HIVE-0-20160716231120-1 (split_b/e)
     2      2      1305640       652820 t031hive/T031HIVE-0-20160716231120-1 (split_b/e)
     3      3      1958460       652819 t031hive/T031HIVE-0-20160716231120-1 (split_b)
Summary of bytes read per ESP (2611279 = 100 percent):
ESP 0 reads 652820 bytes ( 100 percent of avg)
ESP 1 reads 652820 bytes ( 100 percent of avg)
ESP 2 reads 652820 bytes ( 100 percent of avg)
ESP 3 reads 652819 bytes ( 99 percent of avg)
hdfsRootDir: hdfs://localhost:15600/user/hive/warehouse/t031hive
modTSforDir_ = 1468710680, numOfPartCols_ = 0
Number of ranges to scan: 2
Number of esps to scan: 2
  Esp# Range#  StartOffset    BytesRead FileName
====== ====== ============ ============ ==============================
     0      0            0      1305640 t031hive/T031HIVE-0-20160716231120-1 (split_e)
     1      1      1305640      1305639 t031hive/T031HIVE-0-20160716231120-1 (split_b)
Summary of bytes read per ESP (2611279 = 100 percent):
ESP 0 reads 1305640 bytes ( 100 percent of avg)
ESP 1 reads 1305639 bytes ( 99 percent of avg)
hdfsRootDir: hdfs://localhost:15600/user/hive/warehouse/t031hive
modTSforDir_ = 1468710680, numOfPartCols_ = 0
The fix has three parts (a sketch illustrating part (c) follows the list):
a) RangeTailIOSize now defaults to the maximum length of a row when that
length is greater than 16 KB. Each esp range looks ahead by this size.
Previously the lookahead was hardcoded to 16 KB.
b) If a whole buffer does not contain the start of a row, that is no longer
treated as an error.
c) An error is raised at compile time if the maximum row size is greater than
the size of the IO buffer. The whole row (from start to finish) must be in a
contiguous buffer for the rest of the scan logic to work. Currently the IO
buffer defaults to 64 MB.
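As an illustration of part (c), a minimal sketch (the table name, column
width, and exact error behavior here are hypothetical; the width is chosen so
that the maximum row size exceeds the 64 MB IO buffer default):
cqd traf_max_character_col_length '100000000';
-- hypothetical Hive table with a maximum row size of roughly 70 MB
create external table t031wide (z int, a varchar(70000000))
for hive.hive.t031wide;
prepare sw from select count(*) from hive.hive.t031wide;
-- expected to fail at prepare time: a 70 MB row cannot be held
-- from start to finish in the 64 MB contiguous IO buffer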