Ming LI created HAWQ-1094:
-----------------------------
Summary: Select on INTERNAL table returns wrong results when hdfs
blocks have checksum errors
Key: HAWQ-1094
URL: https://issues.apache.org/jira/browse/HAWQ-1094
Project: Apache HAWQ
Issue Type: Bug
Components: Fault Tolerance
Reporter: Ming LI
Assignee: Lei Chang
I created a Parquet table and inserted the following values:
{code}
sr37228_repro=# select * from number;
id
----
1
1
1
1
1
(5 rows)
{code}
I then modified the data in two of the three replicas of the table's single
HDFS block (on 172.28.21.155 and 172.28.21.156) and tried reading the data
again.
{code}
Modifying contents of internal table blocks...
Found hdfs://hdm1.hdp.local:8020/hawq_default/16385/16543/17000/10 in hdfs
Modifying block
/hadoop/hdfs/data/current/BP-2023073008-172.28.21.63-1462922052672/current/finalized/subdir0/subdir0/blk_1073742008
on 172.28.21.155
block_script.sh
100% 228 0.2KB/s 00:00
Modifying block
/hadoop/hdfs/data/current/BP-2023073008-172.28.21.63-1462922052672/current/finalized/subdir0/subdir0/blk_1073742008
on 172.28.21.156
block_script.sh
100% 228 0.2KB/s 00:00
Running count query again, this time with bad data in two of the three blocks
count | id
-------+----------
1 | 0
2 | 1
1 | 16777216
1 | 16777217
(4 rows)
Checking Showing file health:
Checking hdfs://hdm1.hdp.local:8020/hawq_default/16385/16543/17000/10 health
Connecting to namenode via
http://hdm1.hdp.local:50070/fsck?ugi=gpadmin&blocks=1&locations=1&files=1&path=%2Fhawq_default%2F16385%2F16543%2F17000%2F10
FSCK started by gpadmin (auth:SIMPLE) from /172.28.21.157 for path
/hawq_default/16385/16543/17000/10 at Mon Sep 26 12:07:53 PDT 2016
/hawq_default/16385/16543/17000/10 206 bytes, 1 block(s): OK
0. BP-2023073008-172.28.21.63-1462922052672:blk_1073742008_1186 len=206 repl=3
[DatanodeInfoWithStorage[172.28.21.155:50010,DS-1a18c785-48e5-4ab8-9228-b3f6857b952a,DISK],
DatanodeInfoWithStorage[172.28.19.211:50010,DS-6bf49ae7-6745-448b-803d-d12d93acad1d,DISK],
DatanodeInfoWithStorage[172.28.21.156:50010,DS-d22b0f7f-7065-42c4-bb66-ea361ec5e56a,DISK]]
Status: HEALTHY
Total size: 206 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 206 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Mon Sep 26 12:07:53 PDT 2016 in 0 milliseconds
{code}
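The unexpected ids in the count output above (0, 16777216, 16777217) do not look arbitrary. As a rough illustration, assuming the id column is a 32-bit integer stored little-endian (the report states neither the column type nor the on-disk encoding, so this is only a guess), the sketch below prints the byte pattern of each value. The helper name le_bytes is made up for this example.

```python
def le_bytes(value: int) -> str:
    """Hex dump of a value encoded as a 4-byte little-endian signed integer."""
    return value.to_bytes(4, "little", signed=True).hex(" ")


# The inserted value (1) followed by the three unexpected values from the report.
for value in (1, 0, 16777216, 16777217):
    print(f"{value:>8}  {le_bytes(value)}")
```

Under that assumption, every unexpected value differs from the original 1 only in where a 0x01 byte sits, which is what a byte-level edit of the block file would produce.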
When InputStreamImpl::setupBlockReader reads the corrupted replica through
LocalBlockReader, the reader correctly detects the bad checksum and raises a
ChecksumException.
{code}
2016-09-26 13:02:09.267021
PDT,,,p380682,th795609216,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager
discovered local host IPv4 address 127.0.0.1",,,,,,,0,,"network_utils.c",210,
2016-09-26 13:02:09.267171
PDT,,,p380682,th795609216,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager
discovered local host IPv4 address
172.28.21.155",,,,,,,0,,"network_utils.c",210,
2016-09-26 13:02:16.239048
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG1","00000","Dropping in
memory mapping OidInMemHeapMapping",,,,,,"SET log_min_messages TO
'debug5'",0,,"cdbinmemheapam.c",293,
2016-09-26 13:02:16.239289
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG3","00000","CommitTransactionCommand",,,,,,"SET
log_min_messages TO 'debug5'",0,,"postgres.c",3131,
2016-09-26 13:02:16.239435
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG3","00000","CommitTransaction",,,,,,"SET
log_min_messages TO 'debug5'",0,,"xact.c",5103,
2016-09-26 13:02:16.239819
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG3","00000","name:
unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 6227/1/0,
nestlvl: 1, children: <>",,,,,,"SET log_min_messages TO
'debug5'",0,,"xact.c",5128,
2016-09-26 13:02:16.239978
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6227,con143,cmd72,seg1,,,x6227,sx1,"DEBUG1","00000","Dropping in
memory mapping OidInMemOnlyMapping",,,,,,"SET log_min_messages TO
'debug5'",0,,"cdbinmemheapam.c",293,
2016-09-26 13:02:25.600367
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,0,con143,,seg1,,,,,"DEBUG5","00000","First char: 'M'; gp_role =
'execute'.",,,,,,,0,,"postgres.c",4737,
2016-09-26 13:02:25.600639
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","Message type M received
by from libpq, len = 1412",,,,,,,0,,"postgres.c",4813,
2016-09-26 13:02:25.600742
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG5","00000","MPP dispatched stmt
from QD: explain analyze select * from number;.",,,,,,,0,,"postgres.c",4893,
2016-09-26 13:02:25.600847
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","SetupProcessIdentity:
receive msg:
ProcessIdentity_Begin_slice_1_idx_0_gang_1_cmd_74_writer_t_End_ProcessIdentity",,,,,,,0,,"identity.c",365,
2016-09-26 13:02:25.600997
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","ProcessIdentity is not
init",,,,,,,0,,"identity.c",599,
2016-09-26 13:02:25.601129
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,0,con143,cmd74,seg1,,,,,"DEBUG1","00000","ProcessIdentity: slice
1 id 0 gang num 1 writer t",,,,,,,0,,"identity.c",602,
2016-09-26 13:02:25.601250
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,0,con143,cmd74,seg0,slice1,,,,"DEBUG5","00000","Get a temporary
directory:/tmp/hawq/segment",,,,,,,0,,"cdbtmpdir.c",48,
2016-09-26 13:02:25.601351
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,0,con143,cmd74,seg0,slice1,,,,"DEBUG1","00000","getLocalTmpDirFromSegmentConfig
session_id:143 command_id:74 qeidx:0
tmpdir:/tmp/hawq/segment",,,,,,,0,,"identity.c",418,
2016-09-26 13:02:25.601784
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,0,con143,cmd74,seg0,slice1,,,,"DEBUG3","00000","StartTransactionCommand",,,,,,"explain
analyze select * from number;",0,,"postgres.c",3107,
2016-09-26 13:02:25.602075
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG3","00000","StartTransaction",,,,,,"explain
analyze select * from number;",0,,"xact.c",5103,
2016-09-26 13:02:25.602195
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG3","00000","name:
unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 6228/1/0,
nestlvl: 1, children: <>",,,,,,"explain analyze select * from
number;",0,,"xact.c",5128,
2016-09-26 13:02:25.602578
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 0 key 17000 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.602703
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 1 key 17000 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.602836
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 2 key 17000 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.602994
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 3 key 17000 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.603104
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 4 key 17000 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.603211
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 5 key 17000 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.603317
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 6 key 17000 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.603572
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 7 key 17000 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.603751
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 8 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.603881
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 9 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.604003
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 10 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.604110
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 11 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.604216
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 12 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.604323
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 13 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.604555
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 14 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.604697
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 15 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.604848
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 16 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.604959
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 17 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.605064
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","add
index 18 key 17002 relation pg_attribute",,,,,,"explain analyze select * from
number;",0,,"cdbinmemheapam.c",624,
2016-09-26 13:02:25.605591
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG3","00000","Resource
enforcer finds cpu sub-system is disabled",,,,,,"explain analyze select * from
number;",0,,"resourceenforcer.c",908,
2016-09-26 13:02:25.605716
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG2","00000","Current nice
level of the process: 19",,,,,,"explain analyze select * from
number;",0,,"postgres.c",283,
2016-09-26 13:02:25.605856
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG2","00000","Reniced process
to level 19",,,,,,"explain analyze select * from number;",0,,"postgres.c",302,
2016-09-26 13:02:25.606073
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG5","00000","GetSnapshotData
setting globalxmin and xmin to 6228",,,,,,"explain analyze select * from
number;",0,,"procarray.c",552,
2016-09-26 13:02:25.606306
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","Inserted entry
for query (sessionid=143, commandcnt=74)",,,,,,"explain analyze select * from
number;",0,,"workfile_queryspace.c",283,
2016-09-26 13:02:25.606748
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","Have
both IPv6 and IPv4 choices",,,,,,"explain analyze select * from
number;",0,,"ic_udp.c",1291,
2016-09-26 13:02:25.606978
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","receive socket
ai_family 10 ai_socktype 2 ai_protocol 17",,,,,,"explain analyze select * from
number;",0,,"ic_udp.c",1303,
2016-09-26 13:02:25.607098
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","receive socket 6
ai_family 10 ai_socktype 2 ai_protocol 17",,,,,,"explain analyze select * from
number;",0,,"ic_udp.c",1307,
2016-09-26 13:02:25.607207
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","bind
addrlen 28 fam 10",,,,,,"explain analyze select * from
number;",0,,"ic_udp.c",1318,
2016-09-26 13:02:25.607320
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit
default buffer size 124928 bytes",,,,,,"explain analyze select * from
number;",0,,"ic_udp.c",2200,
2016-09-26 13:02:25.607555
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit use
buffer size 2097152 bytes",,,,,,"explain analyze select * from
number;",0,,"ic_udp.c",2215,
2016-09-26 13:02:25.607678
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit
default buffer size 124928 bytes",,,,,,"explain analyze select * from
number;",0,,"ic_udp.c",2200,
2016-09-26 13:02:25.607787
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","UDP-IC: xmit use
buffer size 2097152 bytes",,,,,,"explain analyze select * from
number;",0,,"ic_udp.c",2215,
2016-09-26 13:02:25.607939
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","GetSockAddr
socket ai_family 2 ai_socktype 2 ai_protocol 17 for
172.28.21.157",,,,,,"explain analyze select * from number;",0,,"ic_udp.c",3058,
2016-09-26 13:02:25.608052
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","We are
inet6, remote is inet. Converting to v4 mapped address.",,,,,,"explain analyze
select * from number;",0,,"ic_udp.c",3137,
2016-09-26 13:02:25.608249
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read
index 0 key 17000 for relation pg_attribute",,,,,,"explain analyze select *
from number;",0,,"cdbinmemheapam.c",499,
2016-09-26 13:02:25.608706
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read
index 1 key 17000 for relation pg_attribute",,,,,,"explain analyze select *
from number;",0,,"cdbinmemheapam.c",499,
2016-09-26 13:02:25.608836
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read
index 2 key 17000 for relation pg_attribute",,,,,,"explain analyze select *
from number;",0,,"cdbinmemheapam.c",499,
2016-09-26 13:02:25.608966
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read
index 3 key 17000 for relation pg_attribute",,,,,,"explain analyze select *
from number;",0,,"cdbinmemheapam.c",499,
2016-09-26 13:02:25.609083
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read
index 4 key 17000 for relation pg_attribute",,,,,,"explain analyze select *
from number;",0,,"cdbinmemheapam.c",499,
2016-09-26 13:02:25.609200
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read
index 5 key 17000 for relation pg_attribute",,,,,,"explain analyze select *
from number;",0,,"cdbinmemheapam.c",499,
2016-09-26 13:02:25.609316
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read
index 6 key 17000 for relation pg_attribute",,,,,,"explain analyze select *
from number;",0,,"cdbinmemheapam.c",499,
2016-09-26 13:02:25.609657
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31 PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG1","00000","read
index 7 key 17000 for relation pg_attribute",,,,,,"explain analyze select *
from number;",0,,"cdbinmemheapam.c",499,
2016-09-26 13:02:25.613152
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
12:32:31
PDT,6228,con143,cmd74,seg0,slice1,,x6228,sx1,"DEBUG5","00000","Parquet metadata
file footer length index: 198",,,,,,"explain analyze select * from
number;",0,,"cdbparquetfooterprocessor.c",141,
2016-09-26 13:02:25.676719
PDT,,,p380675,th795609216,,,,0,,,seg-10000,,,,,"LOG","00000","3rd party error
log:
2016-09-26 13:02:25.676477, p384452, th140708219193472, ERROR cannot setup
block reader for Block: [block pool ID:
BP-2023073008-172.28.21.63-1462922052672 block ID 1073742008_1186] file
/hawq_default/16385/16543/17000/10 on Datanode: hdw2.hdp.local(172.28.21.155).
LocalBlockReader.cpp: 127: HdfsIOException: Failed to construct
LocalBlockReader for block: [block pool ID:
BP-2023073008-172.28.21.63-1462922052672 block ID 1073742008_1186].
@
Hdfs::Internal::LocalBlockReader::LocalBlockReader(boost::shared_ptr<Hdfs::Internal::ReadShortCircuitInfo>
const&, Hdfs::Internal::ExtendedBlock const&, long, bool,
Hdfs::Internal::SessionConfig&, std::vector<char, std::allocator<char> >&)
@ Hdfs::Internal::InputStreamImpl::setupBlockReader(bool)
@ Hdfs::Internal::InputStreamImpl::readOneBlock(char*, int, bool)
@ Hdfs::Internal::InputStreamImpl::readInternal(char*, int)
@ Hdfs::Internal::InputStreamImpl::read(char*, int)
@ hdfsRead
@ gpfs_hdfs_read
@ HdfsRead
@ FileRead
@ readParquetFooter
@ ParquetStorageRead_OpenFile
@ parquet_getnext
@ ParquetScanNext
@ ExecTableScan
@ ExecProcNode
@ ExecMotion
@ ExecProcNode
@ ExecutePlan
@ ExecutorRun
@ PortalRunSelect
@ PortalRun
@ PostgresMain
@ BackendStartup
@ ServerLoop
@ PostmasterMain
@ main
@ __libc_start_main
@ Unknown
Caused by
LocalBlockReader.cpp: 283: HdfsIOException: LocalBlockReader failed to skip
from position: 0, length: 0, block: [block pool ID:
BP-2023073008-172.28.21.63-1462922052672 block ID 1073742008_1186].
@ Hdfs::Internal::LocalBlockReader::skip(long)
@
Hdfs::Internal::LocalBlockReader::LocalBlockReader(boost::shared_ptr<Hdfs::Internal::ReadShortCircuitInfo>
const&, Hdfs::Internal::ExtendedBlock const&, long, bool,
Hdfs::Internal::SessionConfig&, std::vector<char, std::allocator<char> >&)
@ Hdfs::Internal::InputStreamImpl::setupBlockReader(bool)
@ Hdfs::Internal::InputStreamImpl::readOneBlock(char*, int, bool)
@ Hdfs::Internal::InputStreamImpl::readInternal(char*, int)
@ Hdfs::Internal::InputStreamImpl::read(char*, int)
@ hdfsRead
@ gpfs_hdfs_read
@ HdfsRead
@ FileRead
@ readParquetFooter
@ ParquetStorageRead_OpenFile
@ parquet_getnext
@ ParquetScanNext
@ ExecTableScan
@ ExecProcNode
@ ExecMotion
@ ExecProcNode
@ ExecutePlan
@ ExecutorRun
@ PortalRunSelect
@ PortalRun
@ PostgresMain
@ BackendStartup
@ ServerLoop
@ PostmasterMain
@ main
@ __libc_start_main
@ Unknown
Caused by
LocalBlockReader.cpp: 156: ChecksumException: LocalBlockReader checksum not
match for block: [block pool ID: BP-2023073008-172.28.21.63-1462922052672 block
ID 1073742008_1186]
@ Hdfs::Internal::LocalBlockReader::readAndVerify(int)
@ Hdfs::Internal::LocalBlockReader::skip(long)
@
Hdfs::Internal::LocalBlockReader::LocalBlockReader(boost::shared_ptr<Hdfs::Internal::ReadShortCircuitInfo>
const&, Hdfs::Internal::ExtendedBlock const&, long, bool,
Hdfs::Internal::SessionConfig&, std::vector<char, std::allocator<char> >&)
@ Hdfs::Internal::InputStreamImpl::setupBlockReader(bool)
@ Hdfs::Internal::InputStreamImpl::readOneBlock(char*, int, bool)
@ Hdfs::Internal::InputStreamImpl::readInternal(char*, int)
@ Hdfs::Internal::InputStreamImpl::read(char*, int)
@ hdfsRead
@ gpfs_hdfs_read
@ HdfsRead
@ FileRead
@ readParquetFooter
@ ParquetStorageRead_OpenFile
@ parquet_getnext
@ ParquetScanNext
@ ExecTableScan
@ ExecProcNode
@ ExecMotion
@ ExecProcNode
@ ExecutePlan
@ ExecutorRun
@ PortalRunSelect
@ PortalRun
@ PostgresMain
@ BackendStartup
@ ServerLoop
@ PostmasterMain
@ main
@ __libc_start_main
@ Unknown
retry the same node but disable read shortcircuit
feature",,,,,,,,"SysLoggerMain","syslogger.c",518,
2016-09-26 13:02:25.680638
PDT,"gpadmin","sr37228_repro",p384452,th795609216,"172.28.21.157","30347",2016-09-26
{code}
Even though LocalBlockReader correctly detects the bad checksum, the
subsequent read through RemoteBlockReader does not appear to detect it, and
the read is allowed to go through.
{code}
sr37228_repro=# select * from number;
id
----------
16777217
16777216
0
1
1
(5 rows)
Checking hdfs://hdm1.hdp.local:8020/hawq_default/16385/16543/17000/10 health
Connecting to namenode via
http://hdm1.hdp.local:50070/fsck?ugi=gpadmin&blocks=1&locations=1&files=1&path=%2Fhawq_default%2F16385%2F16543%2F17000%2F10
FSCK started by gpadmin (auth:SIMPLE) from /172.28.21.157 for path
/hawq_default/16385/16543/17000/10 at Mon Sep 26 12:07:53 PDT 2016
/hawq_default/16385/16543/17000/10 206 bytes, 1 block(s): OK
0. BP-2023073008-172.28.21.63-1462922052672:blk_1073742008_1186 len=206 repl=3
[DatanodeInfoWithStorage[172.28.21.155:50010,DS-1a18c785-48e5-4ab8-9228-b3f6857b952a,DISK],
DatanodeInfoWithStorage[172.28.19.211:50010,DS-6bf49ae7-6745-448b-803d-d12d93acad1d,DISK],
DatanodeInfoWithStorage[172.28.21.156:50010,DS-d22b0f7f-7065-42c4-bb66-ea361ec5e56a,DISK]]
Status: HEALTHY
Total size: 206 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 1 (avg. block size 206 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Mon Sep 26 12:07:53 PDT 2016 in 0 milliseconds
The filesystem under path '/hawq_default/16385/16543/17000/10' is HEALTHY
{code}
The behavior of InputStreamImpl::setupBlockReader appears to be to:
1. Attempt to read the block locally using LocalBlockReader
2. If the local block read fails, retry the read using RemoteBlockReader (the
log above reports "retry the same node but disable read shortcircuit feature")
3. Continue to read all the available blocks using RemoteBlockReader until we
have no more blocks to read.
In this case, the RemoteBlockReader appears to ignore the bad checksum in the
block, and returns wrong results.
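For comparison, here is a minimal sketch of the behavior one would expect from the fallback described above: every read path verifies the checksum, and replicas that fail are remembered so they could be reported. This is a toy model and not libhdfs3 code. Replica, read_replica, read_block and the corrupted byte values are all invented for illustration, and zlib.crc32 stands in for whatever checksum HDFS actually records for the block.

```python
import zlib
from dataclasses import dataclass


class ChecksumError(Exception):
    """Raised when a replica's bytes do not match the stored checksum."""


@dataclass
class Replica:
    node: str      # datanode holding this copy (addresses taken from the report)
    data: bytes    # block bytes as stored on that node (made up here)
    checksum: int  # checksum recorded when the block was written


def read_replica(replica: Replica) -> bytes:
    """Return the replica's bytes only if they match the stored checksum.

    In this model the local and the remote path both go through this
    function, so neither can hand back unverified data.
    """
    if zlib.crc32(replica.data) != replica.checksum:
        raise ChecksumError(replica.node)
    return replica.data


def read_block(replicas: list[Replica]) -> tuple[bytes, list[str]]:
    """Try replicas in order; return the data and the nodes with bad copies."""
    bad_nodes: list[str] = []
    for replica in replicas:
        try:
            return read_replica(replica), bad_nodes
        except ChecksumError:
            # A real client could report this replica as corrupt at this point.
            bad_nodes.append(replica.node)
    raise OSError(f"all replicas failed verification: {bad_nodes}")


good = b"\x01\x00\x00\x00" * 5   # hypothetical block payload
stored = zlib.crc32(good)
replicas = [
    Replica("172.28.21.155", b"\x01\x00\x00\x01" + good[4:], stored),  # modified
    Replica("172.28.21.156", b"\x00\x00\x00\x01" + good[4:], stored),  # modified
    Replica("172.28.19.211", good, stored),                            # intact
]
data, bad_nodes = read_block(replicas)
print(data == good, bad_nodes)
```

In this model the two modified replicas are skipped and the intact copy is returned, which is the outcome the reported read did not produce.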
Questions:
1. When we detect a bad checksum on the local block, why do we not mark the
block as corrupt with the NameNode?
2. When we read the block using RemoteBlockReader, why doesn't it detect the
bad block?