[ https://issues.apache.org/jira/browse/IMPALA-6761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zach Amsden closed IMPALA-6761.
-------------------------------
    Resolution: Fixed

Fixed by

Commit 380e17aa3cf678d4502245f12d8a77f58f4b8996 in impala's branch refs/heads/master from Zach Amsden
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=380e17a ]

IMPALA-6389: Make '\0' delimited text files work

Initially I didn't want to fully implement this, as the metadata for these tables can't even be fully stored in Postgres; however, after digging into some older documentation, it appears that the ASCII NUL character actually has been used as a field separator in various vendors' CSV implementations. Therefore, this patch attempts to make things as non-broken as possible and allows \0 as a field or tuple delimiter. Collection column delimiters are not allowed to be \0, as they genuinely may not exist and we don't want to force special escaping on an arbitrary character.

Note that the field delimiter must be distinct from the tuple delimiter when they both exist; if it is not, the effect will be that there is no field delimiter (this is actually possible with single column tables).

Testing:
Created a zero delimited table as described in the JIRA, using MySQL backed Hive metastore; ran select * from tab_separated on the table, updated the unit test. Additionally, built with ASAN and ran the unit test.

Change-Id: I2190c57681f29f34ee1eb393e30dfdda5839098c
Reviewed-on: http://gerrit.cloudera.org:8080/9857
Tested-by: Impala Public Jenkins
Reviewed-by: Zach Amsden <zams...@cloudera.com>

> delimited-text-parser-test fails in ASAN build
> ----------------------------------------------
>
> Key: IMPALA-6761
> URL: https://issues.apache.org/jira/browse/IMPALA-6761
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.12.0
> Reporter: Michael Ho
> Assignee: Zach Amsden
> Priority: Blocker
> Labels: broken-build
>
> Hi [~zamsden], could this be related to your recent change to fix IMPALA-6389?
> {noformat}
> 03:26:07 [ RUN ] DelimitedTextParser.SpecialDelimiters
> 03:26:07 =================================================================
> 03:26:07 ==14342==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff33da29c1 at pc 0x00000141f344 bp 0x7fff33da1d20 sp 0x7fff33da1d18
> 03:26:07 READ of size 1 at 0x7fff33da29c1 thread T0
> 03:26:07 #0 0x141f343 in impala::DelimitedTextParser<true>::ReturnCurrentColumn() const /data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/src/exec/delimited-text-parser.h:114:39
> 03:26:07 #1 0x141bf49 in impala::Status impala::DelimitedTextParser<true>::AddColumn<true>(long, char**, int*, impala::FieldLocation*) /data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/src/exec/delimited-text-parser.inline.h:62:7
> 03:26:07 #2 0x1419517 in impala::DelimitedTextParser<true>::ParseFieldLocations(int, long, char**, char**, impala::FieldLocation*, int*, int*, char**) /data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/src/exec/delimited-text-parser.cc:194:43
> 03:26:07 #3 0x13f8ed7 in impala::Validate(impala::DelimitedTextParser<true>*, std::string const&, int, char, int, int) /data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/src/exec/delimited-text-parser-test.cc:57:15
> 03:26:07 #4 0x13fb274 in impala::DelimitedTextParser_SpecialDelimiters_Test::TestBody() /data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/src/exec/delimited-text-parser-test.cc:211:3
> 03:26:07 #5 0x3f3fc52 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (/data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/build/debug/exec/delimited-text-parser-test+0x3f3fc52)
> 03:26:07 #6 0x3f375a9 in testing::Test::Run() (/data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/build/debug/exec/delimited-text-parser-test+0x3f375a9)
> 03:26:07 #7 0x3f376f7 in testing::TestInfo::Run() (/data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/build/debug/exec/delimited-text-parser-test+0x3f376f7)
> 03:26:07 #8 0x3f377d4 in testing::TestCase::Run() (/data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/build/debug/exec/delimited-text-parser-test+0x3f377d4)
> 03:26:07 #9 0x3f38a57 in testing::internal::UnitTestImpl::RunAllTests() (/data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/build/debug/exec/delimited-text-parser-test+0x3f38a57)
> 03:26:07 #10 0x3f38d32 in testing::UnitTest::Run() (/data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/build/debug/exec/delimited-text-parser-test+0x3f38d32)
> 03:26:07 #11 0x13fb927 in main /data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/src/exec/delimited-text-parser-test.cc:221:192
> 03:26:07 #12 0x7fdc3ec02cdc in __libc_start_main (/lib64/libc.so.6+0x1ecdc)
> 03:26:07 #13 0x13064a0 in _start (/data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/build/debug/exec/delimited-text-parser-test+0x13064a0)
> 03:26:07
> 03:26:07 Address 0x7fff33da29c1 is located in stack of thread T0 at offset 33 in frame
> 03:26:07 #0 0x13fa74f in impala::DelimitedTextParser_SpecialDelimiters_Test::TestBody() /data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/src/exec/delimited-text-parser-test.cc:149
> 03:26:07
> 03:26:07 This frame has 56 object(s):
> 03:26:07 [32, 33) 'is_materialized_col' <== Memory access at offset 33 overflows this variable
> 03:26:07 [48, 208) 'tuple_delim_parser'
> 03:26:07 [272, 432) 'nul_delim_parser'
> 03:26:07 [496, 656) 'nul_field_parser'
> 03:26:07 [720, 728) 'ref.tmp'
> 03:26:07 [752, 753) 'ref.tmp4'
> 03:26:07 [768, 776) 'ref.tmp5'
> 03:26:07 [800, 801) 'ref.tmp6'
> 03:26:07 [816, 824) 'ref.tmp7'
> 03:26:07 [848, 849) 'ref.tmp8'
> 03:26:07 [864, 872) 'ref.tmp9'
> 03:26:07 [896, 897) 'ref.tmp10'
> 03:26:07 [912, 920) 'ref.tmp11'
> 03:26:07 [944, 945) 'ref.tmp12'
> 03:26:07 [960, 968) 'ref.tmp13'
> 03:26:07 [992, 993) 'ref.tmp14'
> 03:26:07 [1008, 1016) 'ref.tmp15'
> 03:26:07 [1040, 1041) 'ref.tmp16'
> 03:26:07 [1056, 1064) 'nul1'
> 03:26:07 [1088, 1089) 'ref.tmp17'
> 03:26:07 [1104, 1112) 'nul2'
> 03:26:07 [1136, 1137) 'ref.tmp18'
> 03:26:07 [1152, 1160) 'nul3'
> 03:26:07 [1184, 1185) 'ref.tmp19'
> 03:26:07 [1200, 1208) 'nul4'
> 03:26:07 [1232, 1233) 'ref.tmp20'
> 03:26:07 [1248, 1256) 'data'
> 03:26:07 [1280, 1281) 'ref.tmp21'
> 03:26:07 [1296, 1304) 'ref.tmp22'
> 03:26:07 [1328, 1332) 'ref.tmp24'
> 03:26:07 [1344, 1360) 'temp.lvalue'
> 03:26:07 [1376, 1384) 'ref.tmp27'
> 03:26:07 [1408, 1416) 'ref.tmp31'
> 03:26:07 [1440, 1444) 'ref.tmp34'
> 03:26:07 [1456, 1472) 'temp.lvalue38'
> 03:26:07 [1488, 1496) 'ref.tmp39'
> 03:26:07 [1520, 1528) 'ref.tmp43'
> 03:26:07 [1552, 1556) 'ref.tmp46'
> 03:26:07 [1568, 1584) 'temp.lvalue50'
> 03:26:07 [1600, 1608) 'ref.tmp51'
> 03:26:07 [1632, 1640) 'ref.tmp55'
> 03:26:07 [1664, 1668) 'ref.tmp58'
> 03:26:07 [1680, 1696) 'temp.lvalue62'
> 03:26:07 [1712, 1720) 'ref.tmp63'
> 03:26:07 [1744, 1752) 'nulsse1'
> 03:26:07 [1776, 1777) 'ref.tmp65'
> 03:26:07 [1792, 1800) 'nulsse2'
> 03:26:07 [1824, 1825) 'ref.tmp66'
> 03:26:07 [1840, 1848) 'nulsse3'
> 03:26:07 [1872, 1873) 'ref.tmp67'
> 03:26:07 [1888, 1896) 'nulsse4'
> 03:26:07 [1920, 1921) 'ref.tmp68'
> 03:26:07 [1936, 1944) 'field1'
> 03:26:07 [1968, 1969) 'ref.tmp69'
> 03:26:07 [1984, 1992) 'field2'
> 03:26:07 [2016, 2017) 'ref.tmp70'
> 03:26:07 HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
> 03:26:07 (longjmp and C++ exceptions *are* supported)
> 03:26:07 SUMMARY: AddressSanitizer: stack-buffer-overflow /data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/src/exec/delimited-text-parser.h:114:39 in impala::DelimitedTextParser<true>::ReturnCurrentColumn() const
> 03:26:07 Shadow bytes around the buggy address:
> 03:26:07 0x1000667ac4e0: f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 00 00 00 f2 f2 f2
> 03:26:07 0x1000667ac4f0: f2 f2 01 f2 04 f2 04 f2 00 f2 f2 f2 00 f2 f2 f2
> 03:26:07 0x1000667ac500: 00 00 f2 f2 00 f2 f2 f2 00 f2 f2 f2 00 00 f2 f2
> 03:26:07 0x1000667ac510: 00 f2 f2 f2 00 f3 f3 f3 00 00 00 00 00 00 00 00
> 03:26:07 0x1000667ac520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 03:26:07 =>0x1000667ac530: 00 00 00 00 f1 f1 f1 f1[01]f2 00 00 00 00 00 00
> 03:26:07 0x1000667ac540: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2
> 03:26:07 0x1000667ac550: f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00
> 03:26:07 0x1000667ac560: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
> 03:26:07 0x1000667ac570: f2 f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 03:26:07 0x1000667ac580: 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2 f2 f2 00 f2
> 03:26:07 Shadow byte legend (one shadow byte represents 8 application bytes):
> 03:26:07 Addressable: 00
> 03:26:07 Partially addressable: 01 02 03 04 05 06 07
> 03:26:07 Heap left redzone: fa
> 03:26:07 Heap right redzone: fb
> 03:26:07 Freed heap region: fd
> 03:26:07 Stack left redzone: f1
> 03:26:07 Stack mid redzone: f2
> 03:26:07 Stack right redzone: f3
> 03:26:07 Stack partial redzone: f4
> 03:26:07 Stack after return: f5
> 03:26:07 Stack use after scope: f8
> 03:26:07 Global redzone: f9
> 03:26:07 Global init order: f6
> 03:26:07 Poisoned by user: f7
> 03:26:07 Container overflow: fc
> 03:26:07 Array cookie: ac
> 03:26:07 Intra object redzone: bb
> 03:26:07 ASan internal: fe
> 03:26:07 Left alloca redzone: ca
> 03:26:07 Right alloca redzone: cb
> 03:26:07 ==14342==ABORTING
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
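
For readers unfamiliar with the delimiter rule in the commit message above (\0 is allowed as a field or tuple delimiter, and a field delimiter equal to the tuple delimiter behaves as if there were no field delimiter), here is a minimal standalone sketch. It is not Impala's DelimitedTextParser: `SplitRows` is a hypothetical helper written for illustration, it assumes C++17 for `std::string_view`, and it ignores escaping, collection delimiters, and the SSE code path.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of Impala: splits `data` into rows on
// `tuple_delim` and each row into fields on `field_delim`. Either delimiter
// may be '\0', so the input is treated as a sized buffer (std::string_view)
// and never as a NUL-terminated C string.
std::vector<std::vector<std::string>> SplitRows(
    std::string_view data, char field_delim, char tuple_delim) {
  // Rule from the commit message: if both delimiters are the same character,
  // the effect is that there is no field delimiter.
  const bool has_field_delim = field_delim != tuple_delim;

  std::vector<std::vector<std::string>> rows;
  std::vector<std::string> row;
  std::string field;
  for (char c : data) {
    if (c == tuple_delim) {
      // End of row: close the current field and emit the row.
      row.push_back(field);
      rows.push_back(row);
      row.clear();
      field.clear();
    } else if (has_field_delim && c == field_delim) {
      // End of field within the current row.
      row.push_back(field);
      field.clear();
    } else {
      field.push_back(c);
    }
  }
  // Emit a trailing row that is not terminated by a tuple delimiter.
  if (!field.empty() || !row.empty()) {
    row.push_back(field);
    rows.push_back(row);
  }
  return rows;
}
```

With this sketch, the 8-byte buffer `a\0b\nc\0d\n` split with field delimiter `\0` and tuple delimiter `\n` gives two rows of two fields each, while the 4-byte buffer `x\0y\0` with both delimiters set to `\0` gives two single-column rows. The length has to be passed explicitly when building the input (for example `std::string("a\0b\nc\0d\n", 8)`), since a plain C-string constructor would stop at the first NUL.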