[ 
https://issues.apache.org/jira/browse/IMPALA-6761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zach Amsden closed IMPALA-6761.
-------------------------------
    Resolution: Fixed

Fixed by 

Commit 380e17aa3cf678d4502245f12d8a77f58f4b8996 in impala's branch 
refs/heads/master from Zach Amsden
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=380e17a ]
IMPALA-6389: Make '\0' delimited text files work
Initially I didn't want to fully implement this, as the metadata
for these tables can't even be fully stored in Postgres; however,
after digging into some older documentation, it appears that the
ASCII NUL character has actually been used as a field separator
in various vendors' CSV implementations.
Therefore, this patch attempts to make things as non-broken as
possible and allows \0 as a field or tuple delimiter. Collection
column delimiters are not allowed to be \0, as they genuinely may
not exist and we don't want to force special escaping on an
arbitrary character. Note that the field delimiter must be distinct
from the tuple delimiter when they both exist; if they are the same,
the effect is that there is no field delimiter (this is actually
possible with single-column tables).
Testing: Created a zero-delimited table as described in the JIRA,
using a MySQL-backed Hive metastore; ran select * from tab_separated
on the table; updated the unit test. Additionally, built with ASAN
and ran the unit test.
Change-Id: I2190c57681f29f34ee1eb393e30dfdda5839098c
Reviewed-on: http://gerrit.cloudera.org:8080/9857
Tested-by: Impala Public Jenkins
Reviewed-by: Zach Amsden <zams...@cloudera.com>
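
The delimiter rules described in the commit message above can be sketched
roughly as follows. This is only an illustration under stated assumptions,
not the code from the patch: the struct EffectiveDelimiters and the function
ResolveDelimiters are hypothetical names invented here, and the real parser
may represent "no delimiter" differently.

```cpp
#include <cassert>

// Hypothetical summary of which delimiters are actually in effect.
struct EffectiveDelimiters {
  char tuple_delim;           // row separator; '\0' is a legal value
  bool has_field_delim;       // false when the field delimiter is absent
  char field_delim;           // meaningful only if has_field_delim
  bool has_collection_delim;  // false when no collection delimiter exists
  char collection_delim;      // meaningful only if has_collection_delim
};

// Applies the rules stated in the commit message:
//  - '\0' is accepted as a field or tuple delimiter.
//  - A collection delimiter of '\0' means "no collection delimiter".
//  - A field delimiter equal to the tuple delimiter behaves as if
//    there were no field delimiter at all.
inline EffectiveDelimiters ResolveDelimiters(char tuple_delim, char field_delim,
                                             char collection_delim) {
  EffectiveDelimiters out;
  out.tuple_delim = tuple_delim;
  out.has_field_delim = (field_delim != tuple_delim);
  out.field_delim = field_delim;
  out.has_collection_delim = (collection_delim != '\0');
  out.collection_delim = collection_delim;
  return out;
}
```

For example, ResolveDelimiters('\n', '\0', '\0') keeps \0 as a real field
delimiter and reports no collection delimiter, while
ResolveDelimiters('\0', '\0', ',') reports no field delimiter, which only
makes sense for a single-column table.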

> delimited-text-parser-test fails in ASAN build
> ----------------------------------------------
>
>                 Key: IMPALA-6761
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6761
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.12.0
>            Reporter: Michael Ho
>            Assignee: Zach Amsden
>            Priority: Blocker
>              Labels: broken-build
>
> Hi [~zamsden], could this be related to your recent change to fix IMPALA-6389?
> {noformat}
> 03:26:07 [ RUN      ] DelimitedTextParser.SpecialDelimiters
> 03:26:07 =================================================================
> 03:26:07 ==14342==ERROR: AddressSanitizer: stack-buffer-overflow on address 
> 0x7fff33da29c1 at pc 0x00000141f344 bp 0x7fff33da1d20 sp 0x7fff33da1d18
> 03:26:07 READ of size 1 at 0x7fff33da29c1 thread T0
> 03:26:07     #0 0x141f343 in 
> impala::DelimitedTextParser<true>::ReturnCurrentColumn() const 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/src/exec/delimited-text-parser.h:114:39
> 03:26:07     #1 0x141bf49 in impala::Status 
> impala::DelimitedTextParser<true>::AddColumn<true>(long, char**, int*, 
> impala::FieldLocation*) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/src/exec/delimited-text-parser.inline.h:62:7
> 03:26:07     #2 0x1419517 in 
> impala::DelimitedTextParser<true>::ParseFieldLocations(int, long, char**, 
> char**, impala::FieldLocation*, int*, int*, char**) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/src/exec/delimited-text-parser.cc:194:43
> 03:26:07     #3 0x13f8ed7 in 
> impala::Validate(impala::DelimitedTextParser<true>*, std::string const&, int, 
> char, int, int) 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/src/exec/delimited-text-parser-test.cc:57:15
> 03:26:07     #4 0x13fb274 in 
> impala::DelimitedTextParser_SpecialDelimiters_Test::TestBody() 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/src/exec/delimited-text-parser-test.cc:211:3
> 03:26:07     #5 0x3f3fc52 in void 
> testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, 
> void>(testing::Test*, void (testing::Test::*)(), char const*) 
> (/data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/build/debug/exec/delimited-text-parser-test+0x3f3fc52)
> 03:26:07     #6 0x3f375a9 in testing::Test::Run() 
> (/data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/build/debug/exec/delimited-text-parser-test+0x3f375a9)
> 03:26:07     #7 0x3f376f7 in testing::TestInfo::Run() 
> (/data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/build/debug/exec/delimited-text-parser-test+0x3f376f7)
> 03:26:07     #8 0x3f377d4 in testing::TestCase::Run() 
> (/data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/build/debug/exec/delimited-text-parser-test+0x3f377d4)
> 03:26:07     #9 0x3f38a57 in testing::internal::UnitTestImpl::RunAllTests() 
> (/data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/build/debug/exec/delimited-text-parser-test+0x3f38a57)
> 03:26:07     #10 0x3f38d32 in testing::UnitTest::Run() 
> (/data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/build/debug/exec/delimited-text-parser-test+0x3f38d32)
> 03:26:07     #11 0x13fb927 in main 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/src/exec/delimited-text-parser-test.cc:221:192
> 03:26:07     #12 0x7fdc3ec02cdc in __libc_start_main 
> (/lib64/libc.so.6+0x1ecdc)
> 03:26:07     #13 0x13064a0 in _start 
> (/data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/build/debug/exec/delimited-text-parser-test+0x13064a0)
> 03:26:07 
> 03:26:07 Address 0x7fff33da29c1 is located in stack of thread T0 at offset 33 
> in frame
> 03:26:07     #0 0x13fa74f in 
> impala::DelimitedTextParser_SpecialDelimiters_Test::TestBody() 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/src/exec/delimited-text-parser-test.cc:149
> 03:26:07 
> 03:26:07   This frame has 56 object(s):
> 03:26:07     [32, 33) 'is_materialized_col' <== Memory access at offset 33 
> overflows this variable
> 03:26:07     [48, 208) 'tuple_delim_parser'
> 03:26:07     [272, 432) 'nul_delim_parser'
> 03:26:07     [496, 656) 'nul_field_parser'
> 03:26:07     [720, 728) 'ref.tmp'
> 03:26:07     [752, 753) 'ref.tmp4'
> 03:26:07     [768, 776) 'ref.tmp5'
> 03:26:07     [800, 801) 'ref.tmp6'
> 03:26:07     [816, 824) 'ref.tmp7'
> 03:26:07     [848, 849) 'ref.tmp8'
> 03:26:07     [864, 872) 'ref.tmp9'
> 03:26:07     [896, 897) 'ref.tmp10'
> 03:26:07     [912, 920) 'ref.tmp11'
> 03:26:07     [944, 945) 'ref.tmp12'
> 03:26:07     [960, 968) 'ref.tmp13'
> 03:26:07     [992, 993) 'ref.tmp14'
> 03:26:07     [1008, 1016) 'ref.tmp15'
> 03:26:07     [1040, 1041) 'ref.tmp16'
> 03:26:07     [1056, 1064) 'nul1'
> 03:26:07     [1088, 1089) 'ref.tmp17'
> 03:26:07     [1104, 1112) 'nul2'
> 03:26:07     [1136, 1137) 'ref.tmp18'
> 03:26:07     [1152, 1160) 'nul3'
> 03:26:07     [1184, 1185) 'ref.tmp19'
> 03:26:07     [1200, 1208) 'nul4'
> 03:26:07     [1232, 1233) 'ref.tmp20'
> 03:26:07     [1248, 1256) 'data'
> 03:26:07     [1280, 1281) 'ref.tmp21'
> 03:26:07     [1296, 1304) 'ref.tmp22'
> 03:26:07     [1328, 1332) 'ref.tmp24'
> 03:26:07     [1344, 1360) 'temp.lvalue'
> 03:26:07     [1376, 1384) 'ref.tmp27'
> 03:26:07     [1408, 1416) 'ref.tmp31'
> 03:26:07     [1440, 1444) 'ref.tmp34'
> 03:26:07     [1456, 1472) 'temp.lvalue38'
> 03:26:07     [1488, 1496) 'ref.tmp39'
> 03:26:07     [1520, 1528) 'ref.tmp43'
> 03:26:07     [1552, 1556) 'ref.tmp46'
> 03:26:07     [1568, 1584) 'temp.lvalue50'
> 03:26:07     [1600, 1608) 'ref.tmp51'
> 03:26:07     [1632, 1640) 'ref.tmp55'
> 03:26:07     [1664, 1668) 'ref.tmp58'
> 03:26:07     [1680, 1696) 'temp.lvalue62'
> 03:26:07     [1712, 1720) 'ref.tmp63'
> 03:26:07     [1744, 1752) 'nulsse1'
> 03:26:07     [1776, 1777) 'ref.tmp65'
> 03:26:07     [1792, 1800) 'nulsse2'
> 03:26:07     [1824, 1825) 'ref.tmp66'
> 03:26:07     [1840, 1848) 'nulsse3'
> 03:26:07     [1872, 1873) 'ref.tmp67'
> 03:26:07     [1888, 1896) 'nulsse4'
> 03:26:07     [1920, 1921) 'ref.tmp68'
> 03:26:07     [1936, 1944) 'field1'
> 03:26:07     [1968, 1969) 'ref.tmp69'
> 03:26:07     [1984, 1992) 'field2'
> 03:26:07     [2016, 2017) 'ref.tmp70'
> 03:26:07 HINT: this may be a false positive if your program uses some custom 
> stack unwind mechanism or swapcontext
> 03:26:07       (longjmp and C++ exceptions *are* supported)
> 03:26:07 SUMMARY: AddressSanitizer: stack-buffer-overflow 
> /data/jenkins/workspace/impala-asf-2.x-core-asan/repos/Impala/be/src/exec/delimited-text-parser.h:114:39
>  in impala::DelimitedTextParser<true>::ReturnCurrentColumn() const
> 03:26:07 Shadow bytes around the buggy address:
> 03:26:07   0x1000667ac4e0: f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 00 00 00 f2 f2 f2
> 03:26:07   0x1000667ac4f0: f2 f2 01 f2 04 f2 04 f2 00 f2 f2 f2 00 f2 f2 f2
> 03:26:07   0x1000667ac500: 00 00 f2 f2 00 f2 f2 f2 00 f2 f2 f2 00 00 f2 f2
> 03:26:07   0x1000667ac510: 00 f2 f2 f2 00 f3 f3 f3 00 00 00 00 00 00 00 00
> 03:26:07   0x1000667ac520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 03:26:07 =>0x1000667ac530: 00 00 00 00 f1 f1 f1 f1[01]f2 00 00 00 00 00 00
> 03:26:07   0x1000667ac540: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f2 f2
> 03:26:07   0x1000667ac550: f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 00 00 00
> 03:26:07   0x1000667ac560: 00 00 00 00 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2
> 03:26:07   0x1000667ac570: f2 f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 03:26:07   0x1000667ac580: 00 00 00 00 00 00 f2 f2 f2 f2 f2 f2 f2 f2 00 f2
> 03:26:07 Shadow byte legend (one shadow byte represents 8 application bytes):
> 03:26:07   Addressable:           00
> 03:26:07   Partially addressable: 01 02 03 04 05 06 07 
> 03:26:07   Heap left redzone:       fa
> 03:26:07   Heap right redzone:      fb
> 03:26:07   Freed heap region:       fd
> 03:26:07   Stack left redzone:      f1
> 03:26:07   Stack mid redzone:       f2
> 03:26:07   Stack right redzone:     f3
> 03:26:07   Stack partial redzone:   f4
> 03:26:07   Stack after return:      f5
> 03:26:07   Stack use after scope:   f8
> 03:26:07   Global redzone:          f9
> 03:26:07   Global init order:       f6
> 03:26:07   Poisoned by user:        f7
> 03:26:07   Container overflow:      fc
> 03:26:07   Array cookie:            ac
> 03:26:07   Intra object redzone:    bb
> 03:26:07   ASan internal:           fe
> 03:26:07   Left alloca redzone:     ca
> 03:26:07   Right alloca redzone:    cb
> 03:26:07 ==14342==ABORTING
> {noformat}
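
To help read the shadow-byte dump in the report above, here is a minimal
sketch of the arithmetic behind it. It assumes the default AddressSanitizer
mapping on x86-64 Linux, Shadow = (Mem >> 3) + 0x7fff8000, and it assumes the
access stays inside one 8-byte granule. The function names ShadowAddress and
AccessIsValid are made up for this illustration and are not part of the ASan
runtime API.

```cpp
#include <cassert>
#include <cstdint>

// Assumed default ASan mapping on x86-64 Linux: one shadow byte
// describes 8 application bytes.
const uint64_t kShadowScale = 3;
const uint64_t kShadowOffset = 0x7fff8000ULL;

// Maps an application address to the address of its shadow byte.
inline uint64_t ShadowAddress(uint64_t app_addr) {
  return (app_addr >> kShadowScale) + kShadowOffset;
}

// Interprets a shadow byte for an access of `size` bytes that lies
// within a single 8-byte granule:
//   0      -> all 8 bytes addressable
//   1..7   -> only the first k bytes addressable
//   other  -> redzone or otherwise poisoned
inline bool AccessIsValid(uint64_t app_addr, unsigned size,
                          uint8_t shadow_value) {
  if (shadow_value == 0) return true;
  if (shadow_value > 7) return false;
  return (app_addr & 7) + size <= shadow_value;
}
```

Applied to the report, ShadowAddress(0x7fff33da29c1) gives 0x1000667ac538,
which is the bracketed [01] byte in the row starting at 0x1000667ac530. A
shadow value of 01 means only the first byte of that granule is addressable
(the one-byte 'is_materialized_col' at frame offsets [32, 33)), so the
one-byte read at offset 33 falls just past it.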



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
