[
https://issues.apache.org/jira/browse/IMPALA-8394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael Ho resolved IMPALA-8394.
--------------------------------
Resolution: Not A Problem
Fix Version/s: Not Applicable
[~stakiar] is right. Forcing a seek fixes the problem. Thanks Sahil. Sorry for
the confusion.
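For reference, a minimal sketch of what "forcing a seek" amounts to against the
plain libhdfs API (the helper below is made up for illustration and is not the
actual {{hdfs-file-reader.cc}} code):
{noformat}
// Sketch only: always re-seek to the caller's offset immediately before the
// read, instead of trusting the position cached inside the (possibly shared)
// file handle.
#include <hdfs.h>  // libhdfs

static tSize ReadAtOffset(hdfsFS fs, hdfsFile file, tOffset offset,
                          void* buffer, tSize length) {
  // Forcing the seek pins the stream to 'offset' even if something else moved
  // the handle's position since the previous read.
  if (hdfsSeek(fs, file, offset) != 0) return -1;
  return hdfsRead(fs, file, buffer, length);
}
{noformat}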
> Inconsistent data read from S3a connector
> -----------------------------------------
>
> Key: IMPALA-8394
> URL: https://issues.apache.org/jira/browse/IMPALA-8394
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.2.0, Impala 3.3.0
> Reporter: Michael Ho
> Assignee: Michael Ho
> Priority: Critical
> Fix For: Not Applicable
>
>
> While testing a build with the remote data cache
> (https://github.com/michaelhkw/impala/commits/remote-cache-debug) against S3, we
> noticed that data read back from S3 through the HDFS S3A connector was
> inconsistent. This was confirmed by computing a checksum of the buffer
> right after each successful read. The log excerpts below show the activity of
> two threads.
> Both threads 18922 and 18924 tried to look up
> s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq
> at offset 89814317. Both hit a cache miss and both read the content from S3.
> Thread 18924 won the race to insert into the cache. When thread 18922
> came around later and tried to insert the same entry, it noticed
> that the checksum of the content inserted by thread 18924 differed from
> the checksum of its own content.
> Please note that the checksum of the bytes read from S3 was computed and
> logged in {{hdfs-file-reader.cc}} before the insertion into the cache (which
> computes the checksum again), and the inconsistency was already visible in
> {{hdfs-file-reader.cc}}: thread 18924 computed
> {{8299739883147237483}} while thread 18922 computed {{9118051972380785265}}.
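> To illustrate the cross-check that caught this, a hedged sketch only: the
> FNV-1a hash and the map below are stand-ins for whatever {{data-cache.cc}}
> actually uses for the checksum and the cache index.
> {noformat}
> // Hash the buffer right after the read, hash it again at insert time, and on
> // a duplicate key compare against the checksum recorded by the first writer.
> #include <cstddef>
> #include <cstdint>
> #include <iostream>
> #include <string>
> #include <unordered_map>
>
> // FNV-1a 64-bit, used here only as a stand-in checksum.
> uint64_t Checksum(const uint8_t* buf, size_t len) {
>   uint64_t h = 14695981039346656037ULL;
>   for (size_t i = 0; i < len; ++i) { h ^= buf[i]; h *= 1099511628211ULL; }
>   return h;
> }
>
> std::unordered_map<std::string, uint64_t> cache_checksums;  // key -> checksum
>
> // Returns false when a second thread tries to insert the same key with
> // different bytes, which is exactly what thread 18922 hit.
> bool InsertOrVerify(const std::string& key, const uint8_t* buf, size_t len) {
>   const uint64_t checksum = Checksum(buf, len);
>   auto [it, inserted] = cache_checksums.try_emplace(key, checksum);
>   if (!inserted && it->second != checksum) {
>     std::cerr << "Write checksum mismatch for " << key << ": expected "
>               << it->second << ", got " << checksum << std::endl;
>     return false;
>   }
>   return true;
> }
> {noformat}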
> We re-ran the same experiment with {{--use_hdfs_pread=true}} and the problem
> went away. While I don't rule out bugs in the cache prototype at this point,
> the debugging so far suggests that the content read back from S3 via the HDFS
> S3A connector is inconsistent when pread is disabled. It could be that we
> inadvertently shared the file handle somehow, or that there is a race
> condition in the S3A connector that is exposed by the timing change
> introduced by enabling the cache.
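> For clarity on why the pread setting matters, a sketch against the plain
> libhdfs API (not the actual Impala read path):
> {noformat}
> #include <hdfs.h>  // libhdfs
>
> // Without pread, a read typically goes through hdfsSeek()/hdfsRead(), which
> // consume the position stored inside the (possibly shared) handle.
> // hdfsPread() passes the offset explicitly on every call, so it never
> // depends on that shared stream state, which is consistent with the problem
> // going away in the re-run.
> tSize PositionalRead(hdfsFS fs, hdfsFile file, tOffset offset,
>                      void* buffer, tSize length) {
>   return hdfsPread(fs, file, offset, buffer, length);
> }
> {noformat}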
> FWIW, we also ran the same experiment in an HDFS remote-read configuration
> and the problem was not reproducible there either.
> Thread 18924
> {noformat}
> I0405 12:02:15.316999 18924 data-cache.cc:344]
> ed4c2ab7791b5883:9f1507450000005f] Looking up
> s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq
> mtime: 1549425284000 offset: 89814317 bytes_to_read: 8332914 bytes_read: 0
> buffer: 4d600000
> I0405 12:02:15.593314 18924 hdfs-file-reader.cc:185]
> ed4c2ab7791b5883:9f1507450000005f] Caching file
> s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq
> mtime: 1549425284000 offset: 89814317 bytes_read 8332914 checksum
> 8299739883147237483
> I0405 12:02:15.596087 18924 data-cache.cc:233]
> ed4c2ab7791b5883:9f1507450000005f] Storing file
> /data0/1/impala/datacache/cf4b57f89e5985f2:487084b5c69b208b offset 1669431296
> len 8332914 checksum 8299739883147237483
> I0405 12:02:15.602699 18924 data-cache.cc:361]
> ed4c2ab7791b5883:9f1507450000005f] Storing
> s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq
> mtime: 1549425284000 offset: 89814317 bytes_to_read: 8332914 buffer:
> 4d600000 stored: true
> {noformat}
> Thread 18922:
> {noformat}
> I0405 12:02:15.011065 18922 data-cache.cc:344]
> ed4c2ab7791b5883:9f150745000000da] Looking up
> s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq
> mtime: 1549425284000 offset: 89814317 bytes_to_read: 8332914 bytes_read: 0
> buffer: 59200000
> I0405 12:02:16.281126 18922 hdfs-file-reader.cc:185]
> ed4c2ab7791b5883:9f150745000000da] Caching file
> s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq
> mtime: 1549425284000 offset: 89814317 bytes_read 8332914 checksum
> 9118051972380785265
> I0405 12:02:16.282948 18922 data-cache.cc:166]
> ed4c2ab7791b5883:9f150745000000da] Storing duplicated file
> /data0/1/impala/datacache/cf4b57f89e5985f2:487084b5c69b208b offset 1669431296
> len 8332914 checksum 8299739883147237483 buffer checksum: 9118051972380785265
> E0405 12:02:16.282974 18922 data-cache.cc:171]
> ed4c2ab7791b5883:9f150745000000da] Write checksum mismatch for file
> /data0/1/impala/datacache/cf4b57f89e5985f2:487084b5c69b208b offset 1669431296
> entry len: 8332914 store_len: 8332914 Expected 8299739883147237483, Got
> 9118051972380785265.
> I0405 12:02:16.283023 18922 data-cache.cc:361]
> ed4c2ab7791b5883:9f150745000000da] Storing
> s3a://impala-remote-reads/tpcds_3000_decimal_parquet.db/store_sales/ss_sold_date_sk=2451097/46490ec1e79b939b-b427e7c50000000b_986349703_data.0.parq
> mtime: 1549425284000 offset: 89814317 bytes_to_read: 8332914 buffer:
> 59200000 stored: false
> {noformat}
> The problem is readily reproducible with TPC-DS Q28 at scale factor 3000 on
> Parquet data.
> {noformat}
> select *
> from (select avg(ss_list_price) B1_LP
> ,count(ss_list_price) B1_CNT
> ,count(distinct ss_list_price) B1_CNTD
> from store_sales
> where ss_quantity between 0 and 5
> and (ss_list_price between 185 and 185+10
> or ss_coupon_amt between 10548 and 10548+1000
> or ss_wholesale_cost between 6 and 6+20)) B1,
> (select avg(ss_list_price) B2_LP
> ,count(ss_list_price) B2_CNT
> ,count(distinct ss_list_price) B2_CNTD
> from store_sales
> where ss_quantity between 6 and 10
> and (ss_list_price between 28 and 28+10
> or ss_coupon_amt between 6100 and 6100+1000
> or ss_wholesale_cost between 27 and 27+20)) B2,
> (select avg(ss_list_price) B3_LP
> ,count(ss_list_price) B3_CNT
> ,count(distinct ss_list_price) B3_CNTD
> from store_sales
> where ss_quantity between 11 and 15
> and (ss_list_price between 173 and 173+10
> or ss_coupon_amt between 6371 and 6371+1000
> or ss_wholesale_cost between 32 and 32+20)) B3,
> (select avg(ss_list_price) B4_LP
> ,count(ss_list_price) B4_CNT
> ,count(distinct ss_list_price) B4_CNTD
> from store_sales
> where ss_quantity between 16 and 20
> and (ss_list_price between 101 and 101+10
> or ss_coupon_amt between 2938 and 2938+1000
> or ss_wholesale_cost between 21 and 21+20)) B4,
> (select avg(ss_list_price) B5_LP
> ,count(ss_list_price) B5_CNT
> ,count(distinct ss_list_price) B5_CNTD
> from store_sales
> where ss_quantity between 21 and 25
> and (ss_list_price between 8 and 8+10
> or ss_coupon_amt between 5093 and 5093+1000
> or ss_wholesale_cost between 50 and 50+20)) B5,
> (select avg(ss_list_price) B6_LP
> ,count(ss_list_price) B6_CNT
> ,count(distinct ss_list_price) B6_CNTD
> from store_sales
> where ss_quantity between 26 and 30
> and (ss_list_price between 110 and 110+10
> or ss_coupon_amt between 2276 and 2276+1000
> or ss_wholesale_cost between 36 and 36+20)) B6
> limit 100;
> {noformat}
> cc'ing [~stakiar], [~joemcdonnell], [~lv], [~tlipcon], [~drorke]