[
https://issues.apache.org/jira/browse/IMPALA-7816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16737237#comment-16737237
]
Sahil Takiar commented on IMPALA-7816:
--------------------------------------
Would it make more sense to wait for all the scanners to call
{{HdfsParquetScanner::Close}} before calling
{{HdfsScanNodeBase::StopAndFinalizeCounters}}? It seems odd that a scan node can
be closed before its corresponding scanners are closed. Waiting would make the
code easier to understand. I would guess most devs assume this ordering already
holds, which is probably how the bug was introduced in the first place. There
might be other race conditions in the code due to this behavior, although I
haven't been able to reproduce any others.
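
The ordering suggested above could look roughly like the sketch below. It is a minimal, standalone illustration, not Impala code: {{ScanNodeSketch}}, {{RegisterScanner}}, {{ScannerClosed}} and {{RunSketch}} are hypothetical names, and only {{file_type_counts_}} and {{StopAndFinalizeCounters}} borrow their names from the issue. The assumption is that the scan node tracks how many scanners are still open and blocks finalization until that count reaches zero.

```cpp
#include <condition_variable>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a scan node; not Impala's actual class.
class ScanNodeSketch {
 public:
  // Called once per scanner before its thread starts.
  void RegisterScanner() {
    std::lock_guard<std::mutex> l(lock_);
    ++open_scanners_;
  }

  // Called from a scanner's Close(): records the completed range, then
  // marks the scanner as closed and wakes the waiter if it was the last one.
  void ScannerClosed(const std::string& file_type) {
    std::lock_guard<std::mutex> l(lock_);
    ++file_type_counts_[file_type];
    if (--open_scanners_ == 0) all_closed_.notify_all();
  }

  // Blocks until every registered scanner has closed, then returns a
  // snapshot of the per-file-type counts.
  std::map<std::string, int> StopAndFinalizeCounters() {
    std::unique_lock<std::mutex> l(lock_);
    all_closed_.wait(l, [this] { return open_scanners_ == 0; });
    return file_type_counts_;
  }

 private:
  std::mutex lock_;
  std::condition_variable all_closed_;
  int open_scanners_ = 0;
  std::map<std::string, int> file_type_counts_;
};

// Runs `num_scanners` scanner threads and returns the finalized PARQUET count.
int RunSketch(int num_scanners) {
  ScanNodeSketch node;
  for (int i = 0; i < num_scanners; ++i) node.RegisterScanner();
  std::vector<std::thread> scanners;
  for (int i = 0; i < num_scanners; ++i) {
    scanners.emplace_back([&node] { node.ScannerClosed("PARQUET"); });
  }
  // Finalize while scanner threads may still be running; the wait inside
  // StopAndFinalizeCounters is what keeps the snapshot complete.
  std::map<std::string, int> counts = node.StopAndFinalizeCounters();
  for (std::thread& t : scanners) t.join();
  return counts["PARQUET"];
}
```

With the wait in place, {{RunSketch(6)}} returns 6 regardless of thread scheduling. Without the wait, the snapshot could be taken before some scanners have recorded their range, which is the kind of missing update described in the issue. Whether blocking in the finalize call is acceptable for the limit case in the real code is a separate question that this sketch does not address.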
> Race condition in HdfsScanNodeBase::StopAndFinalizeCounters
> -----------------------------------------------------------
>
> Key: IMPALA-7816
> URL: https://issues.apache.org/jira/browse/IMPALA-7816
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.1.0
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Priority: Major
> Labels: parquet
>
> While working on IMPALA-6964, I noticed that sometimes the runtime profile
> for a {{HDFS_SCAN_NODE}} will include {{File Formats: PARQUET/NONE:2}} and
> sometimes it won't (depending on the query). However, looking at the code,
> any scan of Parquet files should include this line.
> I debugged the code and there seems to be a race condition where
> {{HdfsScanNodeBase::StopAndFinalizeCounters}} can be called before
> {{HdfsParquetScanner::Close}} is called for all the scan ranges. This causes
> the {{File Formats}} issue above because {{HdfsParquetScanner::Close}} calls
> {{HdfsScanNodeBase::RangeComplete}} which updates the shared object
> {{file_type_counts_}}, which is read in {{StopAndFinalizeCounters}} (so
> {{StopAndFinalizeCounters}} will write out the contents of
> {{file_type_counts_}} before all scanners can update it).
> {{StopAndFinalizeCounters}} can be called in two places:
> {{HdfsScanNodeBase::Close}} and {{HdfsScanNode::GetNext}}. It is called in
> {{GetNext}} when {{GetNextInternal}} reads enough rows to reach the
> query-defined limit. So {{GetNext}} calls {{StopAndFinalizeCounters}} as soon
> as the limit is reached, which may be before all the scanners are closed.
> I'm able to reproduce this locally with the following queries:
> {code:sql}
> select * from functional_parquet.lineitem_sixblocks limit 10
> {code}
> The runtime profile for this query does not include {{File Formats}}.
> {code:sql}
> select * from functional_parquet.lineitem_sixblocks order by l_orderkey limit 10
> {code}
> The runtime profile for this query does include {{File Formats}}.
> I tried simply removing the call to {{StopAndFinalizeCounters}} from
> {{GetNext}}, but that doesn't work: it caused several other runtime profile
> entries to disappear (not entirely sure why).