[
https://issues.apache.org/jira/browse/PHOENIX-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Viraj Jasani reassigned PHOENIX-7367:
-------------------------------------
Assignee: Ujjawal Kumar
> Snapshot based mapreduce jobs fails after HBASE-28401
> -----------------------------------------------------
>
> Key: PHOENIX-7367
> URL: https://issues.apache.org/jira/browse/PHOENIX-7367
> Project: Phoenix
> Issue Type: Bug
> Reporter: Ujjawal Kumar
> Assignee: Ujjawal Kumar
> Priority: Major
> Attachments: Screenshot 2024-07-19 at 8.18.06 PM.png, Screenshot
> 2024-07-19 at 8.18.25 PM.png
>
>
> HBASE-28401 introduced a regression that makes HRegion#close throw an NPE while
> trying to close the memstore within the mapper.
> As a result, snapshot-based MR jobs have started failing in Phoenix.
> The failure occurs because TableSnapshotResultIterator ends up trying to
> release the read lock twice via HRegion#closeRegionOperation:
> * TableSnapshotResultIterator's next method [calls ScanningResultIterator's next method|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-server/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java#L180].
> ** ScanningResultIterator's [next tries to close the SnapshotScanner early|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/ScanningResultIterator.java#L225].
> ** Within [SnapshotScanner's close method|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-server/src/main/java/org/apache/phoenix/iterate/SnapshotScanner.java#L180-L187]:
> *** HRegion#closeRegionOperation released the read lock and was successful.
> *** HRegion#close threw an IOException due to the memstore issue
> (HBASE-28401).
> *** SnapshotScanner catches the IOException but doesn't set the region field to
> null.
> * TableSnapshotResultIterator's [finally block calls ScanningResultIterator's close method|https://github.com/apache/phoenix/blob/1e96a2756eaf0a2201a50579789190e8c10747df/phoenix-core-server/src/main/java/org/apache/phoenix/iterate/TableSnapshotResultIterator.java#L187-L190].
> ** *ScanningResultIterator's close is called again.*
> ** *Since the region field wasn't null, HRegion#closeRegionOperation is
> called again and throws IllegalMonitorStateException while trying to release
> the read lock.*
> ** The IllegalMonitorStateException then causes the whole mapper to fail.
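
The double release described in the list above can be reproduced in isolation. The following is a minimal sketch, not HBase or Phoenix code: it assumes the region's read lock behaves like the read lock of a java.util.concurrent.locks.ReentrantReadWriteLock, and the class and method names (DoubleReleaseDemo, startOperation, closeOperation) are hypothetical stand-ins for the region operation calls named in the description.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a region guarded by a read lock; not HBase code.
final class DoubleReleaseDemo {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Analogue of starting a region operation: acquire the read lock.
    void startOperation() {
        lock.readLock().lock();
    }

    // Analogue of HRegion#closeRegionOperation: release the read lock.
    void closeOperation() {
        lock.readLock().unlock();
    }

    // Returns true when the second release throws IllegalMonitorStateException.
    static boolean secondReleaseThrows() {
        DoubleReleaseDemo region = new DoubleReleaseDemo();
        region.startOperation();
        region.closeOperation();      // first release succeeds
        try {
            region.closeOperation();  // this thread no longer holds the read lock
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("second release throws: " + secondReleaseThrows());
    }
}
```

The second call fails because the calling thread has no remaining read hold, which matches the sequence in which close runs once from next and again from the finally block.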
> This doesn't cause a failure when doing snapshot reads via HBase (see
> HBASE-28743, where the same NPE was observed but the mapper still passed),
> because the closest equivalent code (the RecordReader within
> TableSnapshotInputFormat) doesn't try to close the region [as part of its nextKeyValue method|https://github.com/apache/hbase/blob/master/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java#L275-L280].
>
> This is generally much safer [because record readers are always closed explicitly (even if the mapper's run method fails)|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java#L466-L481].
> There are 2 improvements that can be made here:
> 1. Disable MSLAB for the region created within the snapshot (by setting
> hbase.hregion.memstore.mslab.enabled to false).
> 2. In TableSnapshotResultIterator, remove the SnapshotScanner's close
> (via ScanningResultIterator) called within the next method. It would be
> closed by the mapper at the end anyway.
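
The description notes that SnapshotScanner does not set its region field to null when HRegion#close fails. A minimal sketch of a close that tolerates being called twice is shown below. It is an illustration under assumptions, not the Phoenix implementation or a proposed patch: GuardedScanner, its fields, and the simulated failure are all hypothetical, and a plain ReentrantReadWriteLock stands in for the region's lock.

```java
import java.io.IOException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical scanner; names echo the issue text but this is not Phoenix code.
final class GuardedScanner implements AutoCloseable {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private Object region = new Object(); // stand-in for the HRegion field
    private final boolean closeFails;     // simulates HRegion#close throwing
    int releases = 0;                     // counts read lock releases

    GuardedScanner(boolean closeFails) {
        this.closeFails = closeFails;
        lock.readLock().lock();           // analogue of starting a region operation
    }

    @Override
    public void close() {
        if (region == null) {
            return;                       // already closed: do not release again
        }
        try {
            lock.readLock().unlock();     // analogue of HRegion#closeRegionOperation
            releases++;
            if (closeFails) {
                // analogue of HRegion#close failing (HBASE-28401)
                throw new IOException("simulated close failure");
            }
        } catch (IOException e) {
            // swallowed here, as the description says SnapshotScanner does
        } finally {
            region = null;                // cleared even when close fails
        }
    }

    // Closes the same scanner twice and reports how often the lock was released.
    static int releasesAfterTwoCloses(boolean closeFails) {
        GuardedScanner scanner = new GuardedScanner(closeFails);
        scanner.close();
        scanner.close();
        return scanner.releases;
    }

    public static void main(String[] args) {
        System.out.println("releases when close fails: " + releasesAfterTwoCloses(true));
    }
}
```

Clearing the field in a finally block means a later close from the caller becomes a no-op, so the read lock is released exactly once whether or not the underlying close succeeds. This is complementary to improvement 2, which avoids the early close altogether.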
--
This message was sent by Atlassian Jira
(v8.20.10#820010)