[jira] [Commented] (HDDS-11187) ClassCastException in Recon Server during FileSizeCountTask Execution

Arafat Khan (Jira) Tue, 16 Jul 2024 01:14:05 -0700


    [ 
https://issues.apache.org/jira/browse/HDDS-11187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866295#comment-17866295
 ]


Arafat Khan commented on HDDS-11187:
------------------------------------

h3. Problem Description

The current implementation of *{{OMDBUpdatesHandler}}* in Recon stores events 
from all tables in a single map, {*}{{omdbLatestUpdateEvents}}{*}, keyed only by 
the row key. This breaks when different tables, such as *{{keyTable}}* and 
{*}{{deletedTable}}{*}, share the same key format 
({*}{{/volumeName/bucketName/keyName}}{*}): an event from one table can 
overwrite an event from another table that has an identical key. The result is 
inconsistent data and failures such as *{{ClassCastException}}* when the wrong 
event type is retrieved and cast downstream.
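
The overwrite described above can be reproduced in isolation. The following is a minimal sketch, not Recon code: {{KeyInfo}}, {{RepeatedKeyInfo}} and {{UpdateEvent}} are hypothetical stand-ins for the real Ozone types, and the map is a plain {{HashMap}} keyed only by the row key.

```java
import java.util.HashMap;
import java.util.Map;

final class SingleMapCollision {
  // Hypothetical stand-ins for the value types of keyTable and deletedTable.
  static final class KeyInfo { }
  static final class RepeatedKeyInfo { }

  // Hypothetical, simplified event: remembers its source table and its value.
  static final class UpdateEvent {
    final String table;
    final Object value;

    UpdateEvent(String table, Object value) {
      this.table = table;
      this.value = value;
    }
  }

  /** Returns true if reading back the keyTable event fails with a ClassCastException. */
  static boolean collides() {
    // Single map keyed only by the row key, as in the problem description.
    Map<Object, UpdateEvent> latestEvents = new HashMap<>();
    String key = "/vol1/bucket1/key1";

    latestEvents.put(key, new UpdateEvent("keyTable", new KeyInfo()));
    // Same key from a different table: silently replaces the keyTable event.
    latestEvents.put(key, new UpdateEvent("deletedTable", new RepeatedKeyInfo()));

    try {
      // A consumer that expects a keyTable value now sees a deletedTable value.
      KeyInfo info = (KeyInfo) latestEvents.get(key).value;
      return info == null;
    } catch (ClassCastException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("collision reproduced: " + collides());
  }
}
```

Running {{main}} prints {{collision reproduced: true}}, because the second {{put}} replaces the first and the cast then fails.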
h3. Solution Description

To resolve this, we propose adding a table-name layer to the 
*{{omdbLatestUpdateEvents}}* map. The new structure is a nested map, 
{*}{{Map<String, Map<Object, OMDBUpdateEvent>>}}{*}, where the outer key is the 
table name and the inner key is the row key. Events from different tables that 
share a key are then stored separately and can no longer conflict.
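
The nested structure can be sketched as follows. This is an illustration under assumptions, not the actual patch: {{TableScopedEvents}}, {{UpdateEvent}} and the {{put}}/{{get}} helpers are hypothetical names, and only the shape {{Map<String, Map<Object, ...>>}} comes from the proposal above.

```java
import java.util.HashMap;
import java.util.Map;

final class TableScopedEvents {
  // Hypothetical, simplified event type standing in for OMDBUpdateEvent.
  static final class UpdateEvent {
    final String table;
    final Object value;

    UpdateEvent(String table, Object value) {
      this.table = table;
      this.value = value;
    }
  }

  // Outer key: table name. Inner key: the row key within that table.
  private final Map<String, Map<Object, UpdateEvent>> latestEvents = new HashMap<>();

  /** Stores the latest event for (table, key); only the same table and key is replaced. */
  void put(String table, Object key, Object value) {
    latestEvents
        .computeIfAbsent(table, t -> new HashMap<>())
        .put(key, new UpdateEvent(table, value));
  }

  /** Returns the latest event for (table, key), or null if there is none. */
  UpdateEvent get(String table, Object key) {
    Map<Object, UpdateEvent> perTable = latestEvents.get(table);
    return perTable == null ? null : perTable.get(key);
  }

  public static void main(String[] args) {
    TableScopedEvents events = new TableScopedEvents();
    String key = "/vol1/bucket1/key1";
    events.put("keyTable", key, "key-info");
    events.put("deletedTable", key, "repeated-key-info");
    // Both events survive because they live in different inner maps.
    System.out.println(events.get("keyTable", key).value);
    System.out.println(events.get("deletedTable", key).value);
  }
}
```

With this layout, a later event for the same table and key still replaces the earlier one, which matches the "latest update" intent, while identical keys in different tables stay independent.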
h3. Fix for ClassCastException and Future Improvements

This change fixes the *{{ClassCastException}}* on the Recon side by isolating 
events from different tables within the map, so they can no longer overwrite 
each other. It only addresses possible corruption in event creation on the 
Recon side. The root cause on the OM side, where incorrect events can be 
created, still persists and needs to be fixed separately. Ensuring that each 
event is correctly classified and stored in the appropriate table at the source 
will further reinforce data integrity and prevent similar issues in the future.

Logs were added in an earlier patch, 
[HDDS-8310|https://github.com/apache/ozone/pull/5043], to capture events that 
can lead to {*}{{ClassCastException}}{*}. Recon generally ignores such events, 
because they are generated on the OM side and have to be fixed there. Those 
logs will help identify and report the OM-side issues; for now we have to wait 
for them to be reported.
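
The "log and ignore" handling mentioned above could look roughly like this. This is a hedged sketch only: it is not the code added by HDDS-8310, and {{GuardedEventHandler}}, {{KeyInfo}} and {{sizeOrSkip}} are hypothetical names, with {{java.util.logging}} chosen just to keep the example self-contained.

```java
import java.util.logging.Logger;

final class GuardedEventHandler {
  private static final Logger LOG =
      Logger.getLogger(GuardedEventHandler.class.getName());

  // Hypothetical stand-in for the value type a task expects from keyTable.
  static final class KeyInfo {
    final long size;

    KeyInfo(long size) {
      this.size = size;
    }
  }

  /**
   * Returns the key size, or -1 if the value is not a KeyInfo.
   * Unexpected values are logged and skipped instead of being cast blindly.
   */
  static long sizeOrSkip(String table, Object key, Object value) {
    if (!(value instanceof KeyInfo)) {
      String actual = value == null ? "null" : value.getClass().getName();
      LOG.warning("Skipping event for table " + table + ", key " + key
          + ": unexpected value type " + actual);
      return -1L;
    }
    return ((KeyInfo) value).size;
  }

  public static void main(String[] args) {
    System.out.println(sizeOrSkip("keyTable", "/vol1/bucket1/key1", new KeyInfo(42L)));
    System.out.println(sizeOrSkip("keyTable", "/vol1/bucket1/key1", "wrong type"));
  }
}
```

Checking the type before the cast keeps one malformed event from failing the whole task run, and the warning records which table and key produced it.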

> ClassCastException in Recon Server during FileSizeCountTask Execution
> ---------------------------------------------------------------------
>
>                 Key: HDDS-11187
>                 URL: https://issues.apache.org/jira/browse/HDDS-11187
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Recon
>            Reporter: Arafat Khan
>            Assignee: Arafat Khan
>            Priority: Major
>             Fix For: 1.5.0
>
>
> A *ClassCastException* occurs in the Recon server during the 
> FileSizeCountTask, where RepeatedOmKeyInfo is incorrectly cast to OmKeyInfo, 
> causing task processing to fail.
>  
> {code:java}
> 2024-06-11 10:40:03,700 INFO 
> org.apache.hadoop.ozone.recon.tasks.FileSizeCountTask: Completed a 'process' 
> run of FileSizeCountTask.
> 2024-06-11 10:40:03,700 ERROR 
> org.apache.hadoop.ozone.recon.tasks.ReconTaskControllerImpl: Unexpected error 
> : 
> java.util.concurrent.ExecutionException: java.lang.ClassCastException: class 
> org.apache.hadoop.ozone.om.helpers.RepeatedOmKeyInfo cannot be cast to class 
> org.apache.hadoop.ozone.om.helpers.OmKeyInfo 
> (org.apache.hadoop.ozone.om.helpers.RepeatedOmKeyInfo and 
> org.apache.hadoop.ozone.om.helpers.OmKeyInfo are in unnamed module of loader 
> 'app')
>       at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
>       at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
>       at 
> org.apache.hadoop.ozone.recon.tasks.ReconTaskControllerImpl.processTaskResults(ReconTaskControllerImpl.java:247)
>       at 
> org.apache.hadoop.ozone.recon.tasks.ReconTaskControllerImpl.consumeOMEvents(ReconTaskControllerImpl.java:118)
>       at 
> org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.syncDataFromOM(OzoneManagerServiceProviderImpl.java:511)
>       at 
> org.apache.hadoop.ozone.recon.spi.impl.OzoneManagerServiceProviderImpl.lambda$startSyncDataFromOM$0(OzoneManagerServiceProviderImpl.java:258)
>       at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>       at 
> java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
>       at 
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>       at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>       at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: java.lang.ClassCastException: class 
> org.apache.hadoop.ozone.om.helpers.RepeatedOmKeyInfo cannot be cast to class 
> org.apache.hadoop.ozone.om.helpers.OmKeyInfo 
> (org.apache.hadoop.ozone.om.helpers.RepeatedOmKeyInfo and 
> org.apache.hadoop.ozone.om.helpers.OmKeyInfo are in unnamed module of loader 
> 'app')
>       at 
> org.apache.hadoop.ozone.recon.tasks.NSSummaryTaskWithFSO.processWithFSO(NSSummaryTaskWithFSO.java:90)
>       at 
> org.apache.hadoop.ozone.recon.tasks.NSSummaryTask.process(NSSummaryTask.java:97)
>       at 
> org.apache.hadoop.ozone.recon.tasks.ReconTaskControllerImpl.lambda$consumeOMEvents$0(ReconTaskControllerImpl.java:113)
>       at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>       ... 3 more {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
