devmadhuu opened a new pull request, #8212:
URL: https://github.com/apache/ozone/pull/8212

   ## What changes were proposed in this pull request?
   This PR improves how Recon fetches and extracts the OM DB snapshot tarball (SST files) during Recon bootstrap by switching to a streaming-based approach.
   
   Instead of having Recon store the full TAR file and wait for the transfer to complete, extract files as they arrive using `TarArchiveInputStream`. This will:
   - Reduce disk I/O
   - Start processing sooner
   - Avoid extra storage needs
   
   **Why This is More Efficient**

   **No Temporary TAR File**
   - Extracts files directly while streaming, so the full TAR never has to be stored.

   **Starts Extracting Immediately**
   - There is no wait for the full file to be received; extraction happens as data arrives.

   **Lower Disk I/O & Storage Needs**
   - Removes the unnecessary `FileUtils.copyInputStreamToFile()` call.
   - Avoids writing the TAR file to disk and then reading it back.

   **Handles Both Files & Directories**
   - Ensures the correct directory structure exists before files are written.
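
The streaming idea can be illustrated with a minimal, dependency-free Java sketch of entry-by-entry extraction. It is not the PR's implementation. The PR uses `TarArchiveInputStream` from Apache Commons Compress, whereas this hypothetical `StreamingTarExtractor` parses the 512-byte ustar headers by hand so that it runs on a plain JDK. Each entry is copied from the `InputStream` straight to its target path. Parent directories are created before a file is written, and entries that would land outside the destination directory are rejected. The sketch assumes plain ustar entries with names of at most 100 bytes. PAX/GNU extension headers, links, and other entry types are skipped rather than interpreted.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical, dependency-free sketch of streaming tar extraction (not the PR's code).
// Reads one 512-byte header at a time and copies each entry straight to disk, so the
// archive itself is never stored. Assumes plain ustar regular files and directories
// with names of at most 100 bytes; every other entry type is skipped.
final class StreamingTarExtractor {
  private static final int BLOCK = 512;

  /** Extracts the tar stream into destDir and returns the number of regular files written. */
  static int extract(InputStream in, Path destDir) throws IOException {
    Path root = destDir.toAbsolutePath().normalize();
    Files.createDirectories(root);
    byte[] header = new byte[BLOCK];
    byte[] buf = new byte[8192];
    int files = 0;
    // An all-zero header block marks the end of the archive.
    while (readHeader(in, header) && !isAllZero(header)) {
      String name = asciiField(header, 0, 100);
      String sizeField = asciiField(header, 124, 12).trim();
      long size = sizeField.isEmpty() ? 0 : Long.parseLong(sizeField, 8);  // octal size
      long padding = (BLOCK - size % BLOCK) % BLOCK;  // data is padded to a 512-byte boundary
      byte type = header[156];
      Path target = root.resolve(name).normalize();
      // Reject entries such as "../x" that would escape the destination directory.
      if (!target.startsWith(root)) {
        throw new IOException("Entry escapes target directory: " + name);
      }
      if (type == '5') {                       // directory entry
        Files.createDirectories(target);
        transfer(in, null, size + padding, buf);
      } else if (type == '0' || type == 0) {   // regular file
        Files.createDirectories(target.getParent());  // parent dirs before the file
        try (OutputStream out = Files.newOutputStream(target)) {
          transfer(in, out, size, buf);
        }
        transfer(in, null, padding, buf);
        files++;
      } else {                                 // links, PAX/GNU headers, etc.: skip the data
        transfer(in, null, size + padding, buf);
      }
    }
    return files;
  }

  // Fills the block completely; returns false on a clean end of stream.
  private static boolean readHeader(InputStream in, byte[] block) throws IOException {
    int off = 0;
    while (off < block.length) {
      int n = in.read(block, off, block.length - off);
      if (n < 0) {
        if (off == 0) {
          return false;
        }
        throw new IOException("Truncated tar header");
      }
      off += n;
    }
    return true;
  }

  // Moves exactly n bytes from in to out; out == null means the bytes are discarded.
  private static void transfer(InputStream in, OutputStream out, long n, byte[] buf)
      throws IOException {
    while (n > 0) {
      int r = in.read(buf, 0, (int) Math.min(buf.length, n));
      if (r < 0) {
        throw new IOException("Unexpected end of tar stream");
      }
      if (out != null) {
        out.write(buf, 0, r);
      }
      n -= r;
    }
  }

  // Header fields are NUL-terminated ASCII.
  private static String asciiField(byte[] h, int off, int len) {
    int end = off;
    while (end < off + len && h[end] != 0) {
      end++;
    }
    return new String(h, off, end - off, StandardCharsets.US_ASCII);
  }

  private static boolean isAllZero(byte[] block) {
    for (byte b : block) {
      if (b != 0) {
        return false;
      }
    }
    return true;
  }
}
```

The archive is consumed one block at a time, so the first SST file can be written as soon as its header and data arrive, and the tarball itself never touches the disk. With Commons Compress, the loop keeps the same shape: call `getNextEntry()` until it returns `null`, then copy from the archive stream for each entry.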
   
   **Using Multithreading for Parallel Extraction**
   To extract files in parallel, we need to:
   - Use a thread pool to process multiple files at the same time.
   - Extract files asynchronously while maintaining order and efficiency.
   - Ensure directories are handled correctly before files are written.
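
The log below fits a producer/consumer design. `Recon-SyncOM-0` enqueues each file it reads from the stream, and `pool-33-thread-*` workers dequeue and write them. At the end, each worker dequeues a `null` marker and stops. A tar stream can only be read sequentially, so in a design like this the reading thread has to pull each entry's bytes off the stream before handing the entry to a worker. The following Java sketch shows that pattern with a bounded `ArrayBlockingQueue`, a fixed thread pool, and one poison-pill entry per worker. The class `ParallelEntryWriter`, the buffering of whole entries in memory, and the queue capacity are illustrative assumptions, not details taken from the PR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the enqueue/dequeue pattern seen in the logs (not the PR's code).
final class ParallelEntryWriter {

  /** An archive entry whose bytes the reading thread has already pulled off the stream. */
  static final class Entry {
    final String name;
    final byte[] data;

    Entry(String name, byte[] data) {
      this.name = name;
      this.data = data;
    }
  }

  // Poison pill: one per worker, telling it to stop (cf. "Dequeued file: null" in the log).
  private static final Entry POISON = new Entry(null, null);

  static void writeAll(Iterable<Entry> entries, Path destDir, int workers, int capacity)
      throws Exception {
    Path root = destDir.toAbsolutePath().normalize();
    BlockingQueue<Entry> queue = new ArrayBlockingQueue<>(capacity);
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    List<Future<Object>> results = new ArrayList<>();
    for (int i = 0; i < workers; i++) {
      results.add(pool.submit(() -> {
        IOException failure = null;
        while (true) {
          Entry e = queue.take();
          if (e == POISON) {
            if (failure != null) {
              throw failure;  // surfaces through Future.get()
            }
            return null;
          }
          if (failure != null) {
            continue;  // after an error, keep draining so the producer never blocks forever
          }
          try {
            Path target = root.resolve(e.name).normalize();
            if (!target.startsWith(root)) {
              throw new IOException("Entry escapes target directory: " + e.name);
            }
            Files.createDirectories(target.getParent());  // parent dirs before the file
            Files.write(target, e.data);
          } catch (IOException ex) {
            failure = ex;
          }
        }
      }));
    }
    try {
      for (Entry e : entries) {
        queue.put(e);  // blocks while the queue is full: backpressure on the reader
      }
    } finally {
      for (int i = 0; i < workers; i++) {
        queue.put(POISON);
      }
      pool.shutdown();
    }
    for (Future<Object> f : results) {
      f.get();  // rethrows a worker's IOException wrapped in ExecutionException
    }
  }
}
```

The bounded queue caps memory use. When workers fall behind, `put()` blocks the reading thread instead of letting it buffer the whole snapshot. A worker that hits an I/O error keeps draining its share of the queue, so the producer never blocks forever, and the error is reported from `Future.get()` as an `ExecutionException`.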
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-12707
   
   ## How was this patch tested?
   Tested with the existing JUnit and integration tests, and manually on a local Docker cluster.
   
   ```
   2025-03-26 13:08:39 2025-03-26 07:38:39,225 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Last known sequence number before sync: 0
   2025-03-26 13:08:39 2025-03-26 07:38:39,226 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Seq number of Recon's OM DB : 0
   2025-03-26 13:08:39 2025-03-26 07:38:39,226 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Obtaining full snapshot from Ozone Manager
   2025-03-26 13:08:39 2025-03-26 07:38:39,236 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Permissions for Recon DB directory '/data/metadata/recon' meet the minimum required permissions '750'
   2025-03-26 13:08:39 2025-03-26 07:38:39,521 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Enqueued file: 000057.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,521 [pool-33-thread-1] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: 000057.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,522 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Enqueued file: 000063.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,522 [pool-33-thread-2] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: 000063.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,522 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Enqueued file: 000061.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,522 [pool-33-thread-3] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: 000061.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,522 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Enqueued file: 000059.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,523 [pool-33-thread-4] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: 000059.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,523 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Enqueued file: 000056.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,523 [pool-33-thread-1] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: 000056.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,524 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Enqueued file: 000062.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,524 [pool-33-thread-2] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: 000062.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,524 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Enqueued file: 000053.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,524 [pool-33-thread-3] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: 000053.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,524 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Enqueued file: 000064.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,524 [pool-33-thread-4] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: 000064.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,525 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Enqueued file: 000054.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,525 [pool-33-thread-1] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: 000054.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,525 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Enqueued file: CURRENT
   2025-03-26 13:08:39 2025-03-26 07:38:39,525 [pool-33-thread-2] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: CURRENT
   2025-03-26 13:08:39 2025-03-26 07:38:39,525 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Enqueued file: MANIFEST-000005
   2025-03-26 13:08:39 2025-03-26 07:38:39,525 [pool-33-thread-3] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: MANIFEST-000005
   2025-03-26 13:08:39 2025-03-26 07:38:39,526 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Enqueued file: 000060.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,526 [pool-33-thread-4] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: 000060.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,526 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Enqueued file: 000058.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,526 [pool-33-thread-1] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: 000058.sst
   2025-03-26 13:08:39 2025-03-26 07:38:39,528 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Enqueued file: OPTIONS-000051
   2025-03-26 13:08:39 2025-03-26 07:38:39,528 [pool-33-thread-2] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: OPTIONS-000051
   2025-03-26 13:08:39 2025-03-26 07:38:39,529 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Enqueued file: OZONE_RATIS_SNAPSHOT_COMPLETE
   2025-03-26 13:08:39 2025-03-26 07:38:39,529 [pool-33-thread-3] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: OZONE_RATIS_SNAPSHOT_COMPLETE
   2025-03-26 13:08:39 2025-03-26 07:38:39,529 [pool-33-thread-4] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: null
   2025-03-26 13:08:39 2025-03-26 07:38:39,529 [pool-33-thread-1] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: null
   2025-03-26 13:08:39 2025-03-26 07:38:39,529 [pool-33-thread-3] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: null
   2025-03-26 13:08:39 2025-03-26 07:38:39,529 [pool-33-thread-2] INFO impl.OzoneManagerServiceProviderImpl: Dequeued file: null
   2025-03-26 13:08:39 2025-03-26 07:38:39,529 [Recon-SyncOM-0] INFO recon.ReconContext: Update healthStatus of Recon from true to true.
   2025-03-26 13:08:39 2025-03-26 07:38:39,530 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Attempting to update Recon OM DB with new snapshot located at: /data/metadata/om.snapshot.db_1742974719236
   2025-03-26 13:08:39 2025-03-26 07:38:39,576 [Recon-SyncOM-0] INFO recovery.ReconOmMetadataManagerImpl: Created OM DB handle from snapshot at /data/metadata/om.snapshot.db_1742974719236.
   2025-03-26 13:08:39 2025-03-26 07:38:39,586 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Successfully updated Recon OM DB with new snapshot.
   2025-03-26 13:08:39 2025-03-26 07:38:39,589 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Calling reprocess on Recon tasks.
   2025-03-26 13:08:39 2025-03-26 07:38:39,609 [ReconTaskThread-0] INFO impl.ReconContainerMetadataManagerImpl: KEY_CONTAINER Table is empty, initializing from CONTAINER_KEY Table ...
   2025-03-26 13:08:39 2025-03-26 07:38:39,609 [ReconTaskThread-0] INFO impl.ReconContainerMetadataManagerImpl: It took 0.0 seconds to initialized 0 records to KEY_CONTAINER table
   2025-03-26 13:08:39 2025-03-26 07:38:39,737 [ReconTaskThread-0] INFO tasks.FileSizeCountTaskHelper: Starting Reprocess for FileSizeCountTaskOBS
   2025-03-26 13:08:39 2025-03-26 07:38:39,745 [ReconTaskThread-0] INFO tasks.FileSizeCountTaskHelper: Deleted 5 records from "FILE_COUNT_BY_SIZE"
   2025-03-26 13:08:39 2025-03-26 07:38:39,747 [ReconTaskThread-0] INFO tasks.FileSizeCountTaskHelper: Reprocessed 18 keys for bucket layout OBJECT_STORE.
   2025-03-26 13:08:39 2025-03-26 07:38:39,773 [ReconTaskThread-0] INFO tasks.FileSizeCountTaskHelper: FileSizeCountTaskOBS completed Reprocess in 36 ms.
   2025-03-26 13:08:39 2025-03-26 07:38:39,776 [ReconTaskThread-0] INFO tasks.FileSizeCountTaskHelper: Starting Reprocess for FileSizeCountTaskFSO
   2025-03-26 13:08:39 2025-03-26 07:38:39,776 [ReconTaskThread-0] INFO tasks.FileSizeCountTaskHelper: Table already truncated by another task; waiting for truncation to complete.
   2025-03-26 13:08:39 2025-03-26 07:38:39,777 [ReconTaskThread-0] INFO tasks.FileSizeCountTaskHelper: Reprocessed 6 keys for bucket layout FILE_SYSTEM_OPTIMIZED.
   2025-03-26 13:08:39 2025-03-26 07:38:39,799 [ReconTaskThread-0] INFO tasks.FileSizeCountTaskHelper: FileSizeCountTaskFSO completed Reprocess in 23 ms.
   2025-03-26 13:08:39 2025-03-26 07:38:39,802 [Recon-SyncOM-0] INFO recon.ReconContext: Update healthStatus of Recon from true to true.
   2025-03-26 13:08:39 2025-03-26 07:38:39,802 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Sequence number after sync: 301
   2025-03-26 13:09:10 2025-03-26 07:39:10,214 [qtp1063823352-189] WARN conf.TimeDurationUtil: No unit for hdds.scmclient.rpc.timeout(60000) assuming MILLISECONDS
   2025-03-26 13:09:10 2025-03-26 07:39:10,214 [qtp1063823352-189] WARN conf.TimeDurationUtil: No unit for hdds.scmclient.max.retry.timeout(6000) assuming MILLISECONDS
   2025-03-26 13:09:10 2025-03-26 07:39:10,214 [qtp1063823352-189] INFO proxy.SCMContainerLocationFailoverProxyProvider: Created fail-over proxy for protocol StorageContainerLocationProtocolPB with 1 nodes: [nodeId=scmNodeId,nodeAddress=scm/172.18.0.3:9860]
   2025-03-26 13:09:39 2025-03-26 07:39:39,802 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Last known sequence number before sync: 301
   2025-03-26 13:09:39 2025-03-26 07:39:39,804 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Seq number of Recon's OM DB : 301
   2025-03-26 13:09:39 2025-03-26 07:39:39,824 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: From Sequence Number:301, Recon DB Sequence Number: 301, Number of updates received from OM : 0, SequenceNumber diff: 0, SequenceNumber Lag from OM 0, isDBUpdateSuccess: true
   2025-03-26 13:09:39 2025-03-26 07:39:39,829 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Delta updates received from OM : 1 loops, 0 records
   2025-03-26 13:09:39 2025-03-26 07:39:39,829 [Recon-SyncOM-0] INFO impl.OzoneManagerServiceProviderImpl: Sequence number after sync: 301
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

