[PR] Forward SegmentDirectoryLoaderContext in DefaultSegmentDirectoryLoader [pinot]

via GitHub Fri, 26 Jun 2026 15:18:22 -0700


raghavyadav01 opened a new pull request, #18863:
URL: https://github.com/apache/pinot/pull/18863


   ## What
   
   [#18806](https://github.com/apache/pinot/pull/18806) ("Replace 
`segmentDirectoryConfigs` map with `ReadMode` in 
`SegmentDirectoryLoaderContext`") refactored 
`DefaultSegmentDirectoryLoader.load` and inadvertently dropped the 
`segmentLoaderContext` argument when constructing `SegmentLocalFSDirectory`:
   
   ```java
   // Before #18806
   return new SegmentLocalFSDirectory(directory, segmentLoaderContext,
       
ReadMode.valueOf(segmentDirectoryConfigs.getProperty(IndexLoadingConfig.READ_MODE_KEY)));
   
   // After #18806 (current master)
   return new SegmentLocalFSDirectory(directory, 
segmentLoaderContext.getReadMode());
   ```
   
   The 2-arg `SegmentLocalFSDirectory(File, ReadMode)` constructor sets the 
underlying `SingleFileIndexDirectory#_segmentDirectoryLoaderContext` to `null`. 
This silently breaks the task-config propagation introduced in 
[#18264](https://github.com/apache/pinot/pull/18264) — 
`SingleFileIndexDirectory#serializeTaskConfigToJsonFromContext()` 
short-circuits at its first guard:
   
   ```java
   private String serializeTaskConfigToJsonFromContext() {
     if (_segmentDirectoryLoaderContext == null) {
       return null;     // ← #18806 makes this always true on the default 
loader path
     }
     ...
   }
   ```
   
   As a result, `task.config.json` is never injected into 
`EmptyIndexBuffer.properties`, and downstream consumers that depend on the 
table's task configuration to resolve their own configuration (for example, 
storage credentials for remote forward-index reads at preprocess time) silently 
fall back to ambient defaults (environment variables / IAM).
   
   ## Fix
   
   Restore the prior behavior by constructing `SegmentMetadataImpl` from the 
index directory and using the existing 4-arg `SegmentLocalFSDirectory(File, 
SegmentMetadataImpl, ReadMode, @Nullable SegmentDirectoryLoaderContext)` 
constructor. One file, +10/−2, no new constructor or API surface.
   
   ## How tested
   
   Validated end-to-end against a local cluster with isolated S3 backends 
(different access keys for source data vs the Pinot tier, so the 
environment-fallback path lands on the wrong endpoint and is observable):
   
   - **Without the fix** (current master): segments with a preprocess-time 
index update on a remote source column (e.g. a JSON index on a `JSON` column, a 
TEXT index on a `STRING` column) go to ERROR — `NoSuchBucket` because the 
parquet reader falls back to environment credentials at the global SDK endpoint 
instead of the per-source endpoint encoded in the task config.
   - **With the fix**: same setup, same segments load ONLINE; 
`JsonIndexHandler`, `TextIndexHandler`, and `ParquetNullValueIndexHandler` 
resolve credentials from the propagated task config and read source data 
successfully.
   
   Pre-fix stack signature:
   
   ```
   SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline
     → BaseTableDataManager.downloadAndLoadSegment
     → ImmutableSegmentLoader.preprocessSegment
     → SegmentPreProcessor.process
     → {Json,Text,...}IndexHandler.updateIndices
     → {Struct,...}ParquetForwardIndexReader{V2,V6}
     → PageReader.readChunkForIndexing
     → software.amazon.awssdk.services.s3.model.NoSuchBucketException
   ```
   
   ## Suggested labels
   
   `bugfix`, `release-notes` (regression in `1.6.0-SNAPSHOT` between #18264 and 
#18806).
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] Forward SegmentDirectoryLoaderContext in DefaultSegmentDirectoryLoader [pinot]

Reply via email to