jineshparakh opened a new issue, #17714: URL: https://github.com/apache/pinot/issues/17714
While working on #17713, I identified several additional gaps in GcsPinotFS that should be tracked and fixed. Grouping them here for visibility. ## 1. Missing null checks on `getBlob()` callers `GcsPinotFS.getBlob()` wraps `Storage.get(BlobId)` which [returns null](https://docs.cloud.google.com/java/docs/reference/google-cloud-storage/latest/com.google.cloud.storage.Storage#com_google_cloud_storage_Storage_get_com_google_cloud_storage_BlobId_) when the blob does not exist. Several callers do not handle this, leading to potential NPEs Method | Unguarded call | Impact -- | -- | -- `lastModified()` | `getBlob(new GcsUri(uri)).getUpdateTime()` | NPE if blob doesn't exist `touch()` | `blob.getUpdateTime(), blob.toBuilder()` | NPE if blob doesn't exist `touch() ` | `getBlob(gcsUri).getUpdateTime()` | NPE if blob deleted between update and re-fetch `open()` | `blob.reader()` | NPE if blob doesn't exist `copyFile()` | `blob.copyTo(...)` | NPE on source blob; also leaves an empty destination blob (created on line 482) that needs cleanup For comparison, the S3 SDK throws `NoSuchKeyException` for missing objects, so `S3PinotFS` gets meaningful errors without explicit null checks. The GCS SDK returns null silently, so `GcsPinotFS` needs to handle it explicitly. Methods that already handle null correctly: - `delete()`: `return blob != null && blob.delete()` - `deleteBatch()` (fixed in #17713) - `length()`: uses `checkState(existsBlob(blob), ...)` - `copyToLocalFile()`: uses `checkState(existsBlob(blob), ...)` ## 2. PinotFS contract violations ### `touch()` does not create files The PinotFS interface specifies: > Updates the last modified time of an existing file or directory to be current time. If the file system object does not exist, creates an empty file. `LocalPinotFS` and `S3PinotFS` both correctly implement this. `S3PinotFS` catches `NoSuchKeyException` and creates an empty object. `LocalPinotFS` checks `!file.exists()` and calls `file.createNewFile()`. `GcsPinotFS.touch()` does not handle missing files at all. It would NPE instead of creating the file. ### `lastModified()` return value for missing files The PinotFS interface specifies: > Returns the time the file was last modified or 0L if the file does not exist or if an I/O error occurs `LocalPinotFS.lastModified(` returns `file.lastModified()` which returns 0L for non-existent files. `GcsPinotFS` would NPE instead. ## 3. Deprecated getUpdateTime() usage `BlobInfo.getUpdateTime()` is deprecated in the GCS SDK in favor of `getUpdateTimeOffsetDateTime()`. It is used in four places: `lastModified()` `touch()` (twice) `listFilesWithMetadata()` ## 4. Minor: redundant getBlob call in `delete()` The private `delete()` method calls `exists(segmentUri)`, which internally calls `getBlob()` for file URIs. Then it calls `getBlob(segmentUri)` again. This results in two GCS round-trips for a single file deletion. Not a bug but an efficiency concern for high-throughput deletion. We can have separate PRs for every sub-issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
