jineshparakh opened a new issue, #17714:
URL: https://github.com/apache/pinot/issues/17714

   While working on #17713, I identified several additional gaps in GcsPinotFS 
that should be tracked and fixed. Grouping them here for visibility.
   
   ## 1. Missing null checks on `getBlob()` callers
   
   `GcsPinotFS.getBlob()` wraps `Storage.get(BlobId)` which [returns 
null](https://docs.cloud.google.com/java/docs/reference/google-cloud-storage/latest/com.google.cloud.storage.Storage#com_google_cloud_storage_Storage_get_com_google_cloud_storage_BlobId_)
 when the blob does not exist. Several callers do not handle this, leading to 
potential NPEs
   
   Method | Unguarded call | Impact
   -- | -- | --
   `lastModified()` | `getBlob(new GcsUri(uri)).getUpdateTime()` | NPE if blob 
doesn't exist
   `touch()`  | `blob.getUpdateTime(), blob.toBuilder()` | NPE if blob doesn't 
exist
   `touch() ` | `getBlob(gcsUri).getUpdateTime()` | NPE if blob deleted between 
update and re-fetch
   `open()`  | `blob.reader()` | NPE if blob doesn't exist
   `copyFile()` | `blob.copyTo(...)` | NPE on source blob; also leaves an empty 
destination blob (created on line 482) that needs cleanup
   
   For comparison, the S3 SDK throws `NoSuchKeyException` for missing objects, 
so `S3PinotFS` gets meaningful errors without explicit null checks. The GCS SDK 
returns null silently, so `GcsPinotFS` needs to handle it explicitly.
   Methods that already handle null correctly:
   - `delete()`: `return blob != null && blob.delete()`
   - `deleteBatch()` (fixed in #17713)
   - `length()`: uses `checkState(existsBlob(blob), ...)`
   - `copyToLocalFile()`: uses `checkState(existsBlob(blob), ...)`
   
   ## 2. PinotFS contract violations
   
   ### `touch()` does not create files
   The PinotFS interface specifies:
   
   > Updates the last modified time of an existing file or directory to be 
current time. If the file system object does not exist, creates an empty file.
   
   `LocalPinotFS` and `S3PinotFS` both correctly implement this. `S3PinotFS` 
catches `NoSuchKeyException` and creates an empty object. `LocalPinotFS` checks 
`!file.exists()` and calls `file.createNewFile()`.
   `GcsPinotFS.touch()` does not handle missing files at all. It would NPE 
instead of creating the file.
   
   ### `lastModified()` return value for missing files
   The PinotFS interface specifies:
   > Returns the time the file was last modified  or 0L if the file does not 
exist or if an I/O error occurs
   
   `LocalPinotFS.lastModified(` returns `file.lastModified()` which returns 0L 
for non-existent files. `GcsPinotFS` would NPE instead.
   
   ## 3. Deprecated getUpdateTime() usage
   `BlobInfo.getUpdateTime()` is deprecated in the GCS SDK in favor of 
`getUpdateTimeOffsetDateTime()`. It is used in four places:
   `lastModified()`
   `touch()` (twice)
   `listFilesWithMetadata()`
   
   
   ## 4. Minor: redundant getBlob call in `delete()`
   The private `delete()` method calls `exists(segmentUri)`, which internally 
calls `getBlob()` for file URIs. Then it calls `getBlob(segmentUri)` again. 
This results in two GCS round-trips for a single file deletion. Not a bug but 
an efficiency concern for high-throughput deletion.
   
   
   We can have separate PRs for every sub-issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to