jineshparakh commented on PR #17713:
URL: https://github.com/apache/pinot/pull/17713#issuecomment-3912412713

   @shounakmk219 Thanks for the review!
   
   Good catch on the other `getBlob()` callers. I'd actually noticed the 
missing null checks in `lastModified(), touch(), open(), and copyFile()` as 
well, but deliberately kept this PR scoped to just `deleteBatch` to keep things 
focused.
   The reason is that those methods aren't as straightforward to fix. 
   
   The reason is that those methods aren't as straightforward to fix. They each 
have different return types and semantics. 
   
   For instance, the PinotFS contract specifies that `touch()` should create an 
empty file if it doesn't exist, but the GCS implementation currently doesn't do 
this, so the fix would involve implementing file creation, not just a null 
guard.
   In `copyFile()`, the destination blob is created before the source blob is 
read, so if the source is null we'd also need to clean up the partially created 
destination.
   And for `lastModified()`, we'd need to decide whether to return 0L (like 
`LocalPinotFS` does for missing files) or throw. Each one needs a closer look 
at the PinotFS contract and how other implementations handle it. Happy to 
tackle those in a follow-up PR.
   
   On the `existsBlob(Blob blob)` suggestion, it actually does more than a null 
check. It also calls `blob.exists()`, which is an additional call to GCS to 
verify the blob still exists on the server. For `deleteBatch`, a simple `blob 
!= null` should be sufficient because it avoids that extra network overhead, 
especially since this method processes batches of up to 100 blob IDs.
   
   Side note: `getUpdateTime()` used in `lastModified()` and `touch()` is also 
[deprecated](https://docs.cloud.google.com/java/docs/reference/google-cloud-storage/latest/com.google.cloud.storage.BlobInfo#com_google_cloud_storage_BlobInfo_getUpdateTime__)
 in the GCS SDK in favor of `getUpdateTimeOffsetDateTime()`. Could address that 
alongside the null-safety fixes in the follow-up.
   
   Please do let me know if you still feel that all these things should be 
handled in this PR itself.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to