JingsongLi commented on code in PR #8219:
URL: https://github.com/apache/paimon/pull/8219#discussion_r3408926391


##########
paimon-format/src/main/java/org/apache/paimon/format/blob/BlobFormatWriter.java:
##########
@@ -97,21 +114,35 @@ public void addElement(InternalRow element) throws 
IOException {
         }
 
         long previousPos = out.getPos();
-        crc32.reset();
-
-        write(MAGIC_NUMBER_BYTES);
-
-        long blobPos = out.getPos();
+        ByteArrayOutputStream stagedPayload = new ByteArrayOutputStream();

Review Comment:
   Could we avoid staging the entire payload in memory here? This writer is 
used for normal blob writes too, so the new `ByteArrayOutputStream` + 
`toByteArray()` path can allocate roughly 2x the blob size before writing. 
Since BLOBs are expected to be large and `blob.target-file-size` can be much 
larger than task memory, a valid large blob can now OOM even when 
`writeNullOnMissingFile` is false. One safer approach is to open the source 
stream before appending anything to `out`; if opening hits 404, write NULL, 
otherwise keep the previous streaming copy path.



##########
paimon-api/src/main/java/org/apache/paimon/rest/HttpClientUtils.java:
##########
@@ -86,9 +88,61 @@ private static HttpClientConnectionManager 
configureConnectionManager() {
     public static InputStream getAsInputStream(String uri) throws IOException {
         HttpGet httpGet = new HttpGet(uri);
         CloseableHttpResponse response = DEFAULT_HTTP_CLIENT.execute(httpGet);
-        if (response.getCode() != 200) {
-            throw new RuntimeException("HTTP error code: " + 
response.getCode());
+        int statusCode = response.getCode();
+        if (statusCode != HttpStatus.SC_OK) {
+            throw httpError(statusCode);
         }
         return response.getEntity().getContent();
     }
+
+    /**
+     * Checks whether an HTTP resource exists. HEAD is attempted first; when 
the server does not
+     * support HEAD, a lightweight GET with {@code Range: bytes=0-0} is used 
as fallback.
+     */
+    public static boolean exists(String uri) throws IOException {
+        int statusCode = headStatusCode(uri);

Review Comment:
   Is it safe to make the HEAD result authoritative here? Some HTTP resources, 
including signed or GET-only URLs, can reject HEAD or return a different status 
than GET even though the payload is readable. With this change, a HEAD 404/403 
would make `exists()` return false or throw before trying the range GET, so an 
existing blob could be treated as missing or fail the write. Could we fall back 
to `Range: bytes=0-0` GET for non-success HEAD statuses as well, at least for 
403/404, and only return false when the GET also returns 404?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to