[GitHub] [arrow] isaacbrodsky commented on a diff in pull request #13076: ARROW-8674: [JS] Implement IPC RecordBatch body buffer compression

GitBox Mon, 09 May 2022 15:57:50 -0700


isaacbrodsky commented on code in PR #13076:
URL: https://github.com/apache/arrow/pull/13076#discussion_r868559771



##########
js/package.json:
##########
@@ -99,6 +99,7 @@
     "jest": "27.5.1",
     "jest-silent-reporter": "0.5.0",
     "lerna": "4.0.0",
+    "lz4js": "0.2.0",

Review Comment:
   This is intended to only be registered in tests, and users need to declare 
the dependency and register in their own applications if they want?
   
   We should include code snippets and instructions for how to do that for each 
codec needed to support the full Arrow spec. The error message could even link 
to the docs on exactly how to enable a given compression codec.



##########
js/src/ipc/reader.ts:
##########
@@ -352,12 +353,24 @@ abstract class RecordBatchReaderImpl<T extends TypeMap = 
any> implements RecordB
         return this;
     }
 
-    protected _loadRecordBatch(header: metadata.RecordBatch, body: any) {
+    protected _loadRecordBatch(header: metadata.RecordBatch, body: Uint8Array) 
{
+        if (header.compression) {
+            const codec = compressionRegistry.get(header.compression);
+            if (codec?.decode) {
+                // TODO: does this need to be offset by 8 bytes? Since the 
uncompressed length is
+                // written in the first 8 bytes:
+                // 
https://github.com/apache/arrow/blob/1fc251f18d5b48f0c9fe8af8168237e7e6d05a45/format/Message.fbs#L59-L65

Review Comment:
   Confusing that here the uint8array is the "body" but it might correspond to 
the "buffer" in that spec. I'm not sure off hand which is actually here?



##########
js/src/ipc/reader.ts:
##########
@@ -352,12 +353,24 @@ abstract class RecordBatchReaderImpl<T extends TypeMap = 
any> implements RecordB
         return this;
     }
 
-    protected _loadRecordBatch(header: metadata.RecordBatch, body: any) {
+    protected _loadRecordBatch(header: metadata.RecordBatch, body: Uint8Array) 
{
+        if (header.compression) {
+            const codec = compressionRegistry.get(header.compression);
+            if (codec?.decode) {
+                // TODO: does this need to be offset by 8 bytes? Since the 
uncompressed length is
+                // written in the first 8 bytes:
+                // 
https://github.com/apache/arrow/blob/1fc251f18d5b48f0c9fe8af8168237e7e6d05a45/format/Message.fbs#L59-L65
+                body = codec.decode(body);
+            } else {
+                throw new Error('Record batch is compressed but codec not 
found');

Review Comment:
   Should this error include the codec ID?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow] isaacbrodsky commented on a diff in pull request #13076: ARROW-8674: [JS] Implement IPC RecordBatch body buffer compression

Reply via email to