[ 
https://issues.apache.org/jira/browse/ARROW-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524365#comment-17524365
 ] 

Kyle Barron commented on ARROW-8674:
------------------------------------

> Maybe. Let's make sure to compare native gzip compression that a web server 
> uses with js lz4/zstd compression.

I'm most familiar with [FastAPI|https://github.com/tiangolo/fastapi], which is 
probably the third most popular Python web server framework after Django and 
Flask. Its suggested gzip middleware [uses the standard library's gzip 
implementation|https://github.com/encode/starlette/blob/d7cbe2a4887ad6b15fe7523ed62e28a426b7697d/starlette/middleware/gzip.py#L37-L39]
 so I don't think my example above was completely out of place. The [lzbench 
native benchmarks|https://github.com/inikep/lzbench#benchmarks] still have lz4 
and zstd as 4-6x faster than zlib.

But I think these performance discussions are more of a side discussion; given 
that the Arrow IPC format allows for compression, I'd love to find a way for 
Arrow JS to support these files.

> It would unfortunately also preclude people from putting decompression into a 
> worker. Maybe we can make the relevant IPC methods return promises 
> when the compression/decompression method is async (returns a promise).

That's a very good point. If we implement a registry of some sort, we could 
consider allowing both sync and async compression implementations. Then the 
`RecordBatchReader` could use the sync implementation and 
`AsyncRecordBatchReader` could use the async one. A user who wants to run 
de/compression on a worker would then be able to use the 
`AsyncRecordBatchReader`. I'm not sure that's a great idea, but it's nice to 
have a synchronous `tableFromIPC` option.
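
To make the registry idea concrete, here is a rough sketch of what I have in mind. All of the names (`CompressionRegistry`, `SyncCodec`, `AsyncCodec`, `getSync`, `getAsync`) are hypothetical and not part of Arrow JS; the codec registered at the end is a stand-in that only copies bytes, where a real one would call into an LZ4 or ZSTD library.

```typescript
// Hypothetical sketch: none of these names exist in Arrow JS today.
type CodecName = "lz4" | "zstd";

interface SyncCodec {
  decode(data: Uint8Array): Uint8Array;
}

interface AsyncCodec {
  decode(data: Uint8Array): Promise<Uint8Array>;
}

class CompressionRegistry {
  private readonly syncCodecs = new Map<CodecName, SyncCodec>();
  private readonly asyncCodecs = new Map<CodecName, AsyncCodec>();

  setSync(name: CodecName, codec: SyncCodec): void {
    this.syncCodecs.set(name, codec);
  }

  setAsync(name: CodecName, codec: AsyncCodec): void {
    this.asyncCodecs.set(name, codec);
  }

  // What a synchronous reader would call: only a sync codec is acceptable.
  getSync(name: CodecName): SyncCodec {
    const codec = this.syncCodecs.get(name);
    if (codec === undefined) {
      throw new Error(`No synchronous codec registered for "${name}"`);
    }
    return codec;
  }

  // What an async reader would call: prefer an async codec (for example one
  // backed by a worker), otherwise wrap the sync codec in a promise.
  getAsync(name: CodecName): AsyncCodec {
    const asyncCodec = this.asyncCodecs.get(name);
    if (asyncCodec !== undefined) {
      return asyncCodec;
    }
    const syncCodec = this.getSync(name);
    return { decode: async (data) => syncCodec.decode(data) };
  }
}

// Stand-in codec that only copies bytes; a real one would call an LZ4 library.
const registry = new CompressionRegistry();
registry.setSync("lz4", { decode: (data) => data.slice() });
```

With something like this, the sync reader fails fast when no sync codec is registered, while the async reader can use either kind of codec.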

> If they are small enough, I would consider including a default lz4 
> implementation. Sounds good?

Sounds good! I'll try to find time soon to put up a draft.

> [JS] Implement IPC RecordBatch body buffer compression from ARROW-300
> ---------------------------------------------------------------------
>
>                 Key: ARROW-8674
>                 URL: https://issues.apache.org/jira/browse/ARROW-8674
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: JavaScript
>            Reporter: Wes McKinney
>            Priority: Major
>
> This may not be a hard requirement for JS because this would require pulling 
> in implementations of LZ4 and ZSTD, which not all users may want.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
