[
https://issues.apache.org/jira/browse/ARROW-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17524365#comment-17524365
]
Kyle Barron commented on ARROW-8674:
------------------------------------
> Maybe. Let's make sure to compare native gzip compression that a web server
> uses with js lz4/zstd compression.
I'm most familiar with [fastapi|https://github.com/tiangolo/fastapi], which is
probably the third most popular Python web server framework after Django and
Flask. Its suggested gzip middleware [uses the standard library's gzip
implementation|https://github.com/encode/starlette/blob/d7cbe2a4887ad6b15fe7523ed62e28a426b7697d/starlette/middleware/gzip.py#L37-L39]
so I don't think my example above was completely out of place. The [lzbench
native benchmarks|https://github.com/inikep/lzbench#benchmarks] still have lz4
and zstd as 4-6x faster than zlib.
But I think these performance discussions are more of a side discussion; given
that the Arrow IPC format allows for compression, I'd love to find a way for
Arrow JS to support these files.
> It would unfortunately also preclude people from putting decompression into a
> worker. Maybe we can make the relevant IPC methods return promises
> when the compression/decompression method is async (returns a promise).
That's a very good point. If we implement a registry of some sort, we could
consider allowing both sync and async compression implementations. Then the
`RecordBatchReader` could use the sync implementation and
`AsyncRecordBatchReader` could use the async one. So if users want to run
de/compression in a worker, they would be able to use the
`AsyncRecordBatchReader`. Not sure if that's a great idea, but having a
synchronous `tableFromIPC` option is nice.
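To make the registry idea a bit more concrete, here is a rough sketch of what I have in mind. None of these names (`CodecRegistry`, `SyncCodec`, `AsyncCodec`, `registerSync`, `registerAsync`) exist in Arrow JS today; they are hypothetical. The codec registered at the end is a placeholder that just copies bytes, not a real lz4 implementation. Falling back to a wrapped sync codec in `getAsync` is only one possible choice.

```typescript
// Hypothetical sketch only: these types are NOT part of the Arrow JS API.
type CodecName = "lz4" | "zstd";

interface SyncCodec {
  decode(data: Uint8Array): Uint8Array;
}

interface AsyncCodec {
  decode(data: Uint8Array): Promise<Uint8Array>;
}

class CodecRegistry {
  private syncCodecs = new Map<CodecName, SyncCodec>();
  private asyncCodecs = new Map<CodecName, AsyncCodec>();

  registerSync(name: CodecName, codec: SyncCodec): void {
    this.syncCodecs.set(name, codec);
  }

  registerAsync(name: CodecName, codec: AsyncCodec): void {
    this.asyncCodecs.set(name, codec);
  }

  // What a synchronous reader would look up; throws if nothing is registered.
  getSync(name: CodecName): SyncCodec {
    const codec = this.syncCodecs.get(name);
    if (!codec) {
      throw new Error(`No synchronous codec registered for ${name}`);
    }
    return codec;
  }

  // What an async reader would look up. Prefers an async codec (which could
  // forward to a worker); otherwise wraps the sync codec in a promise.
  getAsync(name: CodecName): AsyncCodec {
    const asyncCodec = this.asyncCodecs.get(name);
    if (asyncCodec) {
      return asyncCodec;
    }
    const syncCodec = this.getSync(name);
    return {
      decode: async (data: Uint8Array): Promise<Uint8Array> =>
        syncCodec.decode(data),
    };
  }
}

// Placeholder codec for illustration: copies the input instead of decompressing.
const registry = new CodecRegistry();
registry.registerSync("lz4", { decode: (data: Uint8Array) => data.slice() });
```

With something like this, a user who wants decompression off the main thread would register an async codec whose `decode` posts the buffer to a worker, and only the async reader would ever call it.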
> If they are small enough, I would consider including a default lz4
> implementation. Sounds good?
Sounds good! I'll try to find time soon to put up a draft.
> [JS] Implement IPC RecordBatch body buffer compression from ARROW-300
> ---------------------------------------------------------------------
>
> Key: ARROW-8674
> URL: https://issues.apache.org/jira/browse/ARROW-8674
> Project: Apache Arrow
> Issue Type: Sub-task
> Components: JavaScript
> Reporter: Wes McKinney
> Priority: Major
>
> This may not be a hard requirement for JS because this would require pulling
> in implementations of LZ4 and ZSTD which not all users may want