shanjixi commented on issue #36608: URL: https://github.com/apache/arrow/issues/36608#issuecomment-1645368364
> Supported. A TFRecord file is just a file containing multiple rows. In practice, we use a converter to transform a (compressed) TFRecord file into an arrow::Table. We then apply the Arrow filter/take/list_flatten functions before sending the data to a TensorFlow worker. Arrow is used to read the data from HDFS, do the further processing, and avoid the memory-copy cost between those two steps. What's more, within our company both Hadoop-ZSTD and Hadoop-SNAPPY compressed files can be read with Arrow directly (we call these readers zstd_decompress_inputstream and snappy_decompress_inputstream).
