Hello,

I am working on a proof-of-concept and am having some trouble understanding Apache Arrow with JS, so I wanted to clarify a few things.
My use case: I have a MEAN (MongoDB/Express/Angular/NodeJS) stack that connects to customer databases and third-party data and performs analytics and experimentation. I am looking at Apache Arrow from both an interoperability angle and a performant-analytics angle. Right now I am working on the analytics side: from the JS front end I need to be able to read Parquet files and big-data CSV files. Please confirm or correct my understanding:

1. I cannot read a Parquet file using the Arrow libraries directly (due to this issue: <https://issues.apache.org/jira/browse/ARROW-2786>). I have to use something like parquetjs-lite <https://www.npmjs.com/package/parquetjs-lite> for this.

2. To read a big-data CSV into Apache Arrow, I first have to use Python (pyarrow) to convert the CSV to the Arrow format (as in using-apache-arrow-js-with-large-datasets <https://observablehq.com/@theneuralbit/using-apache-arrow-js-with-large-datasets>) and then read the Arrow file in my JS application.

   a) If (2) is correct, can I convert any third-party CSV to Arrow, or do I need a predefined schema ahead of time?
   b) Are nulls and NaNs allowed in the CSV?

If my understanding is right, this seems a rather roundabout way (or is it just me?). Are there any other paths you can suggest?

Regards,
Thomas