Hello,

I am working on a proof of concept and am having some trouble
understanding Apache Arrow in JavaScript, so I would like to clarify a few
things in this regard.

My use case:
I have a MEAN (MongoDB/Express/Angular/NodeJS) application that connects
to customer databases and third-party data sources and performs analytics
and experiments. I am looking at Apache Arrow from two angles:
interoperability and performant analytics.

Right now I am working on the analytics side: from the JS front end I need
to be able to read Parquet files and large (big-data) CSV files. Could you
please confirm whether my understanding below is correct?

1. I cannot read a Parquet file directly with the Arrow JS library
(because of issue ARROW-2786
<https://issues.apache.org/jira/browse/ARROW-2786>), so I have to use
something like parquetjs-lite
<https://www.npmjs.com/package/parquetjs-lite> instead.
2. To load a large CSV file into Apache Arrow, I first have to use Python
(pyarrow) to convert the CSV to the Arrow format (as described in
using-apache-arrow-js-with-large-datasets
<https://observablehq.com/@theneuralbit/using-apache-arrow-js-with-large-datasets>)
and then read the resulting Arrow file in my JS application.
      a) If (2) is correct, can I convert any third-party CSV to Arrow, or
do I need a predefined schema ahead of time?
      b) Are nulls and NaNs allowed in the CSV?

If my understanding is right, this seems like a rather roundabout approach
(or is it just me?). Are there any other paths you can suggest?

Regards,
Thomas
