Dear Wes,

I am responding to your offer to discuss my post on Stack Overflow: https://stackoverflow.com/questions/59432045/pandas-dataframe-slower-to-read-from-parquet-than-from-pickle-file?noredirect=1#comment105050134_59432045
You explained in the comment section that the kind of dataset I was manipulating is not ideal for Parquet. I would be happy to know more about this.

Here is more context: I work at NASA Goddard Space Flight Center on data from the Solar Dynamics Observatory, which sends down 1.5 TB of data per day of observations of the Sun. The dataset I am currently working with, described in my SO post, is a subset of the metadata associated with those observations.

I got interested in Parquet after running into an error with HDF5: my dataset was too big and simply resulted in an error, whereas a smaller version of it caused no problem. Parquet worked regardless of the size of the dataset. In addition, I am going to use Dask, as well as Dask on GPUs (from NVIDIA RAPIDS), both of which I believe support Parquet. This is what got me interested in this format.

Thanks,
Raphael Attie

- - - - - - - - - - - - - - - - - - - - - - - - - -
Dr. Raphael Attié (GSFC-6710)
NASA / Goddard Space Flight Center
George Mason University
Office (NASA GSFC, room 041): 301-286-0360
Cell: 301-631-4954
Email (1): [email protected]
Email (2): [email protected]
