Dear Wes,

I am responding to your offer to discuss my Stack Overflow post at:
https://stackoverflow.com/questions/59432045/pandas-dataframe-slower-to-read-from-parquet-than-from-pickle-file?noredirect=1#comment105050134_59432045

You explained in the comments that the kind of dataset I was manipulating is 
not ideal for Parquet. I would be happy to learn more about why that is.

Here is more context:
I am working at NASA Goddard Space Flight Center on data from the Solar 
Dynamics Observatory, which sends down 1.5 TB of observations of the Sun per 
day. The dataset I am currently working with, described in my SO post, is a 
subset of the metadata associated with those observations.

I became interested in Parquet after hitting an error with HDF5: writing my 
full dataset simply failed, whereas a smaller version of it caused no 
problem. Parquet, on the other hand, worked regardless of the dataset size. 
I am also going to use Dask, and Dask on GPUs (from NVIDIA RAPIDS), both of 
which I believe support Parquet. This is what drew me to the format.
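
In case it helps, here is a rough sketch of the kind of workflow I have in 
mind; the file name is just a placeholder, not my actual data:

    import dask.dataframe as dd

    # Lazily read the metadata table from Parquet (placeholder file name)
    df = dd.read_parquet("sdo_metadata.parquet")

    # On GPUs, I understand RAPIDS cuDF exposes a similar reader:
    # import cudf
    # gdf = cudf.read_parquet("sdo_metadata.parquet")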

Thanks,

Raphael Attie


----------------------------------------
Dr. Raphael Attié (GSFC-6710)
NASA / Goddard Space Flight Center
George Mason University
Office (NASA GSFC, room 041): 301-286-0360
Cell: 301-631-4954
Email (1): [email protected]
Email (2): [email protected]
