Dear Wes,

I am responding to your offer to discuss my post on Stack Overflow: https://stackoverflow.com/questions/59432045/pandas-dataframe-slower-to-read-from-parquet-than-from-pickle-file?noredirect=1#comment105050134_59432045
You explained in the comment section that the kind of dataset I was manipulating is not ideal for Parquet. I would be happy to know more about this.

Here is more context: I work at NASA Goddard Space Flight Center on data from the Solar Dynamics Observatory, which sends down 1.5 TB of data per day of observations of the Sun. The dataset I am currently working with, described in my SO post, is a subset of the metadata associated with those observations.

I got interested in Parquet after running into an error with HDF5: my dataset was too big and simply resulted in an error, whereas a smaller version of it caused no problem. Parquet worked regardless of the size of the dataset. In addition, I am going to use Dask, as well as Dask on GPUs (from NVIDIA RAPIDS), both of which I believe support Parquet. This is what got me interested in this format.

Thanks,
Raphael Attie

- - - - - - - - - - - - - - - - - - - - - - - - - -
Dr. Raphael Attié (GSFC-6710)
NASA / Goddard Space Flight Center
George Mason University
Office (NASA GSFC, room 041): 301-286-0360
Cell: 301-631-4954
Email (1): [email protected]
Email (2): [email protected]
