Re: [Hdf-forum] CSV data into HDF5 data structure and files

nitin chandra Sun, 05 Feb 2017 09:12:12 -0800

Hi All,

Any solution would be helpful.


Thank you,

Nitin

On 2 February 2017 at 00:34, nitin chandra <nitinchand...@gmail.com> wrote:
> Hi Francesc,
>
> I tried your example as is; I have not yet had time to modify it and
> try something new.
>
> I ran
>
> $ python csv_demo.py
>
> and it created a CSV file with 10 columns populated with random
> numbers.
>
> demo.h5 was created too, and I used HDFView 2.9 to view its
> contents.
>
> It created a group (shown as a folder) named "table" containing a
> dataset also named "table".
>
> The "table" dataset has 2 columns:
>
>   index   | values_block_0
>   --------+---------------------------
>   (empty) | no values, just 10 commas
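One plausible explanation, offered as an assumption rather than a diagnosis: in pandas' "table" format, columns of the same dtype are packed into a single multi-element field named values_block_0, so a viewer sees 2 fields (index and values_block_0) rather than 10 separate columns, and HDFView 2.9 may not render that packed field as expected. Passing data_columns=True stores each column as its own field. A sketch that checks this with pandas and PyTables directly (the file name and the keys "packed" and "split" are hypothetical):

```python
import os
import tempfile

import numpy as np
import pandas as pd
import tables  # PyTables, the library pandas.HDFStore is built on

path = os.path.join(tempfile.mkdtemp(), "inspect.h5")
df = pd.DataFrame(np.random.rand(5, 10), columns=[f"c{i}" for i in range(10)])

# Default table format: the 10 float columns share one field, values_block_0.
df.to_hdf(path, key="packed", format="table")
# data_columns=True: every column gets its own field in the HDF5 table.
df.to_hdf(path, key="split", format="table", data_columns=True)

# Look at the raw HDF5 tables, the way a generic viewer would see them.
with tables.open_file(path, mode="r") as h5:
    packed_fields = h5.get_node("/packed/table").colnames
    split_fields = h5.get_node("/split/table").colnames

print(packed_fields)  # ['index', 'values_block_0']
print(split_fields)

# Reading back through pandas restores the original 10 columns either way.
print(pd.read_hdf(path, "packed").columns.tolist())
```

If the data must be browsable column by column in a generic viewer, data_columns=True is probably the simpler layout; it also makes every column usable in where queries.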
>
> So that you can relate your guidance to my actual data, please find
> attached 2 sample files. Note also the first row in the attached CSVs:
> it was added to initialise the start point of the data sequence. Would
> it be good practice to keep that row in the HDF5 tables too? The last
> column has string values, and I need them.
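Both points can be handled in pandas. A minimal sketch under assumptions, since the attached sample files are not reproduced here: the CSV has a header line, the initialisation row is the first data row, and the column names depth, x, y, label are hypothetical. To pandas the initialisation row is ordinary data, so keeping it is a choice about your own data model; skiprows drops it if you decide not to. String columns are stored as fixed-width fields, so min_itemsize reserves room for longer strings in later appends.

```python
import io
import os
import tempfile

import pandas as pd  # HDFStore also needs the PyTables package ("tables")

# Hypothetical CSV layout: a header, an initialisation row, then data rows;
# the last column holds strings.
csv_text = """depth,x,y,label
0.0,0.0,0.0,START
1.5,0.1,0.2,A1
3.0,0.2,0.3,B2
"""

# Keep the initialisation row (pandas treats it as ordinary data) ...
df_all = pd.read_csv(io.StringIO(csv_text))
# ... or skip it: skiprows=[1] drops file line 1, the first row after the header.
df_data = pd.read_csv(io.StringIO(csv_text), skiprows=[1])

path = os.path.join(tempfile.mkdtemp(), "strings.h5")
with pd.HDFStore(path, mode="w") as store:
    # Strings are stored with a fixed width; reserve 32 characters for "label"
    # so that later appends with longer strings do not raise an error.
    store.append("align", df_all, min_itemsize={"label": 32})
    back = store["align"]

print(back["label"].tolist())  # ['START', 'A1', 'B2']
```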
>
> ALIGN data goes into File1 and GRADE data into File2, so I am looking
> for a write function that writes into the respective tables, and a
> read function that reads from them.
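A minimal pair of helpers along these lines, assuming pandas with PyTables installed; the function names write_table and read_table, the file names file1.h5 and file2.h5, and all column names are hypothetical. data_columns=True makes every column queryable with a where expression when reading.

```python
import os
import tempfile

import pandas as pd  # HDFStore also needs the PyTables package ("tables")


def write_table(h5_path, key, csv_path, chunksize=50_000, min_itemsize=None):
    """Append all rows of csv_path to the table `key` in h5_path, chunk by chunk."""
    with pd.HDFStore(h5_path, mode="a") as store:
        for chunk in pd.read_csv(csv_path, chunksize=chunksize):
            store.append(key, chunk, data_columns=True, min_itemsize=min_itemsize)


def read_table(h5_path, key, where=None):
    """Return the table as a DataFrame, optionally filtered by a `where` query."""
    return pd.read_hdf(h5_path, key, where=where)


# Hypothetical sample inputs standing in for the ALIGN and GRADE CSVs.
tmp = tempfile.mkdtemp()
align_csv = os.path.join(tmp, "align.csv")
grade_csv = os.path.join(tmp, "grade.csv")
pd.DataFrame({"depth": [0.0, 1.5, 3.0], "azimuth": [10.0, 12.5, 15.0],
              "label": ["START", "A1", "B2"]}).to_csv(align_csv, index=False)
pd.DataFrame({"depth": [0.0, 1.0, 2.0], "grade": [0.0, 0.8, 1.2],
              "rock": ["START", "ore", "waste"]}).to_csv(grade_csv, index=False)

file1 = os.path.join(tmp, "file1.h5")
file2 = os.path.join(tmp, "file2.h5")
write_table(file1, "align", align_csv, min_itemsize={"label": 32})
write_table(file2, "grade", grade_csv, min_itemsize={"rock": 32})

deep = read_table(file1, "align", where="depth > 1")
print(deep["label"].tolist())  # ['A1', 'B2']
```

Because the store is opened in append mode, running write_table twice on the same CSV duplicates its rows; use mode="w" (or remove the key first) for a fresh load.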
>
> Once the data is in the HDF5 file, can I insert a new row between
> existing rows, or append one at the end? Which editor or method should
> I use to do it?
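As far as I know, neither pandas' HDFStore nor the underlying PyTables table offers an "insert between rows" operation: rows can be appended at the end cheaply, and a different order has to be produced by sorting after reading or by rewriting the table. A sketch of both, with a hypothetical key ("align") and hypothetical columns:

```python
import os
import tempfile

import pandas as pd  # HDFStore also needs the PyTables package ("tables")

path = os.path.join(tempfile.mkdtemp(), "file1.h5")
rows = pd.DataFrame({"depth": [0.0, 3.0], "label": ["START", "B2"]})
new_row = pd.DataFrame({"depth": [1.5], "label": ["A1"]}, index=[2])

with pd.HDFStore(path, mode="w") as store:
    store.append("align", rows, data_columns=True, min_itemsize={"label": 16})
    # Appending to the end is cheap: the HDF5 table simply grows.
    store.append("align", new_row)
    # No "insert between rows": to get depth order, sort after reading ...
    ordered = store["align"].sort_values("depth")
    # ... or rewrite the whole table in the desired order (put() replaces it).
    store.put("align", ordered.reset_index(drop=True),
              format="table", data_columns=True)
    final = store["align"]

print(final["label"].tolist())  # ['START', 'A1', 'B2']
```

Rewriting costs a full pass over the table, so for large files appending and sorting (or querying with where) at read time is usually the cheaper option.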
>
> Thank you,
>
> Nitin
>
> On 30 January 2017 at 23:01, nitin chandra <nitinchand...@gmail.com> wrote:
>> Thank you Francesc,
>>
>> Please give me 2-3 days to try your example, do some reading and run
>> some tests based on the link you mentioned.
>>
>> I shall repost soon.
>>
>> Thank you
>>
>> Nitin
>>
>> On 30 January 2017 at 17:14, Francesc Altet <fal...@hdfgroup.org> wrote:
>>> Hi Nitin,
>>>
>>>
>>> I think before getting into details, you need to look into how to
>>> efficiently read data from CSV files and write it into HDF5 in Python.
>>> For this, pandas is a great library to use.  My advice is to have a look
>>> at the excellent documentation on the pandas website:
>>>
>>>
>>> http://pandas.pydata.org/pandas-docs/stable/io.html
>>>
>>>
>>> In particular, you want to use `pandas.read_csv()`, which is one of the
>>> fastest ways to read CSV files that I am aware of.  Also, for storing the
>>> data in HDF5, `pandas.HDFStore()` comes in handy because it can generate
>>> HDF5 files out of pandas DataFrames.  In addition, to avoid loading all
>>> the data into a DataFrame in memory at once, you want to use the
>>> `chunksize` keyword, which lets you read the CSV files in chunks before
>>> storing them.
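A minimal sketch of the chunked approach described above (not the attached csv_demo.py, which is not reproduced here), assuming pandas with PyTables installed; the file names, the key "table" and the chunk size are illustrative:

```python
import os
import tempfile

import numpy as np
import pandas as pd  # HDFStore also needs the PyTables package ("tables")


def csv_to_hdf5(csv_path, h5_path, key="table", chunksize=2_500):
    """Copy a CSV file into an HDF5 table one chunk at a time."""
    with pd.HDFStore(h5_path, mode="w") as store:
        # With chunksize, read_csv returns an iterator of DataFrames,
        # so only one chunk is held in memory at a time.
        for chunk in pd.read_csv(csv_path, chunksize=chunksize):
            # append() writes the "table" format and grows it chunk by chunk.
            store.append(key, chunk)


# Hypothetical demo data: 10,000 rows x 10 random float columns.
tmp = tempfile.mkdtemp()
csv_path = os.path.join(tmp, "demo.csv")
h5_path = os.path.join(tmp, "demo.h5")
df = pd.DataFrame(np.random.rand(10_000, 10),
                  columns=[f"c{i}" for i in range(10)])
df.to_csv(csv_path, index=False)

csv_to_hdf5(csv_path, h5_path)
back = pd.read_hdf(h5_path, "table")
print(back.shape)  # (10000, 10)
```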
>>>
>>>
>>> I have prepared an example for you (attached) so that you can see how
>>> to use all of this (it is simpler than it may seem).  Here is the
>>> output on my machine:
>>>
>>>
>>> $ python csv_demo.py
>>> CSV creation time: 1.491 (67.092 Krow/s)
>>> CSV reading time: 0.134 (748.360 Krow/s)
>>> HDF5 store time: 0.322 (310.228 Krow/s)
>>> HDF5 read time: 0.006 (15622.990 Krow/s)
>>>
>>>
>>> So, once the data is stored in HDF5, reading it back is much faster than
>>> reading the CSV (as expected).
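Each line of the output above pairs a time with a rate, so rows = time * rate. Working that through (a back-of-the-envelope check, not part of the original demo) suggests the demo used about 100,000 rows and that reading from HDF5 was roughly 21 times faster than reading the CSV. The 0.006 s figure is rounded too coarsely to reproduce the row count as closely, so the speed-up is computed from the rates instead:

```python
# Figures copied from the csv_demo.py output above: (seconds, Krow/s).
timings = {
    "CSV creation": (1.491, 67.092),
    "CSV reading": (0.134, 748.360),
    "HDF5 store": (0.322, 310.228),
    "HDF5 read": (0.006, 15622.990),
}

# rate = rows / time, so rows = time * rate (in thousands of rows).
for name, (secs, krows_per_s) in timings.items():
    print(f"{name:12s} ~{secs * krows_per_s:6.1f} Krow")

# Ratio of read rates: HDF5 read vs. CSV read.
speedup = timings["HDF5 read"][1] / timings["CSV reading"][1]
print(f"HDF5 read is ~{speedup:.0f}x faster than CSV read")
```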
>>>
>>>
>>> HTH,
>>>
>>>
>>> Francesc
>>>
>>>
>>> _______________________________________________
>>> Hdf-forum is for HDF software users discussion.
>>> Hdf-forum@lists.hdfgroup.org
>>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>>> Twitter: https://twitter.com/hdf5
