Hi all,

Any solution would be helpful.
Thank you,
Nitin

On 2 February 2017 at 00:34, nitin chandra <nitinchand...@gmail.com> wrote:
> Hi Francesc,
>
> I tried your example as it is; I did not get time to modify it and try
> something new.
>
> I ran
>
> $ python csv_demo.py
>
> and it created a CSV file with 10 columns, populated with random
> numbers.
>
> demo.h5 was also created, and I used HDFView 2.9 to look at its
> contents. It contains a folder (group) named "table" and, inside it,
> a data table also named "table".
>
> The data table "table" has 2 columns:
>
> index | value_block_0
>
> empty | no value
> no data | but 10 commas
>
> So that you can relate your guidance to my issue, please find attached
> 2 sample files.
> Also, note the first row in the attached CSVs; it was created to
> initialise the start point of the data sequence. Would it be good
> practice to have it in the HDF5 tables too? The last column has string
> values, and I need them.
>
> ALIGN data goes into file1 and GRADE data into file2, so I am looking
> for a write function to write into the respective tables and then a read
> function to read from them.
>
> After the data is in the H5 file, can I insert/add/append a new row
> between other rows or at the end of the file? Which editor or method
> should I use to do it?
>
> Thank you,
>
> Nitin
>
> On 30 January 2017 at 23:01, nitin chandra <nitinchand...@gmail.com> wrote:
>> Thank you Francesc,
>>
>> Please give me 2-3 days to try your example, do some reading and
>> run some tests based on the link mentioned.
>>
>> I shall repost soon.
>>
>> Thank you
>>
>> Nitin
>>
>> On 30 January 2017 at 17:14, Francesc Altet <fal...@hdfgroup.org> wrote:
>>> Hi Nitin,
>>>
>>> I think before getting into details, you need to look into how to
>>> efficiently read and write data from CSV files into HDF5 in Python. For
>>> this, pandas is a great library to use.
>>> My advice is to have a look at the excellent documentation on the
>>> pandas website:
>>>
>>> http://pandas.pydata.org/pandas-docs/stable/io.html
>>>
>>> In particular, you want to use `pandas.read_csv()`, which is one of the
>>> fastest ways to read CSV files that I am aware of. Also, for storing the
>>> data in HDF5, `pandas.HDFStore()` comes in handy because it can generate
>>> HDF5 files out of pandas DataFrames. In addition, to avoid loading all
>>> the data into a DataFrame in memory, you want to use the `chunksize`
>>> keyword, which lets you read the CSV files in chunks before storing them.
>>>
>>> I have prepared an example for you (attached) so that you can see how
>>> to use all of this (it is simpler than it may seem). Here is the output
>>> on my machine:
>>>
>>> $ python csv_demo.py
>>> CSV creation time: 1.491 (67.092 Krow/s)
>>> CSV reading time: 0.134 (748.360 Krow/s)
>>> HDF5 store time: 0.322 (310.228 Krow/s)
>>> HDF5 read time: 0.006 (15622.990 Krow/s)
>>>
>>> So, once the data is stored in HDF5, reads are much faster than with
>>> CSV (as expected).
>>>
>>> HTH,
>>>
>>> Francesc
>>>
>>> _______________________________________________
>>> Hdf-forum is for HDF software users discussion.
>>> Hdf-forum@lists.hdfgroup.org
>>> http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
>>> Twitter: https://twitter.com/hdf5
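Francesc's advice (read the CSV with `pandas.read_csv(..., chunksize=...)` and append each chunk to a `pandas.HDFStore`) can be sketched roughly as below. This is not the attached csv_demo.py, which is not reproduced in the thread; the helper names, the column names `col0`..`col9` and the chunk size are assumptions for illustration. It also passes `data_columns=True`: by default pandas packs columns of the same dtype into a single "values_block" column, which is probably why HDFView showed one column holding 10 values per row, whereas data columns should appear as separate, individually queryable HDF5 columns.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def make_demo_csv(path, nrows=100_000, ncols=10, seed=0):
    """Write a CSV of random floats (hypothetical stand-in for the real data)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(rng.random((nrows, ncols)),
                      columns=[f"col{i}" for i in range(ncols)])
    df.to_csv(path, index=False)


def csv_to_hdf5(csv_path, h5_path, key="table", chunksize=10_000):
    """Copy a CSV into an HDF5 table chunk by chunk, never loading it all at once."""
    nrows = 0
    with pd.HDFStore(h5_path, mode="w") as store:
        for chunk in pd.read_csv(csv_path, chunksize=chunksize):
            # format="table" makes the node appendable; data_columns=True keeps each
            # CSV column as its own HDF5 column instead of a packed values block.
            store.append(key, chunk, format="table", data_columns=True)
            nrows += len(chunk)
    return nrows


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    csv_path = os.path.join(workdir, "demo.csv")
    h5_path = os.path.join(workdir, "demo.h5")

    make_demo_csv(csv_path)

    t0 = time.perf_counter()
    n = csv_to_hdf5(csv_path, h5_path)
    print(f"HDF5 store time: {time.perf_counter() - t0:.3f} s for {n} rows")

    t0 = time.perf_counter()
    df = pd.read_hdf(h5_path, "table")
    print(f"HDF5 read time: {time.perf_counter() - t0:.3f} s, shape {df.shape}")
```

Memory use is bounded by `chunksize` rows rather than by the size of the CSV; the timings printed will of course depend on the machine.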
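For Nitin's follow-up questions (separate write and read functions for the ALIGN and GRADE data, a trailing string column, and adding rows at the end or in between), a minimal sketch with pandas could look like this. The attached sample files are not available, so the column names `seq`, `x` and `label`, the 32-character string width and all helper names are hypothetical. Appending at the end is what `HDFStore.append` does. As far as I know, neither pandas nor PyTables can insert rows into the middle of an existing table, so `insert_rows` reads the table, merges, sorts on an assumed ordering column and rewrites it with `HDFStore.put`; that is fine for small tables but costly for large ones, where keeping an ordering column and sorting after reading is usually cheaper.

```python
import os
import tempfile

import pandas as pd

STR_COL = "label"   # hypothetical name of the trailing string column
STR_WIDTH = 32      # characters reserved for it; longer strings later raise ValueError


def append_rows(h5_path, key, df):
    """Append rows at the end of table `key`, creating file and table on first use."""
    with pd.HDFStore(h5_path, mode="a") as store:
        store.append(key, df, format="table", data_columns=True,
                     min_itemsize={STR_COL: STR_WIDTH})


def read_rows(h5_path, key, where=None):
    """Read table `key`; `where` is an optional query on data columns, e.g. "seq > 2"."""
    with pd.HDFStore(h5_path, mode="r") as store:
        return store.select(key, where=where)


def insert_rows(h5_path, key, df, order_col="seq"):
    """'Insert' rows mid-table: merge with existing rows, sort, and rewrite the table."""
    merged = pd.concat([read_rows(h5_path, key), df])
    merged = merged.sort_values(order_col, kind="stable").reset_index(drop=True)
    with pd.HDFStore(h5_path, mode="a") as store:
        # put() replaces the existing node with the merged, sorted table.
        store.put(key, merged, format="table", data_columns=True,
                  min_itemsize={STR_COL: STR_WIDTH})


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    align_h5 = os.path.join(workdir, "align.h5")  # "file1" in the thread
    grade_h5 = os.path.join(workdir, "grade.h5")  # "file2" in the thread

    append_rows(align_h5, "align",
                pd.DataFrame({"seq": [1, 2, 4], "x": [0.1, 0.2, 0.4],
                              STR_COL: ["a", "b", "d"]}))
    # A new row at the end is a plain append.
    append_rows(align_h5, "align",
                pd.DataFrame({"seq": [5], "x": [0.5], STR_COL: ["e"]}))
    # A row that belongs between seq 2 and seq 4 goes through the rewrite path.
    insert_rows(align_h5, "align",
                pd.DataFrame({"seq": [3], "x": [0.3], STR_COL: ["c"]}))

    append_rows(grade_h5, "grade",
                pd.DataFrame({"seq": [1, 2], "x": [7.0, 8.0], STR_COL: ["ok", "ok"]}))

    print(read_rows(align_h5, "align"))
    print(read_rows(grade_h5, "grade", where="seq > 1"))
```

Because every column is stored as a data column, `read_rows` can fetch a subset with a `where` query without loading the whole table.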