Hi Peter,
Thanks for the reply, and sorry if I'm asking a lot of questions, but I still
can't figure out how to connect the write of a dataset to a file.
We want to give the HDF file to another program that does some analysis
for us. I guess my pseudocode looks something like this.
Assume the file has 25M lines.

List<String> lineEntry = new ArrayList<String>();
int datasetSize = 25000000;
int index = 0;
// Number of strings to write in one chunk
int maxRecords = 1500000;
for (String line : file F) {
    lineEntry.add(line);
    index++;
    if (index % maxRecords == 0) {
        String[] valuesToWrite = lineEntry.toArray(new String[0]);
        // Find out where in the dataset to start the writing, using a hyperslab
        int dataspace_id = H5.H5Dget_space(dataset_id);
        // On the 3rd iteration, for example, the start index is 1500000*2,
        // so the writing covers 3M -> 4.5M
        long[] start = { index - maxRecords };
        long[] count = { valuesToWrite.length };
        H5.H5Sselect_hyperslab(dataspace_id, HDF5Constants.H5S_SELECT_SET,
                start, null, count, null);
        H5.H5Dwrite(dataset_id, memtype_id, HDF5Constants.H5S_ALL,
                dataspace_id, HDF5Constants.H5P_DEFAULT, valuesToWrite);
        lineEntry.clear();
    }
}
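For what it's worth, the start/count arithmetic in the loop above can be checked in isolation. This is a minimal sketch in plain Java (no HDF5 calls), assuming a 25M-row dataset written in 1.5M-row blocks; blockBounds is a hypothetical helper, not an hdf-java API:

```java
// Illustrative sketch of the start/count bookkeeping only -- no HDF5 calls.
// blockBounds is a hypothetical helper, not part of hdf-java.
static long[][] blockBounds(long total, long block) {
    int nBlocks = (int) ((total + block - 1) / block); // ceiling division
    long[][] bounds = new long[nBlocks][2];            // {start, count} per block
    for (int i = 0; i < nBlocks; i++) {
        long start = (long) i * block;
        bounds[i][0] = start;                          // hyperslab start
        bounds[i][1] = Math.min(block, total - start); // last block may be short
    }
    return bounds;
}
```

With these numbers the last block holds only 1,000,000 rows, so count should come from the buffer length rather than from maxRecords on the final write.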
Does this look correct? Also, I don't need to extend the dataset if I can
allocate 25M entries up front, as long as I don't have to keep them all in
memory at the same time.
cheers, Håkon
On 22 March 2010 17:01, Peter Cao <[email protected]> wrote:
> Hi Håkon,
>
> A minor change to writeUp(): for testing purposes, use a String[] array
> instead of a StringBuffer and pass the string array directly to H5Dwrite().
> Currently, H5Dwrite() in hdf-java does not handle 2D arrays because of a
> performance issue. In your case, you do not need to call H5Dwrite() in
> writeUp(), since you are going to write the data part by part later.
>
> H5.H5Dextend() is replaced by H5Dset_extent() in HDF5 1.8. The new HDF5 1.8
> APIs are not yet supported in hdf-java; we are still working on this. For
> now, just use H5Dextend().
>
> When you create the dataset, you already have space for 25M strings
> (i.e. new long[] { 25000000 }). Do you want to extend your dataspace beyond
> that? If not, you do not need to call H5Dextend(); just select the chunks
> you want to write.
>
>
> Thanks
> --pc
>
>
>
> Håkon Sagehaug wrote:
>
>> Hi
>>
>> So I tried, but I'm not sure it worked. I can't figure out how to connect
>> the dataset to a file so that I can view it in HDFView. Here is my method
>> for writing an array of Strings to a dataset.
>>
>>
>> private static void writeUn() {
>> int file_id = -1;
>> int dcpl_id = -1;
>> int dataspace_id = -1;
>> int dataset_id = -1;
>> int memtype_id = -1;
>> int filetype_id = -1;
>> int plist = -1;
>>
>> long[] dims = { DIM_X };
>> long[] chunk_dims = { CHUNK_X };
>> long[] maxdims = { HDF5Constants.H5S_UNLIMITED };
>> byte[][] dset_data = new byte[DIM_X][SDIM];
>> StringBuffer[] str_data = new StringBuffer[DIM_X];
>>
>> // Initialize the dataset.
>> for (int indx = 0; indx < DIM_X; indx++)
>> str_data[indx] = new StringBuffer(String.valueOf("iteration "
>> + (indx + 1)));
>>
>> // Create a new file using default properties.
>> try {
>> file_id = H5.H5Fcreate(FILENAME_A, HDF5Constants.H5F_ACC_TRUNC,
>> HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);
>> } catch (Exception e) {
>> e.printStackTrace();
>> }
>>
>> try {
>> filetype_id = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
>> H5.H5Tset_size(filetype_id, SDIM);
>>
>> plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
>> H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
>> H5.H5Pset_chunk(plist, 1, new long[] { 1024 });
>>
>> H5.H5Pset_deflate(plist, 5);
>>
>> dataspace_id = H5.H5Screate_simple(1, new long[] { 25000000 },
>> new long[] { HDF5Constants.H5S_UNLIMITED });
>> } catch (Exception e) {
>> e.printStackTrace();
>> }
>>
>> // Write the data to the dataset.
>> try {
>> for (int indx = 0; indx < DIM_X; indx++) {
>> for (int jndx = 0; jndx < SDIM; jndx++) {
>> if (jndx < str_data[indx].length())
>> dset_data[indx][jndx] = (byte) str_data[indx].charAt(jndx);
>> else
>> dset_data[indx][jndx] = 0;
>> }
>> }
>> if ((dataset_id >= 0) && (memtype_id >= 0))
>> H5.H5Dwrite(dataset_id, HDF5Constants.H5T_C_S1,
>> HDF5Constants.H5S_ALL, HDF5Constants.H5S_ALL,
>> HDF5Constants.H5P_DEFAULT, dset_data);
>> } catch (Exception e) {
>> e.printStackTrace();
>> }
>> }
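The byte-packing loop inside writeUn() can be factored into a small standalone helper. Here is a sketch in plain Java (no HDF5; the method name toFixedWidthBytes is illustrative): it truncates strings longer than the fixed width and zero-fills shorter ones, matching the null-padded layout of fixed-length HDF5 strings.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper mirroring the padding loop in writeUn():
// pack each string into a fixed-width, null-padded byte row.
static byte[][] toFixedWidthBytes(String[] strings, int width) {
    byte[][] out = new byte[strings.length][width]; // rows start zero-filled
    for (int i = 0; i < strings.length; i++) {
        byte[] src = strings[i].getBytes(StandardCharsets.US_ASCII);
        int n = Math.min(src.length, width);        // truncate if too long
        System.arraycopy(src, 0, out[i], 0, n);     // trailing bytes stay 0
    }
    return out;
}
```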
>>
>> So my question: is this the correct way of doing it, and how do I connect
>> the dataset to the file? I guess it happens at the time of creating the
>> dataset.
>>
>> After this, is the way forward something like this?
>>
>> 1. H5.H5Dextend(dataset_id, extdims);
>> 2. dataspace_id = H5.H5Dget_space(dataset_id);
>> 3. H5.H5Sselect_all(dataspace_id);
>> // Subtract a hyperslab reflecting the original dimensions from the
>> // selection. The selection now contains only the newly extended
>> // portions of the dataset.
>> count[0] = dims[0];
>> count[1] = dims[1];
>> H5.H5Sselect_hyperslab(dataspace_id, HDF5Constants.H5S_SELECT_NOTB,
>>         start, null, count, null);
>>
>> // Write the data to the selected portion of the dataset.
>> if (dataset_id >= 0)
>> H5.H5Dwrite(dataset_id, HDF5Constants.H5T_NATIVE_INT,
>> HDF5Constants.H5S_ALL, dataspace_id,
>> HDF5Constants.H5P_DEFAULT, extend_dset_data);
>>
>> I also see that H5.H5Dextend is deprecated as of version 1.8; is there
>> another method to use?
>>
>> cheers, Håkon
>> On 19 March 2010 16:05, Peter Cao <[email protected]> wrote:
>>
>> Hi Hakon,
>>
>> I assume you are using a 1D array of strings. Here are some hints
>> for you:
>>
>> 1) You may just use a string datatype. You can use a variable-length
>> string if your strings have different sizes, or a fixed-length string
>> if your strings are about the same length, e.g.
>>     tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
>>     // for a fixed length of 128
>>     H5.H5Tset_size(tid, 128);
>>     // for variable length
>>     H5.H5Tset_size(tid, HDF5Constants.H5T_VARIABLE);
>> 2) Set the dataset creation property list for chunking and compression:
>>     plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
>>     H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
>>     // aim for a chunk size of about 2MB for best performance
>>     H5.H5Pset_chunk(plist, 1, new long[] {1024});
>>     H5.H5Pset_deflate(plist, 5);
>>
>> 3) Set the dimension sizes, e.g.
>>     sid = H5.H5Screate_simple(1, new long[] {25000000},
>>             new long[] {HDF5Constants.H5S_UNLIMITED});
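The 2 MB chunk-size rule of thumb above is easy to sanity-check. A plain-Java sketch (chunkLength is an illustrative helper, not an hdf-java call), assuming fixed-length strings of 128 bytes each:

```java
// Elements per chunk so that one chunk is roughly targetChunkBytes on disk
// (before compression). Purely illustrative arithmetic, not an HDF5 API.
static long chunkLength(long targetChunkBytes, long elementBytes) {
    return Math.max(1, targetChunkBytes / elementBytes);
}
```

For 128-byte strings, chunkLength(2L * 1024 * 1024, 128) gives 16384 elements per chunk, rather larger than the new long[] {1024} used in the snippet above.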
>>
>>
>> Thanks
>> --pc
>>
>>
>> Håkon Sagehaug wrote:
>>
>> Hi Peter
>>
>> My problem is actually before I can create the dataset the first time: I
>> can't figure out the correct datatype to use. I guess I should use a byte
>> type, since the strings are converted to bytes.
>>
>> Håkon
>>
>> On 19 March 2010 15:29, Peter Cao <[email protected]> wrote:
>>
>> Håkon,
>>
>> There was a typo in my previous email. You do NOT need to read the first
>> chunk in order to write the second chunk. You can just select whatever
>> chunks you want to write.
>>
>> Sorry for the confusion.
>>
>> Thanks
>> --pc
>>
>>
>> Håkon Sagehaug wrote:
>>
>> Hi Peter
>>
>> I'm trying to do it by reading chunk by chunk, but I'm having trouble
>> creating the dataset. In the example [1] it's done like this:
>>
>>     H5.H5Dcreate(file_id, DATASETNAME, HDF5Constants.H5T_STD_I32LE,
>>             dataspace_id, dcpl_id);
>>
>> The type is for int, but I can't seem to find the correct one for
>> strings. In example [2] with string arrays it looks like this:
>>
>>     H5.H5Dcreate(file_id, DATASETNAME, filetype_id, dataspace_id,
>>             HDF5Constants.H5P_DEFAULT);
>>
>> If I create the dataset like this, when I want to dynamically add I only
>> get the first byte of each string. Any tips on what type I should use?
>>
>>
>> Håkon
>>
>> [1]
>>
>> http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/java/examples/datasets/H5Ex_D_UnlimitedAdd.java
>>
>> [2]
>> http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/java/examples/datatypes/H5Ex_T_String.java
>>
>>
>>
>>
>> --
>> Håkon Sagehaug, Scientific Programmer
>> Parallab, Uni BCCS/Uni Research
>> [email protected], phone +47 55584125
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected]
>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected] <mailto:[email protected]>
>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>>
>>
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected] <mailto:[email protected]>
>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>>
>>
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected]
>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>>
>>
>
>