Hi Peter,
Thanks for the reply, and sorry if I'm asking a lot of questions, but I
still can't figure out how to connect the write of a dataset to a file.
We want to give the HDF file to another program that does some
analysis for us. I guess my pseudocode looks something like this
(assume that the input file F has 25M lines):
List<String> lineEntry = new ArrayList<String>();
long datasetSize = 25000000L;
long index = 0;
// number of strings to write in one chunk
int maxRecords = 1500000;
BufferedReader reader = new BufferedReader(new FileReader(F));
String line;
while ((line = reader.readLine()) != null) {
    lineEntry.add(line);
    index++;
    if (lineEntry.size() == maxRecords) {
        String[] valuesToWrite = lineEntry.toArray(new String[0]);
        // find out where in the dataset to start the writing from,
        // using a hyperslab: the 2nd chunk starts at 1500000, the
        // 3rd at 3000000 (so it writes entries 3M -> 4.5M), and so on
        int dataspace_id = H5.H5Dget_space(dataset_id);
        long[] start = { index - maxRecords };
        long[] count = { maxRecords };
        H5.H5Sselect_hyperslab(dataspace_id, HDF5Constants.H5S_SELECT_SET,
                start, null, count, null);
        int memspace_id = H5.H5Screate_simple(1, count, null);
        // filetype_id: the fixed-length string type used when the
        // dataset was created
        H5.H5Dwrite(dataset_id, filetype_id, memspace_id, dataspace_id,
                HDF5Constants.H5P_DEFAULT, valuesToWrite);
        lineEntry.clear();
    }
}
Does this look correct? Also, I shouldn't need to extend the dataset if
I can allocate 25M entries up front, as long as I don't have to keep
them all in memory at the same time.
cheers, Håkon
On 22 March 2010 17:01, Peter Cao <[email protected]> wrote:
Hi Håkon,
A minor change to writeUp(): for testing purposes, use a String[]
array instead of StringBuffer and pass the string array directly to
H5Dwrite(). Currently, H5Dwrite() in hdf-java does not handle 2D
arrays because of performance issues. For your case, you do not need
to call H5Dwrite() in writeUp(), since you are going to write the
data part by part later.
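For example, something like the following sketch (untested; it uses
DIM_X, dataset_id and the fixed-length string type filetype_id from
your writeUp(), and if your hdf-java build does not accept a String[]
buffer here, H5.H5Dwrite_string() takes the same arguments):

    String[] str_data = new String[DIM_X];
    for (int indx = 0; indx < DIM_X; indx++)
        str_data[indx] = "iteration " + (indx + 1);
    // pass the String[] buffer straight to H5Dwrite(); no manual
    // byte[][] conversion is needed
    H5.H5Dwrite(dataset_id, filetype_id, HDF5Constants.H5S_ALL,
            HDF5Constants.H5S_ALL, HDF5Constants.H5P_DEFAULT, str_data);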
H5.H5Dextend() is replaced by H5Dset_extent() in HDF5 1.8. The new
HDF5 1.8 APIs are not yet supported in hdf-java; we are still working
on this. For now, just use H5Dextend().
When you create the dataset, you already have space for 25M strings
(i.e. new long[] { 25000000 }). Do you want to extend your dataspace
beyond that? If not, you do not need to call H5Dextend(). Just
select the chunks you want to write.
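For example, writing the second 1.5M strings could look like this
(a sketch; valuesToWrite is assumed to hold exactly 1500000 strings
and filetype_id is your fixed-length string type):

    int filespace_id = H5.H5Dget_space(dataset_id);
    long[] start = { 1500000 };   // offset of this chunk in the dataset
    long[] count = { 1500000 };   // number of strings in this write
    H5.H5Sselect_hyperslab(filespace_id, HDF5Constants.H5S_SELECT_SET,
            start, null, count, null);
    // memory dataspace describing the in-memory buffer
    int memspace_id = H5.H5Screate_simple(1, count, null);
    H5.H5Dwrite(dataset_id, filetype_id, memspace_id, filespace_id,
            HDF5Constants.H5P_DEFAULT, valuesToWrite);
    H5.H5Sclose(memspace_id);
    H5.H5Sclose(filespace_id);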
Thanks
--pc
Håkon Sagehaug wrote:
Hi
So I tried, but I'm not sure it worked. I can't figure out how to
connect the dataset to a file, so I can view it in the HDF viewer.
Here is my method for writing an array of Strings to a dataset.
private static void writeUn() {
    int file_id = -1;
    int dcpl_id = -1;
    int dataspace_id = -1;
    int dataset_id = -1;
    int memtype_id = -1;
    int filetype_id = -1;
    int plist = -1;
    long[] dims = { DIM_X };
    long[] chunk_dims = { CHUNK_X };
    long[] maxdims = { HDF5Constants.H5S_UNLIMITED };
    byte[][] dset_data = new byte[DIM_X][SDIM];
    StringBuffer[] str_data = new StringBuffer[DIM_X];

    // Initialize the dataset.
    for (int indx = 0; indx < DIM_X; indx++)
        str_data[indx] = new StringBuffer("iteration " + (indx + 1));

    // Create a new file using default properties.
    try {
        file_id = H5.H5Fcreate(FILENAME_A, HDF5Constants.H5F_ACC_TRUNC,
                HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);
    } catch (Exception e) {
        e.printStackTrace();
    }

    try {
        filetype_id = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
        H5.H5Tset_size(filetype_id, SDIM);
        plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
        H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
        H5.H5Pset_chunk(plist, 1, new long[] { 1024 });
        H5.H5Pset_deflate(plist, 5);
        dataset_id = H5.H5Screate_simple(1, new long[] { 25000000 },
                new long[] { HDF5Constants.H5S_UNLIMITED });
    } catch (Exception e) {
        e.printStackTrace();
    }

    // Write the data to the dataset.
    try {
        for (int indx = 0; indx < DIM_X; indx++) {
            for (int jndx = 0; jndx < SDIM; jndx++) {
                if (jndx < str_data[indx].length())
                    dset_data[indx][jndx] = (byte) str_data[indx].charAt(jndx);
                else
                    dset_data[indx][jndx] = 0;
            }
        }
        if ((dataset_id >= 0) && (memtype_id >= 0))
            H5.H5Dwrite(dataset_id, HDF5Constants.H5T_C_S1,
                    HDF5Constants.H5S_ALL, HDF5Constants.H5S_ALL,
                    HDF5Constants.H5P_DEFAULT, dset_data);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
So my question is: is this the correct way of doing it, and how do I
connect the dataset to the file? I guess that has to happen at the
time the dataset is created.
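I guess what I'm missing is an H5Dcreate() call that ties the
dataspace to the file, something like this (just a guess, with a
made-up dataset name "/strings"):

    dataspace_id = H5.H5Screate_simple(1, new long[] { 25000000 },
            new long[] { HDF5Constants.H5S_UNLIMITED });
    dataset_id = H5.H5Dcreate(file_id, "/strings", filetype_id,
            dataspace_id, plist);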
After this, is the way forward like this?

1. H5.H5Dextend(dataset_id, extdims);
2. dataspace_id = H5.H5Dget_space(dataset_id);
3. H5.H5Sselect_all(dataspace_id);
   // Subtract a hyperslab reflecting the original dimensions from the
   // selection. The selection now contains only the newly extended
   // portions of the dataset.
   count[0] = dims[0];
   count[1] = dims[1];
   H5.H5Sselect_hyperslab(dataspace_id, HDF5Constants.H5S_SELECT_NOTB,
           start, null, count, null);
   // Write the data to the selected portion of the dataset.
   if (dataset_id >= 0)
       H5.H5Dwrite(dataset_id, HDF5Constants.H5T_NATIVE_INT,
               HDF5Constants.H5S_ALL, dataspace_id,
               HDF5Constants.H5P_DEFAULT, extend_dset_data);
I also see that H5.H5Dextend is deprecated from version 1.8;
is there another method to use?
cheers, Håkon
On 19 March 2010 16:05, Peter Cao <[email protected]> wrote:
Hi Hakon,
I assume you are using a 1D array of strings. Here are some hints
for you:
1) You may just use a string datatype. Use a variable-length string
if your strings have different sizes, or a fixed-length string if
your strings are about the same length, e.g.

    tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
    // for a fixed length of 128
    H5.H5Tset_size(tid, 128);
    // for variable length
    H5.H5Tset_size(tid, HDF5Constants.H5T_VARIABLE);
2) Set the dataset creation property list for chunking and compression:

    plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
    H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
    // set the chunk size to be about 2MB for best performance
    H5.H5Pset_chunk(plist, 1, new long[] { 1024 });
    H5.H5Pset_deflate(plist, 5);
3) Set the dimension sizes, e.g.

    sid = H5.H5Screate_simple(1, new long[] { 25000000 },
            new long[] { HDF5Constants.H5S_UNLIMITED });
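Putting 1) to 3) together, creating the dataset could look like this
(a sketch; the file name "strings.h5" and the dataset name "data" are
just placeholders):

    int fid = H5.H5Fcreate("strings.h5", HDF5Constants.H5F_ACC_TRUNC,
            HDF5Constants.H5P_DEFAULT, HDF5Constants.H5P_DEFAULT);
    int tid = H5.H5Tcopy(HDF5Constants.H5T_C_S1);
    H5.H5Tset_size(tid, 128);                  // fixed-length strings
    int plist = H5.H5Pcreate(HDF5Constants.H5P_DATASET_CREATE);
    H5.H5Pset_layout(plist, HDF5Constants.H5D_CHUNKED);
    H5.H5Pset_chunk(plist, 1, new long[] { 1024 });
    H5.H5Pset_deflate(plist, 5);
    int sid = H5.H5Screate_simple(1, new long[] { 25000000 },
            new long[] { HDF5Constants.H5S_UNLIMITED });
    int did = H5.H5Dcreate(fid, "data", tid, sid, plist);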
Thanks
--pc
Håkon Sagehaug wrote:
Hi Peter
My problem is actually before I can create the dataset the first
time: I can't figure out the correct datatype to use. I guess I
should use a byte type, since the strings are converted to bytes.
Håkon
On 19 March 2010 15:29, Peter Cao <[email protected]> wrote:
Håkon,
There was a typo in my previous email. You do NOT need to read the
first chunk in order to write the second chunk. You can just select
whatever chunks you want to write. Sorry for the confusion.
Thanks
--pc
Håkon Sagehaug wrote:
Hi Peter
I'm trying to do it chunk by chunk, but I'm having trouble creating
the dataset. In example [1] it's done like this:

    H5.H5Dcreate(file_id, DATASETNAME, HDF5Constants.H5T_STD_I32LE,
            dataspace_id, dcpl_id);

That type is for int, but I can't seem to find the correct one for
strings. In example [2] with string arrays it looks like this:

    H5.H5Dcreate(file_id, DATASETNAME, filetype_id, dataspace_id,
            HDF5Constants.H5P_DEFAULT);

If I create the dataset like this, then when I want to add data
dynamically I only get the first byte of each string. Any tips on
what type I should use?
Håkon
[1] http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/java/examples/datasets/H5Ex_D_UnlimitedAdd.java
[2] http://www.hdfgroup.org/ftp/HDF5/examples/examples-by-api/java/examples/datatypes/H5Ex_T_String.java
--
Håkon Sagehaug, Scientific Programmer
Parallab, Uni BCCS/Uni Research
[email protected], phone +47 55584125
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org