Hi again,
More on this ...
I have just compiled a simple program (attached) that repeatedly opens,
appends to, then closes a bunch of files, but it fails to run past a single
iteration. This suggests to me that either we're calling something wrong or
HDF5 is internally not closing things down correctly after the first
iteration. If things are not being closed properly, that could also explain
why we're seeing problems with semi-corrupt files in production.
To compile (on Ubuntu 12.04 64-bit), assuming HDF5 1.8.10 is installed under
/usr/local:
$ g++ -std=c++0x -I /usr/local/hdf5-1.8.10-linux-x86_64-static/include
simple.cpp
/usr/local/hdf5-1.8.10-linux-x86_64-static/lib/libhdf5_hl_cpp.a
/usr/local/hdf5-1.8.10-linux-x86_64-static/lib/libhdf5_hl.a
/usr/local/hdf5-1.8.10-linux-x86_64-static/lib/libhdf5_cpp.a
/usr/local/hdf5-1.8.10-linux-x86_64-static/lib/libhdf5.a
/usr/local/hdf5-1.8.10-linux-x86_64-static/lib/libsz.a
/usr/local/hdf5-1.8.10-linux-x86_64-static/lib/libz.a
Output:
$ ./a.out
HDF5-DIAG: Error detected in HDF5 (1.8.10) thread 0:
  #000: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5D.c line 170 in H5Dcreate2(): unable to create dataset
    major: Dataset
    minor: Unable to initialize object
  #001: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Dint.c line 439 in H5D__create_named(): unable to create and link to dataset
    major: Dataset
    minor: Unable to initialize object
  #002: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5L.c line 1638 in H5L_link_object(): unable to create new link to object
    major: Links
    minor: Unable to initialize object
  #003: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5L.c line 1882 in H5L_create_real(): can't insert link
    major: Symbol table
    minor: Unable to insert object
  #004: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #005: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Gtraverse.c line 641 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #006: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5L.c line 1674 in H5L_link_cb(): name already exists
    major: Symbol table
    minor: Object already exists
AppendPackets failed on iteration 1
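For what it's worth, one way to test the "not being closed" theory would be to count the identifiers HDF5 still considers open at the end of each iteration. This is a hypothetical diagnostic (not in the attached program), using only the stock H5Fget_obj_count call:

#include "hdf5.h"
#include <iostream>

// Report identifiers still open, across every open file.
// Passing H5F_OBJ_ALL as the file argument makes the count library-wide.
void ReportOpenIds()
{
    std::cout << "files="      << H5Fget_obj_count((hid_t)H5F_OBJ_ALL, H5F_OBJ_FILE)
              << ", datasets=" << H5Fget_obj_count((hid_t)H5F_OBJ_ALL, H5F_OBJ_DATASET)
              << ", datatypes=" << H5Fget_obj_count((hid_t)H5F_OBJ_ALL, H5F_OBJ_DATATYPE)
              << std::endl;
}

If those counts came back non-zero after files.clear(), something (our wrappers or the library) would be holding identifiers open.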
Can anyone provide some insight into why this might be failing?
Many thanks
Jess
On 12/12/12 08:50, Jess Morecroft wrote:
Hi,
We've been occasionally seeing HDF5 read failures in our production
environment (HDF5 1.8.4, C++ packet table API), so we are attempting
to upgrade to 1.8.10 in the hope that it might fix things.
Unfortunately, the problem now appears to be worse ...
To give you an example of the kind of weirdness we're seeing, we have
a particular file with the following header (as per h5dump):
HDF5 "HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5" {
GROUP "/" {
DATASET "TheoreticalQuote" {
DATATYPE H5T_COMPOUND {
H5T_STD_I64LE "TimeStamp";
H5T_IEEE_F64LE "BidPrice";
H5T_IEEE_F64LE "AskPrice";
H5T_IEEE_F64LE "Volume";
H5T_IEEE_F64LE "LastInputBidPrice";
H5T_IEEE_F64LE "LastInputAskPrice";
}
DATASPACE SIMPLE { ( 28851988 ) / ( H5S_UNLIMITED ) }
}
}
}
As you can see, this file (150 MB, compressed) has ~28M records.
If we try to read a few records at the end, we succeed:
$ h5dump --dataset TheoreticalQuote -s 28851970 -c 5
HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5 | tail -15
0.83743,
0.83745
},
(28851974): {
3564222274822547,
0.83743,
0.83745,
nan,
0.83743,
0.83745
}
}
}
}
}
If we try to read a large block of records (300K) from the middle, we also
succeed, but only sometimes:
$ h5dump --dataset TheoreticalQuote -s 15000000 -c 300000
HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5 | tail -15
0.82127,
0.82144
},
(15299999): {
3558294916506950,
0.82127,
0.82144,
nan,
0.82127,
0.82144
}
}
}
}
}
Trying a different starting point, we don't get an error per se, but
where are the results?
$ h5dump --dataset TheoreticalQuote -s 14700000 -c 300000
HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5 | tail -15
H5T_IEEE_F64LE "Volume";
H5T_IEEE_F64LE "LastInputBidPrice";
H5T_IEEE_F64LE "LastInputAskPrice";
}
DATASPACE SIMPLE { ( 28851988 ) / ( H5S_UNLIMITED ) }
SUBSET {
START ( 14700000 );
STRIDE ( 1 );
COUNT ( 300000 );
BLOCK ( 1 );
DATA {
}
}
}
}
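(As an aside, the HDF Group's standalone h5check tool should be able to say whether the file is structurally valid; if I understand its basic usage correctly, it is just:

$ h5check HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5

though we have not run it yet.)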
Finally, these peculiarities probably point to a subtly corrupt file, which
would explain why our application, using the packet table API, fails to
read this particular file at this offset, as per our log:
2012-Dec-12 08:18:58.656324[0x00007faae7fff700]: DEBUG: dataStoreLib.BufferedFile(NZDUSD): reading from file /home/ligerdemo/data/HotSpot/FX/filtered/NZDUSD/HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5, earliest first = true, page *start index = 14700000, page end index = 15000000*, start index = 14700000, end index = 28920777
2012-Dec-12 08:18:58.662190[0x00007faae7fff700]: ERROR: HDF5: seq: 0 file: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Zdeflate.c function: H5Z_filter_deflate line: 125 desc: inflate() failed
2012-Dec-12 08:18:58.662214[0x00007faae7fff700]: ERROR: HDF5: seq: 1 file: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Z.c function: H5Z_pipeline line: 1120 desc: filter returned failure during read
2012-Dec-12 08:18:58.662220[0x00007faae7fff700]: ERROR: HDF5: seq: 2 file: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Dchunk.c function: H5D__chunk_lock line: 2766 desc: data pipeline read failed
2012-Dec-12 08:18:58.662225[0x00007faae7fff700]: ERROR: HDF5: seq: 3 file: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Dchunk.c function: H5D__chunk_read line: 1735 desc: unable to read raw data chunk
2012-Dec-12 08:18:58.662229[0x00007faae7fff700]: ERROR: HDF5: seq: 4 file: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Dio.c function: H5D__read line: 449 desc: can't read data
2012-Dec-12 08:18:58.662242[0x00007faae7fff700]: ERROR: HDF5: seq: 5 file: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Dio.c function: H5Dread line: 174 desc: can't read data
2012-Dec-12 08:18:58.662257[0x00007faae7fff700]: CRITICAL: File::File: Failed to get records between indexes *14700000, 14999999* from file /home/ligerdemo/data/HotSpot/FX/filtered/NZDUSD/HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5
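For context, our reader drives the packet table API roughly like the sketch below (heavily simplified; the struct mirrors the compound type in the dump above, and the real code checks every return value):

#include "H5Cpp.h"
#include "H5PacketTable.h"
#include <cstdint>
#include <vector>

// Field layout mirrors the compound type shown by h5dump above.
struct TheoreticalQuote
{
    int64_t TimeStamp;
    double BidPrice;
    double AskPrice;
    double Volume;
    double LastInputBidPrice;
    double LastInputAskPrice;
};

int main()
{
    H5::H5File file("HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5", H5F_ACC_RDONLY);
    FL_PacketTable table(file.getId(), (char*)"TheoreticalQuote"); // open the existing table
    std::vector<TheoreticalQuote> records(300000);
    // This is the read that fails with the inflate() error in the log above.
    int ret = table.GetPackets(14700000, 14999999, (void*)&records[0]);
    return ret < 0 ? 1 : 0;
}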
Things to note:
1. The "corrupt" file in question was originally created using the
HDF5 1.8.4 API, and is now being read/appended using HDF5 1.8.10.
2. Our application reads this file using the 1.8.10 API.
3. The h5dump utility used above is an old version (1.8.4), though I do
not think this is relevant, since our application fails to read the
file as well.
My basic question is: has anyone seen this kind of invisible file
corruption before, and if so, do you know what might cause it? I'm also
wondering whether we're not shutting down / closing files correctly, and
whether that could be causing the corruption. Right now our code
constructs an H5::CompType object, an H5::H5File object, and an
FL_PacketTable object, in that order, per file, then destructs them in
the reverse order. Is that sufficient, or should we be calling a global
shutdown routine as well?
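For concreteness, here is a stripped-down sketch of the ordering I mean (toy one-field type; the /tmp path is arbitrary):

#include "H5Cpp.h"
#include "H5PacketTable.h"

int main()
{
    {
        H5::CompType type(sizeof(double)); // 1. construct the compound type
        type.insertMember("X", 0, H5::PredType::NATIVE_DOUBLE);
        H5::H5File file("/tmp/lifetime.h5", H5F_ACC_TRUNC); // 2. construct the file
        FL_PacketTable table(file.getId(), (char*)"X", type.getId(), 2048, 9); // 3. construct the table
        double x(0.1);
        table.AppendPacket(&x);
    } // table, file, type destroyed in reverse order of construction
    H5::H5Library::close(); // the "global shutdown routine" in question
    return 0;
}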
Any help on this would be very much appreciated.
Thanks
#include "H5Cpp.h"
#include "H5PacketTable.h"
#include <iostream>
#include <sstream>
#include <memory>
#include <vector>
struct MyType
{
    // Build an HDF5 compound type matching this struct's in-memory layout.
    static std::unique_ptr<H5::CompType> GetH5Type()
    {
        std::unique_ptr<H5::CompType> compType(new H5::CompType(sizeof(MyType)));
        compType->insertMember("A", HOFFSET(MyType, a), H5::PredType::NATIVE_INT64);
        compType->insertMember("B", HOFFSET(MyType, b), H5::PredType::NATIVE_DOUBLE);
        compType->insertMember("C", HOFFSET(MyType, c), H5::PredType::NATIVE_DOUBLE);
        compType->insertMember("D", HOFFSET(MyType, d), H5::PredType::NATIVE_DOUBLE);
        compType->insertMember("E", HOFFSET(MyType, e), H5::PredType::NATIVE_DOUBLE);
        compType->insertMember("F", HOFFSET(MyType, f), H5::PredType::NATIVE_DOUBLE);
        return compType;
    }

    int64_t a;
    double b;
    double c;
    double d;
    double e;
    double f;

    bool operator==(const MyType& rhs) const
    {
        return a == rhs.a &&
               b == rhs.b &&
               c == rhs.c &&
               d == rhs.d &&
               e == rhs.e &&
               f == rhs.f;
    }

    bool operator!=(const MyType& rhs) const
    {
        return !operator==(rhs);
    }
};
struct File
{
    // Truncate (create) the file on the first iteration; reopen it read-write thereafter.
    File(size_t id, bool truncate)
        :
        type(MyType::GetH5Type())
    {
        std::stringstream ss;
        ss << "/tmp/" << id << ".h5";
        file.reset(new H5::H5File(ss.str(), truncate ? H5F_ACC_TRUNC : H5F_ACC_RDWR));
        // Fixed-length packet table named "MyType": chunk size 2048, deflate level 9.
        table.reset(new FL_PacketTable(file->getId(), (char*)"MyType", type->getId(), 2048, 9));
    }

    std::unique_ptr<H5::CompType> type;
    std::unique_ptr<H5::H5File> file;
    std::unique_ptr<FL_PacketTable> table;
};
typedef std::shared_ptr<File> FilePtr;

int main()
{
    //size_t iterations(1); // this DOES work
    size_t iterations(2); // this DOES NOT work ... something not getting closed on first iteration?
    for (size_t x(0); x < iterations; ++x)
    {
        std::vector<FilePtr> files;
        files.resize(1000);
        size_t count(0);
        for (auto& file : files)
        {
            file.reset(new File(++count, x == 0)); // truncate only on the first iteration
            std::vector<MyType> records;
            records.resize(10);
            for (size_t i(0); i < records.size(); ++i)
            {
                records[i].a = i;
                records[i].b = 0.1;
                records[i].c = 0.1;
                records[i].d = 0.1;
                records[i].e = 0.1;
                records[i].f = 0.2;
            }
            if (file->table->AppendPackets(records.size(), (void*)&records[0]) < 0)
            {
                std::cerr << "AppendPackets failed on iteration " << x << std::endl;
                return 1;
            }
        }
        files.clear(); // each File destroys table, then file, then type
    }
    H5::H5Library::close();
    return 0;
}