Hi again,

More on this ...

I have just compiled a simple program (attached) that repeatedly opens, appends to, then closes a bunch of files, but it fails to run past a single iteration. This suggests to me that either we're calling something wrong or HDF5 is internally not closing things down correctly after the first iteration. If things are indeed not being closed properly, could that explain the semi-corrupt files we're seeing in production?

To compile (on Ubuntu 12.04 64-bit), assuming the HDF5 1.8.10 static distribution is unpacked under /usr/local:

$ g++ -std=c++0x -I /usr/local/hdf5-1.8.10-linux-x86_64-static/include simple.cpp \
    /usr/local/hdf5-1.8.10-linux-x86_64-static/lib/libhdf5_hl_cpp.a \
    /usr/local/hdf5-1.8.10-linux-x86_64-static/lib/libhdf5_hl.a \
    /usr/local/hdf5-1.8.10-linux-x86_64-static/lib/libhdf5_cpp.a \
    /usr/local/hdf5-1.8.10-linux-x86_64-static/lib/libhdf5.a \
    /usr/local/hdf5-1.8.10-linux-x86_64-static/lib/libsz.a \
    /usr/local/hdf5-1.8.10-linux-x86_64-static/lib/libz.a

Output:

$ ./a.out
HDF5-DIAG: Error detected in HDF5 (1.8.10) thread 0:
  #000: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5D.c line 170 in H5Dcreate2(): unable to create dataset
    major: Dataset
    minor: Unable to initialize object
  #001: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Dint.c line 439 in H5D__create_named(): unable to create and link to dataset
    major: Dataset
    minor: Unable to initialize object
  #002: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5L.c line 1638 in H5L_link_object(): unable to create new link to object
    major: Links
    minor: Unable to initialize object
  #003: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5L.c line 1882 in H5L_create_real(): can't insert link
    major: Symbol table
    minor: Unable to insert object
  #004: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Gtraverse.c line 861 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #005: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Gtraverse.c line 641 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #006: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5L.c line 1674 in H5L_link_cb(): name already exists
    major: Symbol table
    minor: Object already exists
AppendPackets failed on iteration 1

Can anyone provide some insight into why this might be failing?

Many thanks
Jess

On 12/12/12 08:50, Jess Morecroft wrote:
Hi,

We've been occasionally seeing HDF5 read failures in our production environment (using HDF5 1.8.4 and the C++ packet table API), so we are attempting to upgrade to 1.8.10 in the hope that it might fix things. Unfortunately, the problem now appears to be worse ...

To give you an example of the kind of weirdness we're seeing, we have a particular file with the following header (as per h5dump):

HDF5 "HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5" {
GROUP "/" {
   DATASET "TheoreticalQuote" {
      DATATYPE H5T_COMPOUND {
         H5T_STD_I64LE "TimeStamp";
         H5T_IEEE_F64LE "BidPrice";
         H5T_IEEE_F64LE "AskPrice";
         H5T_IEEE_F64LE "Volume";
         H5T_IEEE_F64LE "LastInputBidPrice";
         H5T_IEEE_F64LE "LastInputAskPrice";
      }
      DATASPACE  SIMPLE { ( 28851988 ) / ( H5S_UNLIMITED ) }
   }
}
}

As you can see, this file (150 MB in size, compressed) has ~28M records. If we try to read a few records at the end, we succeed:

$ h5dump --dataset TheoreticalQuote -s 28851970 -c 5 HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5 | tail -15
            0.83743,
            0.83745
         },
         (28851974): {
            3564222274822547,
            0.83743,
            0.83745,
            nan,
            0.83743,
            0.83745
         }
      }
   }
}
}

If we try to read a large set of records (300K) from the middle, we also succeed, but only sometimes:

$ h5dump --dataset TheoreticalQuote -s 15000000 -c 300000 HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5 | tail -15
            0.82127,
            0.82144
         },
         (15299999): {
            3558294916506950,
            0.82127,
            0.82144,
            nan,
            0.82127,
            0.82144
         }
      }
   }
}
}

Trying a different starting point, we don't get an error per se, but the data simply comes back empty:

$ h5dump --dataset TheoreticalQuote -s 14700000 -c 300000 HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5 | tail -15
H5T_IEEE_F64LE "Volume";
H5T_IEEE_F64LE "LastInputBidPrice";
H5T_IEEE_F64LE "LastInputAskPrice";
   }
   DATASPACE SIMPLE { ( 28851988 ) / ( H5S_UNLIMITED ) }
   SUBSET {
      START ( 14700000 );
      STRIDE ( 1 );
      COUNT ( 300000 );
      BLOCK ( 1 );
      DATA {
      }
   }
}
}

Finally, these peculiarities probably point to a subtly corrupt file, which would explain why our application (using the packet table API) fails to read this particular file at this offset, as per our log:

2012-Dec-12 08:18:58.656324[0x00007faae7fff700]: DEBUG: dataStoreLib.BufferedFile(NZDUSD): reading from file /home/ligerdemo/data/HotSpot/FX/filtered/NZDUSD/HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5, earliest first = true, page *start index = 14700000, page end index = 15000000*, start index = 14700000, end index = 28920777
2012-Dec-12 08:18:58.662190[0x00007faae7fff700]: ERROR: HDF5: seq: 0 file: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Zdeflate.c function: H5Z_filter_deflate line: 125 desc: inflate() failed
2012-Dec-12 08:18:58.662214[0x00007faae7fff700]: ERROR: HDF5: seq: 1 file: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Z.c function: H5Z_pipeline line: 1120 desc: filter returned failure during read
2012-Dec-12 08:18:58.662220[0x00007faae7fff700]: ERROR: HDF5: seq: 2 file: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Dchunk.c function: H5D__chunk_lock line: 2766 desc: data pipeline read failed
2012-Dec-12 08:18:58.662225[0x00007faae7fff700]: ERROR: HDF5: seq: 3 file: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Dchunk.c function: H5D__chunk_read line: 1735 desc: unable to read raw data chunk
2012-Dec-12 08:18:58.662229[0x00007faae7fff700]: ERROR: HDF5: seq: 4 file: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Dio.c function: H5D__read line: 449 desc: can't read data
2012-Dec-12 08:18:58.662242[0x00007faae7fff700]: ERROR: HDF5: seq: 5 file: /home/hdftest/snapshots-bin-hdf5_1_8_10/current/src/H5Dio.c function: H5Dread line: 174 desc: can't read data
2012-Dec-12 08:18:58.662257[0x00007faae7fff700]: CRITICAL: File::File: Failed to get records between indexes *14700000, 14999999* from file /home/ligerdemo/data/HotSpot/FX/filtered/NZDUSD/HotSpot_FX_filtered_NZDUSD-TheoreticalQuote.h5

Things to note:

 1. The "corrupt" file in question was originally created using the
    HDF5 1.8.4 API, and is now being read/ appended using HDF5 1.8.10.
 2. Our application tries to read this file using the 1.8.10 API
 3. The h5dump utility used above is an old version - 1.8.4 - though I
    do not think this is relevant due to the application failing to
    read also.

My basic question is: has anyone seen this kind of invisible file corruption before, and if so, do you know what might cause it? I'm also wondering whether we're not shutting down / closing files correctly, and whether that could be causing these corruption problems. Right now our code constructs an H5::CompType object, an H5::H5File object, and an FL_PacketTable object, in that order, per file, then destructs them in the reverse order. Is that sufficient, or should we be calling a global shutdown routine as well?
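
For reference, here is a minimal sketch of that per-file lifetime pattern (Record and writeOneFile are illustrative names of my own; the chunk size 2048 and deflate level 9 match the simple.cpp listing below):

#include "H5Cpp.h"
#include "H5PacketTable.h"

#include <cstdint>
#include <string>

struct Record { int64_t a; double b; };  // trimmed-down stand-in for our packet struct

void writeOneFile(const std::string& path, bool truncate)
{
    // Construct in this order: compound type, then file, then packet table ...
    H5::CompType type(sizeof(Record));
    type.insertMember("A", HOFFSET(Record, a), H5::PredType::NATIVE_INT64);
    type.insertMember("B", HOFFSET(Record, b), H5::PredType::NATIVE_DOUBLE);

    H5::H5File file(path, truncate ? H5F_ACC_TRUNC : H5F_ACC_RDWR);
    FL_PacketTable table(file.getId(), (char*)"Record", type.getId(), 2048, 9);

    Record r = { 1, 0.5 };
    table.AppendPackets(1, &r);  // returns < 0 on failure
}   // ... and destroy in reverse order: table, then file, then type.

// Optionally, once at process exit, the global shutdown routine:
// H5::H5Library::close();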

Any help on this would be very, very appreciated.

Thanks

#include "H5Cpp.h"
#include "H5PacketTable.h"

#include <iostream>
#include <sstream>
#include <memory>
#include <vector>

struct MyType
{
    static std::unique_ptr<H5::CompType> GetH5Type()
    {
        std::unique_ptr<H5::CompType> compType(new H5::CompType(sizeof(MyType)));
        compType->insertMember("A", HOFFSET(MyType, a), H5::PredType::NATIVE_INT64);
        compType->insertMember("B", HOFFSET(MyType, b), H5::PredType::NATIVE_DOUBLE);
        compType->insertMember("C", HOFFSET(MyType, c), H5::PredType::NATIVE_DOUBLE);
        compType->insertMember("D", HOFFSET(MyType, d), H5::PredType::NATIVE_DOUBLE);
        compType->insertMember("E", HOFFSET(MyType, e), H5::PredType::NATIVE_DOUBLE);
        compType->insertMember("F", HOFFSET(MyType, f), H5::PredType::NATIVE_DOUBLE);
        return compType;
    }

    int64_t a;
    double b;
    double c;
    double d;
    double e;
    double f;

    bool operator==(const MyType& rhs) const
    {
        return a == rhs.a &&
            b == rhs.b &&
            c == rhs.c &&
            d == rhs.d &&
            e == rhs.e &&
            f == rhs.f;
    }
    
    bool operator!=(const MyType& rhs) const
    {
        return !operator==(rhs);
    }
};

struct File
{
    File(size_t id, bool truncate)
    :
        type(MyType::GetH5Type())
    {
        std::stringstream ss;
        ss << "/tmp/" << id << ".h5";
        // Truncate (create) the file on the first iteration; reopen it read-write thereafter.
        file.reset(new H5::H5File(ss.str(), truncate ? H5F_ACC_TRUNC : H5F_ACC_RDWR));
        // Create a fixed-length packet table dataset named "MyType" (chunk size 2048, deflate level 9).
        table.reset(new FL_PacketTable(file->getId(), (char*)"MyType", type->getId(), 2048, 9));
    }

    // Members are destroyed in reverse declaration order: table, then file, then type.
    std::unique_ptr<H5::CompType> type;
    std::unique_ptr<H5::H5File> file;
    std::unique_ptr<FL_PacketTable> table;
};

typedef std::shared_ptr<File> FilePtr;

int main()
{
    //size_t iterations(1); // this DOES work
    size_t iterations(2); // this DOES NOT work ... something not getting closed on first iteration?

    for (size_t x(0); x < iterations; ++x)
    { 
        std::vector<FilePtr> files;
        files.resize(1000);

        size_t count(0);
        for (auto& file : files)
        {
            file.reset(new File(++count, x==0));

            std::vector<MyType> records;
            records.resize(10);
            for (size_t i(0); i < records.size(); ++i)
            {
                records[i].a = i;
                records[i].b = 0.1;
                records[i].c = 0.1;
                records[i].d = 0.1;
                records[i].e = 0.1;
                records[i].f = 0.2;
            }

            if (file->table->AppendPackets(records.size(), (void*)&records[0]) < 0)
            {
                std::cerr << "AppendPackets failed on iteration " << x << std::endl;
                return 1; 
            }
        }
        
        files.clear();
    }

    H5::H5Library::close();

    return 0;
}
