On Sunday, November 17, 2002, at 07:22 AM, Robert Ramey wrote:
>> Imagine I use a platform where long is 64-bit, write it to the
>> archive and then read it again on a platform where long is 32-bit.
>> This will cause major problems.
>
> Suppose you have a number on the first platform that exceeds 32
> significant bits. What happens when the number is loaded onto the
> second platform? Are the high-order bits truncated? How do you address
> this problem now? If none of your longs are larger than 32 significant
> bits then there is no problem. If some are, the 32-bit machine can't
> represent them. This can't cause any problems you don't have already.

It can cause trouble, since for my portable codes I use int64_t or
int32_t. In order for the library to write such numbers in binary
consistently, it should also serialize them as 64-bit or 32-bit. How do
you do that when the bit size can vary from platform to platform? Do
you check at runtime what the number of bits is and dispatch to the
serialization for that number of bits? No, it seems that in the binary
file you just write out the sizes of the integers and fail the loading
if the bit numbers don't agree. Using fixed-bit-size integers instead
would make the binary files much more portable.
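To make this concrete: a minimal sketch of such a size check (the
function names are illustrative, not the library's actual code):

#include <climits>
#include <iostream>

// The writing side records how many bits its longs have; the reading
// side refuses the archive if the widths don't match.
void write_header(std::ostream& os)
{
    unsigned char long_bits = sizeof(long) * CHAR_BIT; // e.g. 32 or 64
    os.put(static_cast<char>(long_bits));
}

bool check_header(std::istream& is)
{
    unsigned char long_bits = static_cast<unsigned char>(is.get());
    return long_bits == sizeof(long) * CHAR_BIT; // else loading fails
}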
>> It also prevents the use of archive formats that rely on fixed bit
>> sizes (such as XDR or any other platform-independent binary format).
>> My suggestion thus is to change the types in these functions to
>> int8_t, int16_t, int32_t, as was already done for int64_t. That way
>> portable implementations will be possible.
>
> I believe that you could just typedef the above on both platforms and
> use a text archive, and everything would be just fine. The text
> archive represents all numbers as arbitrary-length integers which
> would be converted correctly on save as well as load.

As I mentioned in the introductory part of my post, text archives are
much longer than binary ones and thus cause bandwidth problems for some
applications. Note that the option of compressing the archive after
writing works a) only if you serialize into files (which is only one
use), and b) does not address the bandwidth problem of first writing
the large text files.
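Just to illustrate what I mean by portable implementations: a sketch of
an archive interface built on the fixed-width types from
<boost/cstdint.hpp> (the class name is made up; this is not the
library's basic_oarchive):

#include <boost/cstdint.hpp>

// Every integral value crosses the interface with an explicit,
// platform-independent width, so a derived binary archive can write
// exactly that many bits regardless of the sizes of short and long.
class portable_oarchive_interface
{
public:
    virtual portable_oarchive_interface& operator<<(boost::int8_t t) = 0;
    virtual portable_oarchive_interface& operator<<(boost::int16_t t) = 0;
    virtual portable_oarchive_interface& operator<<(boost::int32_t t) = 0;
    virtual portable_oarchive_interface& operator<<(boost::int64_t t) = 0;
    virtual ~portable_oarchive_interface() {}
};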
>> 2.) The second problem is speed when serializing large containers of
>> basic data types, e.g. a vector<double> or ublas vectors and
>> matrices. In my applications these can easily be hundreds of
>> megabytes in size. In the current implementation, serializing a
>> std::vector<double>(10000000) requires ten million virtual function
>> calls. In order to prevent this, I propose to add extra virtual
>> functions (like the operator<< above), which serialize C-arrays of
>> basic data types, i.e. functions along the lines of the sketch below.
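(A minimal sketch; save_array is an illustrative name, not a fixed part
of the proposal:)

#include <cstddef>

// Alongside the per-element operator<<, the archive base class would
// grow one virtual function per basic type that takes a whole
// contiguous C-array, so a 10^7-element vector costs one virtual call.
class basic_oarchive
{
public:
    virtual basic_oarchive& operator<<(double t) = 0;      // per element
    virtual basic_oarchive& save_array(const double* p,
                                       std::size_t n) = 0; // per array
    virtual ~basic_oarchive() {}
};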
> Serialization version 6, which was submitted for review, includes
> serialization of C-arrays. It is documented in the reference under the
> title "Serialization Implementations included in the Library", and a
> test case was added to test.cpp.

Yes, but it does so by calling the virtual operator<< for each element,
which is very slow if you call it millions of times.
>> In conjunction with this, the serialization for std::vector and for
>> ublas vectors, etc. has to be adapted to make use of these optimized
>> serialization functions for basic data types.
>
> The library permits overriding the included implementations. Of
> course, this has to be up to the person who finds the included
> implementation inconvenient in some way, as he is the only one who
> knows what he wants changed.

That will not work, since overriding is a compile-time decision, while
I decide the archive format at runtime and thus need to have these
optimized functions available as virtual functions.
>> 3.) The third problem is the serialization of very large numbers of
>> small objects. The current library shows a way to optimize this (in
>> reference.html#large), but it is rather cumbersome. As it is now, I
>> have to reimplement the serialization of std::vector<T>, or
>> std::list<T>, etc., for all such types T. In almost all of my codes I
>> have a large number of small objects of various types for which I
>> know that I will never serialize a pointer. I would thus propose the
>> following:
>> i) add a traits class to specify whether a pointer to an object will
>> ever be serialized, or whether it should be treated as a small object
>> for which serialization should be optimized
>> ii) specialize the serialization of the standard library containers
>> for these small objects, using the mechanism in the documentation.
>> That way I just need to specify a trait for my object and it will be
>> serialized efficiently.
>
> I would be loath to implement this idea. Basically, instead of
> overloading the serializations that you want to speed up, you want to
> require all of us to specify traits for every class we want to
> serialize. It would make things harder to use. Also, the current
> implementation - like much boost code - stretches current compilers to
> the breaking point. It is already much more complex to implement than
> I expected, and I already have much difficulty accommodating all the
> differences in C++ implementations.

No, for the user who does not care about the optimization nothing in
his code needs to be changed at all!
We can have a general template that defaults to the full, non-optimized
serialization method for all classes for which it has not been
specialized. That means no extra code for the standard user, while the
user who needs to optimize large collections of small objects would
just provide a traits specialization, instead of reimplementing the
serialization of all the standard containers for all his classes that
need to be optimized. An example could be:
template <class T>
struct serialization_traits {
    static const bool optimize_serialization = false;
};
Thus the general template covers all classes that do not need to be
optimized. Only for the classes that I need to optimize would I write:
template <> struct serialization_traits<MySmallClass> {
    static const bool optimize_serialization = true;
};
and the operator<< would dispatch based on the value of this trait,
somewhat like this:
template <class T, class A>
basic_oarchive& operator<<(basic_oarchive& a, const std::vector<T,A>& v)
{
    return dispatch_serialization<
        serialization_traits<T>::optimize_serialization>::serialize(a, v);
}
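dispatch_serialization itself is not spelled out above; one way to
write it is a class template specialized on the trait value. A sketch,
assuming the optimized path can use an array function such as the
save_array from the earlier sketch:

#include <cstddef>
#include <vector>

// The bool parameter is supplied by the trait at the call site, so the
// choice between the two paths is made at compile time.
template <bool optimize>
struct dispatch_serialization
{
    // default path: one operator<< call per element
    template <class T, class A>
    static basic_oarchive& serialize(basic_oarchive& a,
                                     const std::vector<T,A>& v)
    {
        for (std::size_t i = 0; i < v.size(); ++i)
            a << v[i];
        return a;
    }
};

template <>
struct dispatch_serialization<true>
{
    // optimized path: hand the whole contiguous block to the archive in
    // a single call (assumes save_array overloads for the basic types;
    // size bookkeeping omitted in this sketch)
    template <class T, class A>
    static basic_oarchive& serialize(basic_oarchive& a,
                                     const std::vector<T,A>& v)
    {
        if (!v.empty())
            a.save_array(&v[0], v.size());
        return a;
    }
};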
This would keep things just as easy to use: no extra coding is required
for those who do not care about the optimization, but life would be
MUCH easier for other users. If you like, I can try to find time to
implement this in the library. Also, since it uses just simple template
specialization, no modern compiler should have more problems with it
than it already has with the library.
> Java has runtime reflection, which is used to obtain all the
> information required for serialization. Also, Java has a much more
> limited usage of pointers, so certain problems we are dealing with
> don't come up. I don't believe that all the data structures can be
> unambiguously mapped to Java.

Could Java data structures be mapped to C++ then, to be able to read
Java-serialized files? But probably this is beyond the scope of this
library anyway; it might be interesting as a later extension.
>> I would like to see a platform-independent binary archive format
>> (e.g. using XDR), but am also willing to contribute that myself once
>> the interface has been finalized.
>
> So there is no reason you can't get started now. As you can see from
> the 3 derivations included in the package, making your own XDR archive
> is a pretty simple proposition.

Thank you. Note that none of the comments made so far have any impact
on the interfaces defined by the base classes basic_[i|o]archive,
except that I prefer int16_t, int32_t, ... instead of short and long.
I'll do that once I get the library to compile, and will send it to
you.
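A rough sketch of what such an XDR archive could look like, using the
SunRPC xdrstdio routines (the class name is mine, and only two
overloads are shown; this is not part of the library):

#include <cstdio>
#include <rpc/rpc.h> // XDR streams

class xdr_oarchive // would derive from the library's archive base class
{
public:
    explicit xdr_oarchive(std::FILE* fp)
    {
        xdrstdio_create(&xdrs_, fp, XDR_ENCODE);
    }
    ~xdr_oarchive() { xdr_destroy(&xdrs_); }

    xdr_oarchive& operator<<(int t)
    {
        xdr_int(&xdrs_, &t);    // always 32 bits on the wire
        return *this;
    }
    xdr_oarchive& operator<<(double t)
    {
        xdr_double(&xdrs_, &t); // IEEE 754, big-endian on the wire
        return *this;
    }

private:
    XDR xdrs_;
};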
>> As was already remarked by others, I would like to see documentation
>> on exactly which functions a new archive type has to implement.
>
> Wouldn't it be easier just to look at the basic_[i|o]archive code?

I like the documentation to be self-contained. A documentation page
including a synopsis of basic_[i|o]archive, and showing which functions
to implement, would make it easier than scanning through the header
file, past all the pragmas, comments and other classes, until one finds
the class definition.

> Perhaps we might want to break out text_archive and native binary
> archive into separate headers. That might make it more obvious that
> these derivations aren't really part of the library but rather more
> like popular examples.

That makes sense.
I thank you for replying to my comments in such detail, and want to
quickly summarize the open issues:
i) as you already use int64_t for 64-bit integers, why not also use
int32_t, int16_t, etc.? That would make it more consistent and MUCH
easier to implement portable binary formats!
ii) support for optimizing serialization via a traits class would be
extremely important and helpful, without incurring any extra coding
effort for standard users!
iii) additional virtual functions to serialize large arrays of data
(e.g. dense vectors and matrices), instead of calling operator<< for
each of the possibly millions of elements, are still needed both for
optimization and to make use of the corresponding functions in some
binary serialization formats (e.g. XDR or PVM)
I would volunteer to implement ii) and iii) in the library if you agree
and do not want to do it yourself.
With best regards,
Matthias