On Sat, 2011-07-02 at 21:58 +0200, [email protected] wrote:
> Hi Jon,
>
> no, reading .o5m.gz (resp. .o5c.gz) is not supported at present.
> You usually don't do zlib compression with .o5m files, users will rather use
> lzop if processing speed is important, or 7zip if a minimal file size is
> required.
>
> However, the .o5m file format is usually chosen because of its speed, and
> this advantage would get lost if you compressed the data. Therefore you may
> expect input files to be uncompressed (an uncompressed .o5m file has nearly
> the same size as a conventional .osm.bz2 file).

Do you have any speed benchmarks you can add to the o5m wiki page?
Perhaps the timing from an "osm2pgsql -O null <file>" run could be added to
the existing table listing the file sizes.

It would be nice if users had the option to read compressed o5m files
directly, as they can with .osm files. I can see that the benefit is smaller
when you are using the o5m format, but no one would be forced to use it.

> > My main concern would be whether the changes introduce
> > new external dependencies.
>
> Don't worry, there should be no additional dependencies.
>
> > Can we see the code? Maybe attach it to a ticket in trac.
>
> I just uploaded the necessary files to
> http://m.m.i24.cc/o5m_osm2pgsql_20110702.zip
>
> parse-o5m.c (new)
> parse-o5m.h (new)
> osm2pgsql.c (a few minor changes)
> Makefile.am (added parse-o5m.c and parse-o5m.h)
>
> Please consider the source as experimental. The import of a small test region
> (100x100 km) worked fine, but there still might be some bugs...

I had a quick look through the code but I have not run it at all. My
comments are related to the file format and the parsing code:

- Add "o5m" to the --input-reader help text.
- It would be helpful to indicate the type (o5c vs o5h) in the header
  instead of relying only on the file name; otherwise this won't work
  with stdin.
- Validating the "o5m2" header would be useful to prevent non-o5m files
  from being processed by mistake.
- The format appears endian-specific. If you choose not to make it
  endian-neutral, it would be good for the endianness to be recorded in
  the header and checked, to prevent mistakes if files are moved between
  systems.
- The best practice is to use enumerated types instead of hard-coded hex
  numbers for the protocol fields. Ideally all the protocol definitions
  should be in a header file.
- The best practice for macros is to wrap them in a "do {...} while(0)".
  This avoids problems with trailing semicolons and nested if/else
  statements.

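To make the header and enum points above a bit more concrete, here is a
rough sketch of what I have in mind. It is not taken from parse-o5m.c: the
names o5_kind and o5_detect_kind are made up for illustration, and it
assumes the magic string ("o5m2", or "o5c2" for a change file) sits at the
very start of the buffer handed in, which may not match the real layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; not from parse-o5m.c. */
enum o5_kind {
    O5_KIND_UNKNOWN = 0,  /* magic not recognised: refuse to parse */
    O5_KIND_O5M,          /* regular data file */
    O5_KIND_O5C           /* change file */
};

#define O5_MAGIC_LEN 4

/*
 * Classify the input by its magic bytes instead of by file name, so the
 * check also works when reading from stdin.
 * Assumption: the magic occupies the first O5_MAGIC_LEN bytes of buf;
 * the caller is expected to skip any leading framing bytes first.
 */
static enum o5_kind o5_detect_kind(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < O5_MAGIC_LEN)
        return O5_KIND_UNKNOWN;
    if (memcmp(buf, "o5m2", O5_MAGIC_LEN) == 0)
        return O5_KIND_O5M;
    if (memcmp(buf, "o5c2", O5_MAGIC_LEN) == 0)
        return O5_KIND_O5C;
    return O5_KIND_UNKNOWN;
}
```

The parser could then bail out early on O5_KIND_UNKNOWN rather than
feeding arbitrary bytes to the import, and the same enum style would work
for the dataset type bytes.
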
An example using your PERR macro would be:

    #define PERR(f) do { \
        fprintf(stderr, "osm2pgsql Error: " f "\n"); \
    } while (0)

Jon

_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev

