> > As for fixed sized blocks in vex, I did consider that option but couldn’t > come up with a compelling reason for it. I can see the case for a maximum > block size (so we know what the maximum size of allocation will be), but > can you give a concrete example of how fixed-size blocks would be > advantageous in practice? I would be very hesitant to split any entity > across multiple blocks. >
When you need relations-ways-nodes read order, blocks will save you from unnecessary read-through the whole file (yes, you can skip decompression for nodes/ways but still you must read the whole file). Second example: find something by id, if you have blocks it's easy to map whole block into memory and do a binary search for logN block reads instead of seeing through a file all the time. 2016-02-07 23:59 GMT+05:00 Andrew Byrd <[email protected]>: > Hi Dmitry, > > Yes, there are similarities and I did study the o5m format before I began > work on vex. The last section of my original article compares the two and > gives my impressions of o5m: > http://conveyal.com/blog/2015/04/27/osm-formats#comparisons-with-o5m > > In summary: o5m uses string tables with a fixed size and an LRU eviction > policy. Producers and consumers must keep their string tables exactly in > sync. Strings are then referenced by integers indicating how recently they > were used (1 to 15000). This adds quite a bit of complexity to o5m > implementations, especially considering that this eviction strategy can > backfire on certain inputs leading to files that are actually bigger than a > basic gzipped text representation of the same data. According to > http://wiki.openstreetmap.org/wiki/Talk:O5m#Compression_Algorithms o5m > uses string tables specifically to avoid relying on general purpose > compression. I find this unnecessary considering that zlib compression is > quite effective, resource efficient (with adjustable compression level), > and available practically everywhere. > > There are a few other unusual design decisions documented and discussed at > http://wiki.openstreetmap.org/wiki/O5m and > http://wiki.openstreetmap.org/wiki/Talk:O5m. For example, strings are > both introduced and terminated by a null byte, and are often stored in > pairs (i.e. three null bytes per string pair, one of which is at the > beginning of the string). > > Of course I recognize o5m's contribution to the dialog on binary formats, > and we can of course learn from the o5m concept, but my conclusion is that > it does not have the combination of extreme simplicity and compactness > necessary to complement the existing formats. > > As for fixed sized blocks in vex, I did consider that option but couldn’t > come up with a compelling reason for it. I can see the case for a maximum > block size (so we know what the maximum size of allocation will be), but > can you give a concrete example of how fixed-size blocks would be > advantageous in practice? I would be very hesitant to split any entity > across multiple blocks. > > -Andrew > > On 07 Feb 2016, at 09:06, Дмитрий Киселев <[email protected]> > wrote: > > Looks pretty similar to o5m, except tags key=value are not round-buffered. > > As a further extension, it would be nice to have the ability to have > blocks of fixed size. > Just write nodes one by one while you haven't full-fill byte buffer. > For extremely big relations (which are larger than one block) it's > possible to mark two adjacent blocks as connected, but there should be a > few of them. > > It would help to read write and seek over files. > > 2016-02-07 3:47 GMT+05:00 Stadin, Benjamin < > [email protected]>: > >> Hi Andrew, >> >> Cap'n Proto (successor of ProtoBuffer from the guy who invented >> ProtoBuffer) and FlatBuffers (another ProtoBuffer succesor, by Google) have >> gained a lot of traction since last year. Both eliminate many if the >> shortcomings of the original ProtoBuffer (allow for random access, >> streaming,...), and improve on performance also. >> >> https://github.com/google/flatbuffers >> >> Here is a comparison between ProtoBuffer competitors: >> https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-sbe.html >> >> In my opinion FlatBuffers is the most interesting. It seems to have very >> good language and platform support, and has quite a high adoption rate >> already. >> >> I think that it's well worth to reconsider creating an own file format >> and parser for several reasons. Your concept looks well thought, it should >> be possible to implement a lighweight parser using FlatBuffers for your >> data scheme. >> >> Regards >> Ben >> >> Von meinem iPad gesendet >> >> Am 06.02.2016 um 22:37 schrieb Andrew Byrd <[email protected]>: >> >> Hello OSM developers, >> >> Last spring I posted an article discussing some shortcomings of the PBF >> format and proposing a simpler binary OSM interchange format called VEX. >> There was a generally positive response at the time, including helpful >> feedback from other developers. Since then I have revised the VEX >> specification as well as our implementation, and Conveyal has been using >> this format in our own day-to-day work. >> >> I have written a new article describing of the revised format: >> http://conveyal.com/blog/2016/02/06/vex-format-part-two >> >> The main differences are 1) it is more regular and even simpler to parse; >> and 2) file blocks are compressed individually, allowing parallel >> processing and seeking to specific entity types. It is no longer smaller >> than PBF, but still comparable in size. >> >> Again, I would welcome any comments you may have on the revised format >> and the potential for a shift to simpler binary OSM formats. >> >> Regards, >> Andrew Byrd >> >> >> On 29 Apr 2015, at 01:35, andrew byrd <[email protected]> wrote: >> >> Hello OSM developers, >> >> Over the last few years I have worked on several pieces of software that >> consume and produce the PBF format. I have always appreciated the >> advantages of PBF over XML for our use cases, but over time it became >> apparent to me that PBF is significantly more complex than would be >> necessary to meet its objectives of speed and compactness. >> >> Based on my observations about the effectiveness of various techniques >> used in PBF and other formats, I devised an alternative OSM representation >> that is consistently about 8% smaller than PBF but substantially simpler to >> encode and decode. This work is presented in an article at >> http://conveyal.com/blog/2015/04/27/osm-formats/. I welcome any comments >> you may have on this article or on the potential for a shift to simpler >> binary OSM formats. >> >> Regards, >> Andrew Byrd >> _______________________________________________ >> dev mailing list >> [email protected] >> https://lists.openstreetmap.org/listinfo/dev >> >> >> _______________________________________________ >> dev mailing list >> [email protected] >> https://lists.openstreetmap.org/listinfo/dev >> >> >> _______________________________________________ >> dev mailing list >> [email protected] >> https://lists.openstreetmap.org/listinfo/dev >> >> > > > -- > Thank you for your time. Best regards. > Dmitry. > > > -- Thank you for your time. Best regards. Dmitry.
_______________________________________________ dev mailing list [email protected] https://lists.openstreetmap.org/listinfo/dev

