Re: [OSM-dev] Simpler binary OSM formats

Andrew Byrd Sun, 07 Feb 2016 11:02:20 -0800

Hi Dmitry,

Yes, there are similarities and I did study the o5m format before I began work 
on vex. The last section of my original article compares the two and gives my 
impressions of o5m: 
http://conveyal.com/blog/2015/04/27/osm-formats#comparisons-with-o5m


In summary: o5m uses string tables with a fixed size and an LRU eviction 
policy. Producers and consumers must keep their string tables exactly in sync. 
Strings are then referenced by integers indicating how recently they were used 
(1 to 15000). This adds quite a bit of complexity to o5m implementations, 
especially considering that this eviction strategy can backfire on certain 
inputs leading to files that are actually bigger than a basic gzipped text 
representation of the same data. According to 
http://wiki.openstreetmap.org/wiki/Talk:O5m#Compression_Algorithms, o5m uses 
string tables specifically to avoid relying on general-purpose compression. I 
find this unnecessary considering that zlib compression is quite effective, 
resource-efficient (with adjustable compression level), and available 
practically everywhere.
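
To make the synchronization requirement concrete, here is a toy Python model of 
a recency-indexed string table. It is only a sketch of the idea summarized 
above, not the o5m wire format: the class and function names are made up, the 
tokens are plain tuples rather than bytes, and whether a hit should also 
refresh an entry's position is a detail this model leaves out.

```python
from collections import deque

CAPACITY = 15000  # table size mentioned above


class StringTable:
    """Toy recency table: reference 1 is the most recently added string."""

    def __init__(self, capacity=CAPACITY):
        # With maxlen set, appendleft() silently drops the oldest entry
        # from the right-hand end once the table is full.
        self.entries = deque(maxlen=capacity)

    def add(self, s):
        self.entries.appendleft(s)

    def find(self, s):
        """Return the 1-based reference for s, or None if it is not present."""
        try:
            return self.entries.index(s) + 1
        except ValueError:
            return None

    def get(self, ref):
        return self.entries[ref - 1]


def encode(strings, capacity=CAPACITY):
    """Replace repeated strings with references into the table."""
    table = StringTable(capacity)
    out = []
    for s in strings:
        ref = table.find(s)
        if ref is None:
            out.append(("literal", s))
            table.add(s)
        else:
            out.append(("ref", ref))
    return out


def decode(tokens, capacity=CAPACITY):
    """Rebuild the strings; the table must evolve exactly as in encode()."""
    table = StringTable(capacity)
    out = []
    for kind, value in tokens:
        if kind == "literal":
            table.add(value)
            out.append(value)
        else:
            out.append(table.get(value))
    return out
```

The decoder only works because it replays exactly the same table updates as 
the encoder. Calling encode(["a", "b", "a"], capacity=1) also shows the 
failure mode in miniature: every string is evicted before it is reused, so 
nothing is ever emitted as a reference.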

There are a few other unusual design decisions documented and discussed at 
http://wiki.openstreetmap.org/wiki/O5m and 
http://wiki.openstreetmap.org/wiki/Talk:O5m. For example, strings are both 
introduced and terminated by a null byte, and are often stored in pairs (i.e. 
three null bytes per string pair, one of which is at the beginning of the 
string).
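
As a minimal illustration of that layout, the following Python sketch frames a 
key/value pair with three null bytes as described above. The function names 
are hypothetical and the UTF-8 encoding is an assumption on my part; the wiki 
pages remain the reference for the actual format.

```python
def encode_pair(key: str, value: str) -> bytes:
    """Frame a pair as: NUL, key bytes, NUL, value bytes, NUL."""
    return b"\x00" + key.encode("utf-8") + b"\x00" + value.encode("utf-8") + b"\x00"


def decode_pair(data: bytes, offset: int = 0):
    """Read one pair starting at offset; return (key, value, next_offset)."""
    if data[offset] != 0:
        raise ValueError("expected a leading null byte")
    key_end = data.index(b"\x00", offset + 1)    # null separating key and value
    value_end = data.index(b"\x00", key_end + 1)  # null terminating the pair
    key = data[offset + 1:key_end].decode("utf-8")
    value = data[key_end + 1:value_end].decode("utf-8")
    return key, value, value_end + 1
```

Note that the reader has to scan for each terminator, since no length is 
stored anywhere in the pair.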

I certainly recognize o5m's contribution to the dialog on binary formats, and 
we can learn from the o5m concept, but my conclusion is that it does not have 
the combination of extreme simplicity and compactness necessary to complement 
the existing formats.

As for fixed-size blocks in vex, I did consider that option but couldn’t come 
up with a compelling reason for it. I can see the case for a maximum block size 
(so we know what the maximum size of allocation will be), but can you give a 
concrete example of how fixed-size blocks would be advantageous in practice? I 
would be very hesitant to split any entity across multiple blocks.
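
To show what I mean by a maximum rather than a fixed block size, here is a 
rough Python sketch. It is not the vex block layout: the 8-byte length header, 
the 1 MiB cap, and all function names are assumptions chosen purely for 
illustration. Entities are treated as opaque byte strings, each block is 
compressed on its own with zlib, and no entity is ever split.

```python
import struct
import zlib

MAX_BLOCK_BYTES = 1 << 20  # hypothetical cap on uncompressed block size


def pack_blocks(entities, max_block_bytes=MAX_BLOCK_BYTES):
    """Group serialized entities into blocks without splitting any entity."""
    blocks, current, size = [], [], 0
    for entity in entities:
        if len(entity) > max_block_bytes:
            raise ValueError("entity larger than the maximum block size")
        # Start a new block if this entity would push the current one over the cap.
        if current and size + len(entity) > max_block_bytes:
            blocks.append(b"".join(current))
            current, size = [], 0
        current.append(entity)
        size += len(entity)
    if current:
        blocks.append(b"".join(current))
    return blocks


def write_blocks(blocks, level=6):
    """Compress each block independently, prefixed by raw and compressed lengths."""
    out = bytearray()
    for block in blocks:
        compressed = zlib.compress(block, level)
        out += struct.pack(">II", len(block), len(compressed))
        out += compressed
    return bytes(out)


def read_blocks(data):
    """Yield decompressed blocks; a reader could equally skip a block unread."""
    offset = 0
    while offset < len(data):
        raw_len, comp_len = struct.unpack_from(">II", data, offset)
        offset += 8
        block = zlib.decompress(data[offset:offset + comp_len])
        if len(block) != raw_len:
            raise ValueError("block length mismatch")
        offset += comp_len
        yield block
```

Because the header states both lengths, a reader knows the largest allocation 
it will need before decompressing anything, and can seek past blocks it does 
not want, without any block having a fixed size.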

-Andrew

> On 07 Feb 2016, at 09:06, Дмитрий Киселев <[email protected]> wrote:
> 
> Looks pretty similar to o5m, except that key=value tag pairs are not kept in 
> a ring buffer.
> 
> As a further extension, it would be nice to be able to have blocks of fixed 
> size. 
> Just write nodes one by one until the byte buffer is full.
> For extremely big relations (which are larger than one block) it's possible 
> to mark two adjacent blocks as connected, but there should be only a few of 
> them.
> 
> It would help with reading, writing, and seeking within files.
> 
> 2016-02-07 3:47 GMT+05:00 Stadin, Benjamin <[email protected]>:
> Hi Andrew,
> 
> Cap'n Proto (successor of ProtoBuffer from the guy who invented ProtoBuffer) 
> and FlatBuffers (another ProtoBuffer successor, by Google) have gained a lot 
> of traction since last year. Both eliminate many of the shortcomings of the 
> original ProtoBuffer (allow for random access, streaming,...), and also 
> improve performance.
> 
> https://github.com/google/flatbuffers
> 
> Here is a comparison between ProtoBuffer competitors:
> https://capnproto.org/news/2014-06-17-capnproto-flatbuffers-sbe.html
> 
> In my opinion FlatBuffers is the most interesting. It seems to have very good 
> language and platform support, and has quite a high adoption rate already. 
> 
> I think it's well worth reconsidering whether to create your own file format 
> and parser, for several reasons. Your concept looks well thought out; it 
> should be possible to implement a lightweight parser using FlatBuffers for 
> your data scheme. 
> 
> Regards
> Ben 
> 
> Sent from my iPad
> 
> On 06.02.2016 at 22:37, Andrew Byrd <[email protected]> wrote:
> 
>> Hello OSM developers,
>> 
>> Last spring I posted an article discussing some shortcomings of the PBF 
>> format and proposing a simpler binary OSM interchange format called VEX. 
>> There was a generally positive response at the time, including helpful 
>> feedback from other developers. Since then I have revised the VEX 
>> specification as well as our implementation, and Conveyal has been using 
>> this format in our own day-to-day work.
>> 
>> I have written a new article describing the revised format:
>> http://conveyal.com/blog/2016/02/06/vex-format-part-two
>> 
>> The main differences are 1) it is more regular and even simpler to parse; 
>> and 2) file blocks are compressed individually, allowing parallel processing 
>> and seeking to specific entity types. It is no longer smaller than PBF, but 
>> still comparable in size.
>> 
>> Again, I would welcome any comments you may have on the revised format and 
>> the potential for a shift to simpler binary OSM formats.
>> 
>> Regards,
>> Andrew Byrd
>> 
>> 
>>> On 29 Apr 2015, at 01:35, andrew byrd <[email protected]> wrote:
>>> 
>>> Hello OSM developers,
>>>  
>>> Over the last few years I have worked on several pieces of software that 
>>> consume and produce the PBF format. I have always appreciated the 
>>> advantages of PBF over XML for our use cases, but over time it became 
>>> apparent to me that PBF is significantly more complex than would be 
>>> necessary to meet its objectives of speed and compactness.
>>>  
>>> Based on my observations about the effectiveness of various techniques used 
>>> in PBF and other formats, I devised an alternative OSM representation that 
>>> is consistently about 8% smaller than PBF but substantially simpler to 
>>> encode and decode. This work is presented in an article at 
>>> http://conveyal.com/blog/2015/04/27/osm-formats/. I welcome any comments 
>>> you may have on this article or on the potential for a shift to simpler 
>>> binary OSM formats.
>>>  
>>> Regards,
>>> Andrew Byrd
>>> _______________________________________________
>>> dev mailing list
>>> [email protected]
>>> https://lists.openstreetmap.org/listinfo/dev
>> 
> 
> -- 
> Thank you for your time. Best regards.
> Dmitry.

