On 7/22/64 12:59 PM, Frederik Ramm wrote:
Kai,

  partial answer:

On 09/13/2011 02:07 AM, Kai Krueger wrote:
2) Currently all the (diff-) import is done in a single transaction.
Therefore other db users (e.g. renderers) don't see any change until the
full transaction is committed. In order to do things in parallel,
however, there need to be intermediate commits.

[...]

The question, though, is whether this is valid. For the initial import this is
probably not a problem, as there won't be any concurrent db users until
the import is complete. However, diff imports with concurrent rendering
are a different matter. What will committing pending ways do to rendering?
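To make the visibility question concrete, here is a minimal sketch of what a second connection sees with and without intermediate commits. It uses Python's built-in sqlite3 module purely as a stand-in for Postgres, so locking and isolation details will differ. The table name `ways` and the function `visible_rows` are made up for illustration and are not taken from osm2pgsql.

```python
import os
import sqlite3
import tempfile


def visible_rows(db_path, batches, commit_each_batch):
    """Return the row counts a second connection sees after each batch.

    SQLite stands in for Postgres here; `ways` is a hypothetical table.
    """
    writer = sqlite3.connect(db_path)
    reader = sqlite3.connect(db_path)
    writer.execute("CREATE TABLE IF NOT EXISTS ways (id INTEGER PRIMARY KEY)")
    writer.commit()
    seen = []
    for batch in batches:
        # The first INSERT implicitly opens a transaction on the writer.
        writer.executemany("INSERT INTO ways (id) VALUES (?)",
                           [(i,) for i in batch])
        if commit_each_batch:
            writer.commit()  # intermediate commit: readers see this batch
        count = reader.execute("SELECT COUNT(*) FROM ways").fetchall()[0][0]
        seen.append(count)
    writer.commit()  # final commit makes everything visible
    count = reader.execute("SELECT COUNT(*) FROM ways").fetchall()[0][0]
    seen.append(count)
    writer.close()
    reader.close()
    return seen


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    batches = [[1, 2], [3]]
    print(visible_rows(os.path.join(tmp, "single.db"), batches, False))
    print(visible_rows(os.path.join(tmp, "stepwise.db"), batches, True))
```

With a single transaction the reader sees nothing until the final commit. With intermediate commits it sees each batch as it lands, which is exactly the window in which a renderer could observe a partially applied diff.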

Renderers use the geometry tables; the "pending" way is in the data table, where it will not usually be touched by renderers. So I don't see a problem here. I am, however, not familiar with internal Postgres processing, and I could imagine that there is a speed penalty in committing pending ways as opposed to resetting the pending flag in the same transaction where it was set.

Good point. Yes the pending way stuff is on the ways table and not on the geometry rendering tables, so hopefully it shouldn't cause any direct breakage of the rendering. What possibly could happen is that you get some temporal inconsistencies, in the sense that on a single tile you might have some newer ways rendered but older polygons not showing up yet. But that should hopefully not really cause any problems.


3) Currently the string cache is not thread safe. It is possible to
disable it via a single preprocessor define and then parallelizing at
least doesn't lead to crashes, but I assume it is there for a good
reason. Presumably with a bit of work, it should be possible to get the
string cache thread safe though as well. So assuming the other two
points aren't show stoppers, this should be possible to fix.
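As a rough illustration of what "thread safe" could mean for such a cache, here is a minimal sketch of a string interning cache guarded by a single lock. It is a generic example in Python, not osm2pgsql's actual string cache; the class name `StringCache` and its `intern` method are invented for this sketch.

```python
import threading


class StringCache:
    """Hypothetical interning cache: equal strings share one stored object."""

    def __init__(self):
        self._lock = threading.Lock()
        self._strings = {}

    def intern(self, value):
        # The lock makes lookup-or-insert a single atomic step, so two
        # threads cannot both insert their own copy of the same string.
        with self._lock:
            return self._strings.setdefault(value, value)

    def __len__(self):
        with self._lock:
            return len(self._strings)


if __name__ == "__main__":
    cache = StringCache()
    tags = ["highway", "name", "building"]

    def worker():
        for _ in range(1000):
            for tag in tags:
                cache.intern(tag)

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(cache))
```

A single global lock is the simplest option and serializes every lookup. Whether that cost is acceptable would have to be measured against the gain from parallelizing the rest.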

Have you considered multiprocessing (i.e. fork) instead of multithreading - would this perhaps make these things go away elegantly? Personally I abhor multithreading for the complexity it brings at (usually) little gain compared to simply forking a few worker processes but of course YMMV especially if you want tight communication between workers.
No, I hadn't considered multiprocessing, but again, that is a good point worth exploring further. Currently, what I have done relies on tight integration to share the loop counter between threads, but you can probably just split the work into independent sections per worker process.
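The idea of splitting the work into independent sections could look roughly like the sketch below: instead of sharing a loop counter, each worker process gets its own contiguous range up front. It uses Python's multiprocessing module for brevity. `split_range` and `process_chunk` are hypothetical names, and `process_chunk` only counts items rather than doing any real way processing.

```python
import multiprocessing as mp


def split_range(start, stop, workers):
    """Split [start, stop) into up to `workers` contiguous chunks."""
    base, extra = divmod(stop - start, workers)
    chunks = []
    lo = start
    for i in range(workers):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more workers than items: skip empty chunks
        chunks.append((lo, lo + size))
        lo += size
    return chunks


def process_chunk(chunk):
    """Hypothetical stand-in for per-way work; returns the item count."""
    lo, hi = chunk
    return hi - lo


if __name__ == "__main__":
    chunks = split_range(0, 10, 3)
    with mp.Pool(processes=len(chunks)) as pool:
        done = pool.map(process_chunk, chunks)
    print(chunks, sum(done))
```

Because the ranges are fixed before the workers start, no state is shared between them. The trade-off is that a worker with an unusually expensive range cannot hand work over to an idle one.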

Overall, this hopefully means it is worth exploring this avenue further and trying to get a patch clean enough to consider applying to osm2pgsql.

Kai


Bye
Frederik



_______________________________________________
dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/dev
