Hi Andrus,

I'm continuing this on the dev@ list, if you don't mind.

> Am 08.03.2017 um 20:13 schrieb Andrus Adamchik <and...@objectstyle.org>:
> 
>> It would be nice if Cayenne would internally parallelize things like 
>> ObjectResolver.objectsFromDataRows() and use lock-free strategies to deal 
>> with the caching.
> 
> This is probably the last (and consequently the worst) place in Cayenne where 
> locking still occurs. After I encountered this problem in a high-concurrency 
> system, I've done some analysis of it (see [1] and also [2]), and this has 
> been my "Cayenne 5.0" plan for a long time. With 4.0 making such progress as 
> it does now, we may actually start contemplating it again.
> 
> Andrus
> 
> 
> [1] 
> https://lists.apache.org/thread.html/b3a990f94a8db3818c7f12eb433a8fef89d5e0afee653def11da1aa9@1382717376@%3Cdev.cayenne.apache.org%3E
> [2] 
> https://lists.apache.org/thread.html/bfcf79ffa521e402d080e3aafc5f0444fa0ab7d09045ec3092aee6c2@1382706785@%3Cdev.cayenne.apache.org%3E

Interesting read!

Regarding the array-based DataObject concept: wouldn't name-based attribute 
lookups still need a map somewhere that translates names to indexes? That map 
would only be needed once per entity, however.
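To make the idea concrete, here is a minimal sketch of what I mean (all class and method names are hypothetical, not Cayenne API): one shared name-to-index map per entity, built once, so each object can store its values in a plain Object[] with no per-object map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one layout object per entity, shared by all
// instances of that entity, translating attribute names to array slots.
class EntityLayout {
    private final Map<String, Integer> indexByName = new HashMap<>();

    EntityLayout(String... attributeNames) {
        for (int i = 0; i < attributeNames.length; i++) {
            indexByName.put(attributeNames[i], i);
        }
    }

    int indexOf(String name) {
        Integer i = indexByName.get(name);
        if (i == null) {
            throw new IllegalArgumentException("Unknown attribute: " + name);
        }
        return i;
    }

    int size() {
        return indexByName.size();
    }
}

// Hypothetical array-backed object: per-instance storage is just an array.
class ArrayBackedObject {
    private final EntityLayout layout;
    private final Object[] values;

    ArrayBackedObject(EntityLayout layout) {
        this.layout = layout;
        this.values = new Object[layout.size()];
    }

    Object readProperty(String name) {
        return values[layout.indexOf(name)];
    }

    void writeProperty(String name, Object value) {
        values[layout.indexOf(name)] = value;
    }
}
```

The name lookup cost is paid once per entity layout, and internal code that already knows the index could bypass the map entirely.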

Instead of the array-based approach, did you also consider ConcurrentHashMap 
and the other classes in java.util.concurrent? That would not bring the 
additional advantages beyond concurrency, but it could serve as an easy 
intermediate step to get rid of the locking, and might even be implementable 
in 4.0 already.
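Roughly what I have in mind, sketched with a hypothetical cache class (not Cayenne's actual snapshot cache): computeIfAbsent gives per-key atomicity, so threads resolving different keys never block each other and the loader runs at most once per missing key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical sketch: a drop-in replacement for a map guarded by a
// global lock. Readers never block; writers only contend per key.
class SnapshotCache<K, V> {
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();

    V getOrLoad(K key, Function<K, V> loader) {
        // Atomic per key: the loader is invoked at most once for a
        // missing key, even under concurrent access.
        return cache.computeIfAbsent(key, loader);
    }

    void invalidate(K key) {
        cache.remove(key);
    }
}
```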

And on the [1] discussion, I'd like to mention my use case again: big queries 
with lots of prefetches that pull in gigabytes of data for aggregate 
computations using DataObject business logic. During those fetches, other 
users expect to be able to continue their regular workload concurrently (which 
they mostly cannot with EOF, my main reason to switch). So however this [1] 
concept turns out, I'd also like to be able to parallelize the fetches 
themselves. A useful first step would be to execute disjoint prefetches in 
separate threads.
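That first step could look roughly like this (a sketch only; the query and result types are placeholders, not Cayenne classes): submit each disjoint prefetch query to a thread pool and wait for all of them before assembling the object graph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: run disjoint prefetch queries concurrently and
// collect their row lists in submission order.
class ParallelPrefetch {
    static List<List<Object>> fetchAll(List<Callable<List<Object>>> prefetchQueries) {
        int threads = Math.max(1, Math.min(prefetchQueries.size(),
                Runtime.getRuntime().availableProcessors()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Object>>> futures = new ArrayList<>();
            for (Callable<List<Object>> q : prefetchQueries) {
                futures.add(pool.submit(q));
            }
            List<List<Object>> results = new ArrayList<>();
            for (Future<List<Object>> f : futures) {
                try {
                    results.add(f.get()); // propagate any query failure
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Since the prefetches are disjoint, merging the results afterwards is the only step that still needs coordination.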

A second step could be to parallelize even a single big table scan query by 
partitioning. Databases have long been able to organize large tables into 
partitions that can be scanned independently of each other. Back in the day, 
with Oracle and slower spinning disks, you would spread partitions across 
independent disks; today, with SSDs and effectively zero seek time, 
partitioning can still increase throughput when CPU is the limiting factor 
(databases also tend to generate high CPU load during full table scans, but 
only on one core per scan). One idea could be to include a partitioning 
criterion in the model that matches the database's criterion for the table in 
question.

In the meantime I could try partitioning the queries on the application level, 
which can also work, but I'm back at the Graph Manager locking problem when 
merging them into one context for processing.
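For the application-level variant, the splitting itself is simple; a sketch (the helper is hypothetical) that divides an id range into contiguous sub-ranges, each of which could become one query with a BETWEEN predicate, ideally aligned with how the table is partitioned in the database:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split an inclusive [min, max] key range into
// `parts` contiguous sub-ranges for independent, parallel scans.
class RangePartitioner {
    static List<long[]> split(long min, long max, int parts) {
        List<long[]> ranges = new ArrayList<>();
        long span = max - min + 1;
        long base = span / parts;
        long rem = span % parts;
        long start = min;
        for (int i = 0; i < parts; i++) {
            // Spread the remainder over the first `rem` ranges.
            long size = base + (i < rem ? 1 : 0);
            if (size == 0) {
                break; // more parts than keys
            }
            ranges.add(new long[] { start, start + size - 1 });
            start += size;
        }
        return ranges;
    }
}
```

The hard part is not the splitting but, as said, merging the per-range results back into one context without hitting the graph manager lock.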

Today's hardware, with databases on SSDs delivering 3 GB/s or more and 16+ 
cores for processing, calls for parallelization on every level.

Maik
