This is part of a conversation log about metadata syncing between me and Vishesh. The discussion concerns two types of primary keys for syncing resources between different sources.
All comments are welcome.

<--- some more messages>

Vishesh: You have 2 models, m1 & m2. You create a primary key for a resource r1 from m1. The primary key consists of all the identifying properties. Then you try to find a similar resource in m2.
3:33 PM Vishesh: You call a function like FindMatch(primary key, m2).
3:34 PM Vishesh: When creating the query, it discovers that one of the objects is a resource URI whose identifying properties it requires. So it needs to ask m1 for that resource's identifying properties as well, but it has no knowledge of m1.
3:35 PM Vishesh: The solution is for the primary key to contain the identifying properties of the resource in question and the identifying properties of all other resources it is connected to. I've been a little reluctant to do that, but I think there is no other solution.
3:36 PM Vishesh: Do you get what I mean?
3:38 PM me: I think yes, but I am not sure. Why not add the source model as a parameter to FindMatch?
3:39 PM me: Then you can use a simple recursive algorithm.
Vishesh: Yea. That's the other solution. But that doesn't work with BackupSync, because the other model is on another system.
3:40 PM Vishesh: Plus the whole idea of the primary key was that once it has been created, it becomes model independent.
3:44 PM me: 2 types of primary keys? BoundPrimaryKey - a key that requires a pointer to the model - and UnboundPrimaryKey - a totally independent one, that is a serialization of the identifying properties of the main resource, of the resources the main one is connected to, etc.? The unbound one looks more like a serialization of a subset of the RDF model.
Vishesh: Yea, it is.
3:45 PM Vishesh: It is a serialization represented in a compact form. If we have 2 kinds of keys, that would mean additional functions for matching both kinds of keys = more code.
me: Yes. So if you have access to the model, you use BoundPrimaryKey. If you do not, you use UnboundPrimaryKey.
3:46 PM me: Not exactly.
Vishesh: In theory yes.
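Vishesh's 3:35 solution - a key that inlines the identifying properties of every connected resource, so matching needs no access to m1 - can be sketched roughly like this. All types and names here (Model, Statement, buildUnboundKey) are illustrative stand-ins, not the real Nepomuk/Soprano API:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// One identifying statement of a resource. The object is either a
// literal value or the URI of another (connected) resource.
struct Statement {
    std::string property;
    std::string object;
    bool objectIsResource;  // true if 'object' names another resource
};

// A toy model: resource URI -> its identifying statements.
using Model = std::map<std::string, std::vector<Statement>>;

// An "unbound" primary key: the identifying statements of the resource
// *and* of every resource reachable from it through resource-valued
// identifying properties.
using UnboundKey = std::map<std::string, std::vector<Statement>>;

void buildUnboundKey(const Model& m1, const std::string& uri,
                     UnboundKey& key, std::set<std::string>& seen) {
    if (!seen.insert(uri).second)
        return;  // already visited (connected resources can form cycles)
    auto it = m1.find(uri);
    if (it == m1.end())
        return;
    key[uri] = it->second;
    for (const Statement& s : it->second)
        if (s.objectIsResource)  // pull in the connected resource too
            buildUnboundKey(m1, s.object, key, seen);
}
```

This is what makes the key model independent - and also what makes it potentially huge, since the whole reachable subgraph of identifying properties is copied into it.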
Vishesh: But then we have to maintain separate functions for each key, which don't have much in common.
3:47 PM me: I was going to ask trueg about an in-memory Soprano::Backend. If there is one, then you can just deserialize the UnboundPK into a model, convert it to a BoundPK, and call FindMatch(BoundPK, temporary model).
3:48 PM Vishesh: There is an in-memory Soprano::Backend; we use it while loading the ontologies. Check out the ontology loader class, if you're interested. It has been moved to services/nepomukstorage.
3:49 PM me: Oh, thanks!
Vishesh: I'm still not convinced that having 2 kinds of keys is the right approach. Only unbounded keys might be better, but then they would be huge.
3:50 PM me: I am not convinced either. We are just discussing and trying to find a good solution.
Vishesh: Yes. It's good we're discussing it.
3:51 PM me: Yes. In some cases it will be equal to the size of the whole RDF storage. That's why I think that BoundKeys are better.
3:53 PM Vishesh: But when we are trying to sync it (or identify it) we would need all that data. So it's just a question of getting it in one go, or slowly by querying multiple times.
3:54 PM me: Querying multiple times will be faster.
3:55 PM me: Caches will start working. Squid (maybe), in the case of syncing with an Internet-accessible database. Etc.
Vishesh: Yes. But I need to have all the data for BackupSync, otherwise we will end up duplicating code from BackupSync that can't really be merged.
3:56 PM me: Yes, I see.
3:57 PM me: Wait, please. I am not so sure.
3:58 PM me: M1 and M2 are 2 models, and we have synced them at 00:00:00 on Wed the 14th.
3:59 PM me: Now I add a new resource to M1. This resource is connected to some other resources, and so on. Let this resource be so complicated that its UnboundPK is big.
4:00 PM me: Now we start syncing.
me: 1) Syncing with UnboundPK:
me: * Create the UnboundPK - it is big.
me: * Send this UnboundPK over the network.
4:01 PM me: ** The network is Bluetooth and we are in outer space, so the connection is slow.
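The conversion proposed at 3:47 - deserialize the UnboundPK into a temporary in-memory model, then reuse the single model-based FindMatch - might look roughly like this. The types are toy stand-ins; real code would deserialize into a model backed by the in-memory Soprano::Backend instead of a std::map:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// One serialized triple of the unbound key.
struct Triple {
    std::string subject, property, object;
};

// Unbound key: a flat, model-independent serialization of triples.
using UnboundKey = std::vector<Triple>;

// Toy in-memory "model": subject -> (property, object) pairs.
using InMemoryModel =
    std::map<std::string, std::vector<std::pair<std::string, std::string>>>;

// Bound key: just the root resource URI, interpreted against a model.
struct BoundKey {
    std::string rootUri;
};

// Deserialize the unbound key into a temporary model and hand back a
// bound key pointing at the root resource inside that model. The pair
// can then be fed to the existing model-based FindMatch.
std::pair<InMemoryModel, BoundKey>
unboundToBound(const UnboundKey& key, const std::string& rootUri) {
    InMemoryModel model;
    for (const Triple& t : key)
        model[t.subject].push_back({t.property, t.object});
    return {std::move(model), BoundKey{rootUri}};
}
```

The point of the conversion is that only one matching implementation has to be maintained: the unbound key exists solely on the wire and in backups, and becomes a bound key the moment it is used.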
me: * Receive this UnboundPK.
4:02 PM me: * Unpack it into the local model [optional; maybe there is some other way of syncing with the help of the UnboundPK].
me: * Sync.
me: * Profit.
me: Properties:
4:03 PM me: A lot of data to send, and a lot of memory to store the unpacked one.
Vishesh: Yea.
me: 2) Syncing with BoundPK:
Vishesh: I see what you mean.
me: * Use an iterative algo.
me: ** I mean recursive.
Vishesh: Okay. Stop.
me: Sorry.
Vishesh: Sorry? I get what you're saying, but what if the user doesn't have access to the other model once the key has been created,
4:04 PM Vishesh: which is the case with backups.
4:05 PM me: Then he should use an UnboundPK? Maybe I understand your question wrong, don't I?
4:06 PM Vishesh: Uhh, a little bit. I get that in some cases Unbound is better than Bound, and vice versa, but if we support both, we have a large amount of code duplication.
4:07 PM Vishesh: Which is something I'm not too fond of.
4:08 PM me: I don't know the internals of your service well enough. But why is converting an UnboundPK into a pair <in-memory model, BoundPK> a bad idea?
4:10 PM Vishesh: Hmm. I might be able to simply convert it.. yea. I don't have to do it in-process. I didn't think of that. You're right.
4:12 PM me: I think that you (or me, as you want) should send a copy of this discussion to trueg. Or maybe to the mailing list. Maybe both of us are missing something important.

--
Sincerely yours, Artem
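The recursive "bound" alternative discussed above (FindMatch takes the source model too, so identifying properties of connected resources are fetched by extra queries instead of being inlined into the key) could be sketched like this. Again, all names are hypothetical, not actual Nepomuk API:

```cpp
#include <map>
#include <string>
#include <vector>

// One identifying property; the value is a literal or a resource URI.
struct Prop {
    std::string name;
    std::string value;
    bool valueIsResource;
};
using Model = std::map<std::string, std::vector<Prop>>;

// Returns the URI in m2 that matches resource 'uri' from m1, or "" if
// no candidate matches all identifying properties.
std::string findMatch(const std::string& uri,
                      const Model& m1, const Model& m2) {
    auto src = m1.find(uri);
    if (src == m1.end())
        return "";
    for (const auto& [candidate, props] : m2) {
        bool all = true;
        for (const Prop& want : src->second) {
            // For resource-valued properties, first match the connected
            // resource recursively, then require its translated URI.
            std::string expected = want.valueIsResource
                ? findMatch(want.value, m1, m2)
                : want.value;
            bool found = false;
            for (const Prop& have : props)
                if (have.name == want.name && have.value == expected) {
                    found = true;
                    break;
                }
            if (!found) { all = false; break; }
        }
        if (all)
            return candidate;
    }
    return "";
}
```

Note that, as written, the recursion would loop on cyclically connected resources; a real implementation would have to track visited URIs, just as key building must. The trade-off from the log is visible here: nothing big is sent up front, but each resource-valued property costs an extra round of queries against m1.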
_______________________________________________
Nepomuk mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/nepomuk
