Forgot to do "Reply to All". On Wed, Jul 14, 2010 at 4:59 PM, Vishesh Handa <[email protected]> wrote:
> Hey Sebastian > > On Wed, Jul 14, 2010 at 2:59 PM, Sebastian Trüg <[email protected]> wrote: > >> Hi Vishesh, >> >> great work. Some comments: >> >> On 07/13/2010 10:21 PM, Vishesh Handa wrote: >> > The current patch checks for blank Nodes in the object / subject of the >> > file's metadata, and tries to find them or creates them if not present. >> > The patch reverts to a the original behavior if any of the additionally >> > generated metadata ( contacts, albums) contain any blank nodes. in order >> > to fix this, a full blown dependency resolution algorithm would be >> > required. I don't think that it is currently required. >> >> Could you elaborate on this please. I do not follow exactly. The problem >> would be if there was for example a contact that was related to another >> contact? If so, I doubt any strigi analyzer does create such data atm. >> >> Yes, the problem would arise if one contact is related to another. AFAIK > no strigi analyzer creates such data, and even if one of them does create > such data. It would land up creating the resources which is what the earlier > IndexWriter did. > > >> > The patch also creates a different graph ( discardable ) for each >> > individual resource. >> >> very good. >> >> > *Problem not fixed :* >> > This will only work on newly indexed files and does not affect the files >> > which have already been indexed. We'll need some kind of merger to do >> > that. It's a lot simpler to just re-index the files, but I don't think >> > the end users would like that. >> >> I am still for reindexing files based on mimetype whenever there are new >> features available for that mimetype. >> > > Same here, but I doubt the users would be. I was trying to implement > resource merging yesterday and I don't think it is difficult, the only > problem is that I couldn't find any fast way to check if resources need to > be merged. ( Maybe in 1-2 queries, any idea? ) The actual merging would be > somewhat slow, but that's okay. It would be done in a separate thread and > would only be done once or twice. > > >> >> > *A New Problem : >> > *Since the additional metadata now has it's own graph. It will not be >> > deleted if the file is deleted. We need some kind of cleaner which >> > cleans resources which are no longer in use. >> >> I see 2 solutions: >> 1. Do it in a cleaner service the same way we clean up unused metadata >> graphs and the like. >> 2. Do it when deleting the files' metadata in the strigi service and in >> the filewatch service using shared code. >> > > The second option would be easier, but I'm not too keen on slowing down the > metadatamover service or the strigi service. They have enough to deal with > as it is. For now, we can just let the additional resources be. > >> >> >> And now the code review: >> 1. Please add comments on all methods as if it were public API. While >> this is not required and I do not do it all the time it will simplify >> working on the code together. >> > > Okay > > >> 2. the variable names are not very descriptive: m_queue and m_stack. :P >> > > I've renamed m_queue to m_updateQueue, but I'm not sure what I should do > with m_stack. I could maybe call it feeder stack? Do you have suggestions? I > have however, add comments that say what the variable does. > > >> 3. I think we could do most of generateGraph using Soprano::NRLModel >> 4. createNode() is not descriptive enough. It is not clear what it does >> when you see it used in code. Better call it something like >> blankNodeFromIdentifier or something like that. >> > > I've renamed it to createBlankOrResourceNode() as that is what it does. > > >> 5. the feeder is started twice (from a conceptual point of view): once >> after constructing it and once in startAnalysis. IMHO you could simply >> start the feeder in its constructor and thus, hide the threading issue >> away more. >> > > Done. :) > > >> 6. "// Handle the feeder" is not the greatest comment as it only >> clutters the code without adding any new information. The fact that >> something is being done with the feeder is obvious. :) >> > > Uhh. Yea, stupid comment. I've removed it for now. > > >> 7. the name StrigiFeeder is a bit misleading. The class does not feed >> anything into Strigi. It does the exact opposite. So maybe NepomukFeeder >> would be better... i dont know. >> > > I found the name NepomukFeeder rather generic. I've named it to > NepomukIndexFeeder for now. :-/ > > >> 8. You did it again: a static method named "toSparql". Please do not do >> that. :) >> > > But this time I added documentation as to what kind of query it creates. I > understand that these kind of function names are bad, but I can't think of > any alternative. Could you please suggest some name? > > >> 9. Why don't you store the set of statements as ResourceHash right away? >> >> Yes. I thought about this for quite some time. I actually made a list of > pros and cons. In the end I decided to go with this approach as it required > less modification of code ( less error prone ) and is technically more > portable. Plus, ResourceStruct is a lousy name and I didn't really want to > make the IndexWriter depend on it. > > But I think we should think about standardizing ResourceStruct. I use it a > lot in BackupSync and I use it in the API Artem wants ( which is difficult > to design ). Plus it's really convenient! > > >> As you can see I have no "real" comments since IMHO you did a great job. >> So please go ahead and commit that (maybe with some changes based on my >> comments) to trunk. Then testing can commence. :) >> > > Are you sure? Just say "Yes' and I'll commit it. > >> >> Cheers, >> Sebastian >> _______________________________________________ >> Nepomuk mailing list >> [email protected] >> https://mail.kde.org/mailman/listinfo/nepomuk >> > >
_______________________________________________ Nepomuk mailing list [email protected] https://mail.kde.org/mailman/listinfo/nepomuk
