(Back from vacation now. Thanks again for everyone's thoughts and suggestions.)
On Aug 9, 2011, at 7:34 PM, Jonathan Rochkind wrote:

> Just to play Simplicity Devil's Advocate, and admittedly not having followed
> this whole thread or your whole design:
>
> What if the model was nothing but two entities?
>
> Software
> Person/Group (Yes, used either for an individual or a group of any sort.)
>
> With a directed 'related' relationship between each entity, and reflexive
> (Software -> Person/Group; Software -> Software; Person/Group -> Software;
> Person/Group -> Person/Group).
>
> That 'related' relationship can be annotated with a relationship type from a
> controlled vocabulary, as well as free-entered user tags. The controlled
> vocabulary would include Person/Group *uses* Software; Person/Group
> *develops* Software; Software *component of* Software; Person/Group *member
> of* Person/Group.
>
> People could enter 'tags' on the relationship for anything else they wanted.
> You could develop the controlled vocabulary further organically as you get
> more data and see what's actually needed -- and what people free-tag, if they
> do so.
>
> Additional attributes are likely needed on Software; probably not too many
> more on Person/Group. But to encourage crowd sourcing, you can enter a
> Software record without filling out all of those attributes (it's as easy as
> filling out a simple form) and, if you want, make a couple of relationships
> to other Software or Person/Group entries; or those can be made later by
> other people, if it catches on and people actually edit this.
>
> Things like URLs to software (or people!) home pages can really just be
> entered in a big free-text field -- using wiki syntax, or better yet,
> Markdown.
>
> I think if the success of the project depends on volunteer crowd sourcing,
> you've got to keep things simple and make it as easy as possible to enter
> data in seconds. Really, even without the entering, keeping it simple will
> lead you to a simple interface, which will be usable and more likely to
> catch on.

Interesting model.
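To make sure I'm reading the proposal right, here is a rough sketch of the two-entity model in Python. All of the names here (Entity, Relation, the vocabulary strings) are mine and purely illustrative; they don't come from any existing codebase.

```python
# Sketch of the two-entity model: Software and Person/Group nodes joined
# by directed relationships, each typed from a controlled vocabulary and
# annotated with free-entered user tags. Names are illustrative only.
from dataclasses import dataclass, field

# Controlled vocabulary for relationship types; it could grow organically
# as free tags reveal what people actually need.
VOCABULARY = {"uses", "develops", "component of", "member of"}

@dataclass(frozen=True)
class Entity:
    kind: str   # "software" or "person/group"
    name: str

@dataclass
class Relation:
    source: Entity
    target: Entity
    rel_type: str                             # from VOCABULARY
    tags: list = field(default_factory=list)  # free-entered user tags

    def __post_init__(self):
        # Enforce the controlled vocabulary; anything else goes in tags.
        if self.rel_type not in VOCABULARY:
            raise ValueError(f"unknown relationship type: {self.rel_type}")

# Example: a person develops a piece of software.
alice = Entity("person/group", "Alice")
solr = Entity("software", "Solr")
rel = Relation(alice, solr, "develops", tags=["search", "java"])
```

Written out this way, my consistency worry below becomes concrete: everything outside `rel_type` lives in uncontrolled tags, so two users can describe the same relationship in incompatible ways.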
I'd like to think this through a little more. My first thoughts are that while it might make the user interface and the data model simpler, it would weaken enforcement of consistency in the data itself, which could result in a hodgepodge of data that is difficult to page through. Simplicity in data entry might be bought at the cost of simplicity in search/browse.

On Aug 9, 2011, at 7:50 PM, stuart yeates wrote:

> You may also be interested in the (older?) work at
> http://projects.apache.org/ and http://trac.usefulinc.com/doap For example:
>
> http://projects.apache.org/projects/xindice.html /
> http://svn.apache.org/repos/asf/xml/xindice/trunk/doap_Xindice.rdf
>
> Interoperability with RDF/DOAP lets you build on others' work and lets
> others in turn pick over your work.
>
> At the very least it allows you to suck in the latest and greatest
> releases automatically.

Ah, yes! That is the sort of linked-data interoperability I was thinking would be possible. Thanks for the pointers to those efforts.

On Aug 9, 2011, at 8:23 PM, Matt Jones wrote:

> On Tue, Aug 9, 2011 at 3:50 PM, stuart yeates <stuart.yea...@vuw.ac.nz> wrote:
>> ...
>> Ohloh is great. However, it relies almost completely on metrics which are
>> easily gamed by the technically competent. Use of these kinds of metrics in
>> ways which encourage gaming will only be productive in the short term,
>> perhaps the very short term.
>>
>> For example: it's easy to set up dummy version control accounts, and there
>> can be good technical reasons for doing so. It's easy to set up a build/test
>> suite to update a file in version control after its daily run, and there
>> can be good technical reasons for doing so. But doing these things can also
>> transform a very-low-activity single-user project into a high-activity
>> dual-user project, in the eyes of Ohloh.
>>
>> Turning on template-derived comments in the next big migration handles the
>> "is the code commented?" metric.
>> The more metrics are used, the more motivation there is to use tools (which
>> admittedly have other motivations) which make a project look good.
>
> I agree the Ohloh metrics are easily gamed. What metrics do you recommend
> that can't be gamed but still provide a synopsis of the project for
> evaluation, comparison, and selection? I think there is some utility even
> though they can be gamed. The metrics are not a substitute for critical
> evaluation, but provide a nice synopsis as a jumping-off point. For
> example, if I am interested in projects that have a demonstrable lifespan
> of more than 5 years, and that have had more than 10 developers contribute,
> I can find that via these metrics. I can then assess for myself whether any
> of the resulting projects are false positives (e.g., the commit log will
> give some idea of the types of commits made by each person).
>
> If you're concerned about the system being gamed via metrics, then you
> should also be concerned about user-submitted project descriptions.
> Projects have a tendency to over-generalize what their software does,
> under-report defects, and generally paint a rosy picture. Will there be
> some sort of quality control/editing/verification of the claims made by
> submitters? Will it matter if some of the projects are described more
> generously than in reality? Won't the system still be useful even if they
> are?

I'm interested to hear more about what others think would be good metrics. I agree with Matt that they serve as a useful rough sorting mechanism (perhaps as a way to cull projects which clearly have no active community, or at least not one that is actively gaming the metrics -- but even gaming shows some activity, doesn't it?).

Peter

--
Peter Murray peter.mur...@lyrasis.org tel:+1-678-235-2955
Ass't Director, Technology Services Development http://dltj.org/about/
LYRASIS -- Great Libraries. Strong Communities. Innovative Answers.
The Disruptive Library Technology Jester http://dltj.org/
Attrib-Noncomm-Share http://creativecommons.org/licenses/by-nc-sa/2.5/