Re: [cc-devel] License Metadata Extraction and Search, Summer of Code

Luke Hoersten Wed, 21 Mar 2007 18:02:35 -0800

That sounds like a good plan. Calling external libraries will
definitely make programming faster (which right now is more important
than execution speed).
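
A minimal sketch of that external-call approach, in Python since that is one of the languages mentioned in this thread. It is only an illustration under stated assumptions: the `run_extractor` name and the `key=value` output format are hypothetical and are not taken from Tracker or ccLookup.

```python
import subprocess


def run_extractor(command, path, timeout=10.0):
    """Run an external extractor on `path` and parse its output.

    Assumption (hypothetical protocol): the extractor prints one
    "key=value" pair per line on stdout and exits with status 0.
    """
    result = subprocess.run(
        [*command, path],
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output to str
        timeout=timeout,      # do not let a stuck extractor block the indexer
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    metadata = {}
    for line in result.stdout.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that are not key=value pairs
            metadata[key.strip()] = value.strip()
    return metadata
```

Starting one process per file is the cost Jason worries about below; whether it matters in practice would have to be measured.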


Luke

On 3/21/07, Jason Kivlighn <[EMAIL PROTECTED]> wrote:
> I think I've settled on Tracker.  I got an okay from them as well as
> someone who volunteered to mentor me with Tracker code while working
> under Creative Commons.
>
> I like the idea of separating it into two parts.  Since there are so many
> indexers out there, separating the parser means we have an
> application/library that any indexer can use.  Looking at Tracker's
> infrastructure it should work nicely.  Even using Tracker, cc-sharp may
> come in handy, since Tracker can call external processes to extract the
> search data.  Here's the list of formats I was hoping to support: MP3,
> OGG, RSS, SVG, HTML, XML, JPEG, PDF, SMIL.  The big problem I see with
> cc-sharp is working with C#.  I'd consider myself fairly fluent in
> C, C++, Java, and Python.
>
> I notice that ccPublisher already attaches licenses, and ccLookup reads
> licenses in anything with RDF metadata as well as in mp3s.  In response
> to your second email, Luke, it might work to extend ccLookup to support
> more formats and then have the Tracker extractor call this program.
> Then I'm sticking with a high-level language I'm familiar with.
> However, I'm not sure that will bode well for performance.
> The extraction process needs to be fast, so a C library might be a
> better option.  Given the scope of formats, our extractor would be run
> quite often for the typical desktop.
>
> The Tracker code base from what I've seen looks very manageable, but I
> hope to get more feedback from the Tracker folks soon.
>
> Cheers,
> Jason
> > Jason,
> > I did something similar to this last year for SoC and it resulted in a
> > new CC library called cc-sharp:
> > http://code.google.com/p/cc-sharp/
> >
> > So your project could have two parts: 1) the license handling and then
> > 2) integrating that data with the desktop search application. If you
> > wanted to use C# (Beagle), I'd help flesh out cc-sharp with you and
> > you could work on the integration.
> >
> > The other C# CC lib around is CCLicenseLib which hasn't been developed
> > in four years.
> > http://workspaces.gotdotnet.com/cclib
> >
> > It contains object representations of the older CC licenses. It would
> > be nice to make one condensed lib for CC stuff in C# so developers for
> > other projects could easily integrate it with their software. I see it
> > being laid out as such:
> >
> > - Attaching licenses to media
> > - Reading licenses from media
> > - Verifying licenses
> >
> > This desktop search idea would primarily use reading and verifying.
> > Right now all cc-sharp does is verify because I was originally working
> > on Banshee. Banshee already read the metadata from the MP3 via my
> > patch, so my lib was really just an abstraction of the
> > verification. Since verification is done over the Internet, that's not
> > really something you want to include by default in core application
> > code.
> >
> > I'd like to abstract license reading so we can just "plug" support for
> > different file types to be read whether they are images, audio, etc.
> > Kind of like vfs.
> >
> > What are your thoughts?
> >
> > -Luke
> >
> > On 3/21/07, Jason K <[EMAIL PROTECTED]> wrote:
> >
> >> Hi,
> >>
> >> I'm looking into adding support for searching/indexing licenses for a
> >> service such as Tracker, Beagle, or Strigi for a Google SoC project.  My
> >> first hurdle though, is picking which indexer.  The ideal service would
> >> be cross-desktop, to avoid implementing extraction filters over and over
> >> again for different indexers.  It also needs to be widely adopted.
> >>
> >> Tracker is looking like a good candidate, given that it is a
> >> Freedesktop.org project, is desktop-neutral, and appears to have the
> >> intention of following standards as well as creating standards for other
> >> search services to use.  I get the impression GNOME will be including
> >> this soon.
> >>
> >> Strigi is also desktop-neutral, though favored by KDE and is going to be
> >> used by KDE 4.  It doesn't rely on KDE, though.  In fact, Strigi's only
> > >> requirements are the stdc++ libraries, while Tracker is glib-based.
> >>
> >> And for Beagle, Mono is one significant reason I'm shying away from it.
> >> Tracker or Strigi appear more interoperable and look to be getting wider
> >> adoption.
> >>
> >> Formats I plan to include are:
> >>   HTML, SVG, SMIL, XML in general (RDF)
> >>   PDF, JPEG, other images (XMP)
> >>   MP3, OGG, other audio/video
> >>   RSS
> >>
> > >> From what I've seen, most license data is either in RDF or XMP form.
> >> MP3, OGG, and RSS are exceptions.  For all these formats, I would follow
> >> the embedding specification on the Creative Commons website, at
> >> http://creativecommons.org/technology/usingmarkup
> >>
> >> Since most licenses are placed in RDF or XMP, that code can be separated
> >> and reused from various extraction modules.
> >>
> >> So enough rambling... thoughts?
> >>
> >> -Jason Kivlighn
> >> _______________________________________________
> >> cc-devel mailing list
> >> [email protected]
> >> http://lists.ibiblio.org/mailman/listinfo/cc-devel
> >>
> >>
> >
> >
> >
>
>
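
The "plug in support for different file types" idea from the quoted messages, combined with a shared RDF reader, might be sketched roughly as below. This is an illustration and not cc-sharp's or ccLookup's design: the registry, decorator, and function names are all hypothetical, and the two Creative Commons namespace URIs are assumptions to check against the embedding specification linked above.

```python
import os
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed namespaces for CC license metadata (newer and older forms).
CC_NAMESPACES = (
    "http://creativecommons.org/ns#",
    "http://web.resources.org/cc/",
)

_READERS = {}  # maps a lowercase file extension to a reader function


def register(*extensions):
    """Decorator that plugs a reader in for the given file extensions."""
    def decorator(func):
        for ext in extensions:
            _READERS[ext.lower()] = func
        return func
    return decorator


def license_from_rdf(root):
    """Return the first license URI found under an XML element, or None.

    Shared by every format that embeds its license as RDF/XML.
    """
    for ns in CC_NAMESPACES:
        for element in root.iter("{%s}license" % ns):
            uri = element.get("{%s}resource" % RDF_NS)
            if uri:
                return uri
    return None


@register(".svg", ".rdf", ".xml")
def read_xml_license(path):
    """Reader for XML-based formats that carry RDF inline."""
    return license_from_rdf(ET.parse(path).getroot())


def read_license(path):
    """Dispatch on file extension; None if no reader is registered."""
    reader = _READERS.get(os.path.splitext(path)[1].lower())
    return reader(path) if reader else None
```

Readers for MP3, OGG, or XMP-carrying formats would register themselves the same way, so an indexer only ever calls `read_license`.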


--
Luke Hoersten
http://www.cs.purdue.edu/homes/lhoerste/
http://openradix.org/
_______________________________________________
cc-devel mailing list
[email protected]
http://lists.ibiblio.org/mailman/listinfo/cc-devel