Hi Metadata-Hackers,

I just wanted to catch up with the latest progress and also wanted to point 
to some questions that arose while we were working on things like RDF 
storage, metadata, and querying.

So, how is it going? Any news concerning implementation, design, or the 
libraries used?

I put some effort into getting Sesame and Sesame2 [1] working under C#, and 
it worked quite well. I was able to write into a Sesame native store (which 
is quite fast! See [2]). I created 100,000 simple triples and stored them in 
the local repository at
- 1.47k triples/second on my laptop @ 800 MHz
- 1.93k triples/second on my laptop @ 2000 MHz
- 7.14k triples/second on our server @ 2000 MHz (Athlon 64bit 3800+)
Surprisingly, the memory consumption of the C# test program was lower than 
that of the Java version. But the Java program was faster :-(. Maybe IKVM 
can improve on that.
The C# port even worked with a remote repository using the HTTP protocol 
described in section 8 of the Sesame documentation [3].
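
To put the throughput figures above in perspective, here is a small piece of 
arithmetic in Python. It only converts the rates I quoted into wall-clock 
time for the 100,000-triple test; the machine labels and the function name 
are shorthand made up for this sketch, and the times are derived, not 
measured separately.

```python
# Illustrative arithmetic only: derives elapsed time from the throughput
# figures quoted above. Labels and names are invented for this sketch.
TRIPLES = 100_000

throughput_per_second = {
    "laptop @ 800 MHz": 1_470,
    "laptop @ 2000 MHz": 1_930,
    "server @ 2000 MHz": 7_140,
}


def elapsed_seconds(triples: int, triples_per_second: float) -> float:
    """Time needed to store `triples` at a constant insertion rate."""
    return triples / triples_per_second


for machine, rate in throughput_per_second.items():
    print(f"{machine}: {elapsed_seconds(TRIPLES, rate):.1f} s")
```

So the whole test took roughly 68, 52, and 14 seconds respectively.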

Now let's come to some technical and implementation questions:
How do you plan to integrate the RDF store into Beagle's architecture?
- Hard-coded like the Lucene indexes or dynamically linked like the Filters 
and the Queryables?
I could imagine an implementation where all possible RDF stores share a 
common API (as all Filters do), are compiled against Beagle, and are stored 
in a specific folder where Beagle recognizes their presence. The preferred 
RDF store can then be selected via configuration. That way, one could easily 
replace the RDF store with any kind of implementation: file-based, 
RDBMS-based, remote server, or different libraries such as SemWeb, Jena, 
Sesame, YARS, Kowari, ...
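
To make the idea above a bit more concrete, here is a minimal sketch in 
Python. It is not Beagle's API and not a proposal for the final design: 
RdfStore, register_backend, InMemoryStore, open_store, and the "rdf_store" 
configuration key are all names I made up for illustration. In Beagle the 
backends would be C# assemblies discovered in a folder; a plain dictionary 
stands in for that discovery step here.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Optional, Tuple, Type

Triple = Tuple[str, str, str]


class RdfStore(ABC):
    """Hypothetical common API that every RDF store backend implements."""

    @abstractmethod
    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Store one triple."""

    @abstractmethod
    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the given terms; None is a wildcard."""


# Stand-in for "a specific folder where Beagle recognizes the backends".
_BACKENDS: Dict[str, Type[RdfStore]] = {}


def register_backend(name: str):
    """Class decorator that makes a backend selectable by name."""
    def decorator(cls: Type[RdfStore]) -> Type[RdfStore]:
        _BACKENDS[name] = cls
        return cls
    return decorator


@register_backend("memory")
class InMemoryStore(RdfStore):
    """Trivial list-backed store, only here to exercise the API."""

    def __init__(self) -> None:
        self._triples: List[Triple] = []

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.append((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        wanted = (subject, predicate, obj)
        for triple in self._triples:
            if all(w is None or w == t for w, t in zip(wanted, triple)):
                yield triple


def open_store(config: Dict[str, str]) -> RdfStore:
    """Instantiate the backend named in the (hypothetical) configuration."""
    name = config["rdf_store"]
    if name not in _BACKENDS:
        raise ValueError(f"unknown RDF store backend: {name!r}")
    return _BACKENDS[name]()


store = open_store({"rdf_store": "memory"})
store.add("doc1", "hasAuthor", "X")
print(list(store.match(predicate="hasAuthor")))
```

A Sesame-based or remote backend would register under another name and 
implement the same two methods, so the rest of the code never needs to know 
which one is in use.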

What about the ontology used within the store?
- Do the Filters have to comply with one?
- Does every filter have its own way to describe metadata?

How shall the metadata be queried?
- Full-text search on the attributes using the query keywords?
- Special queries like "metadata:..."?
- What about paths of metadata, like "document of author X received as 
attachment via email from Y", which matches
     document hasAuthor X
     document isAttachmentOf EMail
     EMail from Y
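
The path example above is essentially a join over triple patterns that share 
variables. A minimal sketch in Python, assuming a naive in-memory list of 
triples: the "?name" variable syntax, the helper names (is_var, unify, 
query), and the sample data are my own illustration and not taken from any 
particular RDF library.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables (convention of this sketch)."""
    return term.startswith("?")


def unify(pattern: Triple, triple: Triple,
          binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, else None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against its earlier value.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result


def query(triples: List[Triple], patterns: List[Triple]) -> List[Binding]:
    """Nested-loop join: apply each pattern to all bindings found so far."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Made-up sample data: two documents by X, only one was mailed by Y.
triples = [
    ("report.pdf", "hasAuthor", "X"),
    ("report.pdf", "isAttachmentOf", "mail1"),
    ("mail1", "from", "Y"),
    ("notes.txt", "hasAuthor", "X"),
    ("notes.txt", "isAttachmentOf", "mail2"),
    ("mail2", "from", "Z"),
]
patterns = [
    ("?document", "hasAuthor", "X"),
    ("?document", "isAttachmentOf", "?EMail"),
    ("?EMail", "from", "Y"),
]
print(query(triples, patterns))
```

Only report.pdf (attached to mail1) satisfies all three patterns. A real 
store would use indexes or a query language instead of nested loops, but the 
matching logic is the same.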

How are results ranked if they are found in the RDF store but not in the 
Lucene index?
- How can these scores be merged with Lucene scores?
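
Just to illustrate what "merging" could mean here, one possible (and very 
simple) approach is to scale both score sets to [0, 1] and take a weighted 
sum, so that hits found only in the RDF store still get a rank. This is 
purely an assumption for discussion, not something Beagle or Lucene provides: 
the function names, the rdf_weight parameter, and the sample scores are made 
up, and scores are assumed to be non-negative.

```python
from typing import Dict


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative scores so that the best hit gets 1.0."""
    top = max(scores.values(), default=0.0)
    if top <= 0.0:
        return {uri: 0.0 for uri in scores}
    return {uri: score / top for uri, score in scores.items()}


def merge_scores(lucene: Dict[str, float], rdf: Dict[str, float],
                 rdf_weight: float = 0.5) -> Dict[str, float]:
    """Weighted sum of normalized scores; a missing hit counts as 0."""
    lucene_n, rdf_n = normalize(lucene), normalize(rdf)
    merged: Dict[str, float] = {}
    for uri in set(lucene_n) | set(rdf_n):
        merged[uri] = ((1.0 - rdf_weight) * lucene_n.get(uri, 0.0)
                       + rdf_weight * rdf_n.get(uri, 0.0))
    return merged


# Made-up scores: "c" is found only in the RDF store.
lucene_hits = {"a": 2.0, "b": 1.0}
rdf_hits = {"b": 0.9, "c": 0.3}
merged = merge_scores(lucene_hits, rdf_hits)
for uri in sorted(merged, key=merged.get, reverse=True):
    print(uri, round(merged[uri], 3))
```

With equal weights, "b" ranks first because both sources agree on it, while 
"c" still shows up although Lucene never saw it. Whether a linear 
combination is appropriate at all is exactly the open question.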

As you can see, many questions arise. We are already working on many of 
these as part of our research activities. Some of them should be addressed 
up front (architectural and design issues); others, of course, can be 
addressed when they emerge.

Hoping for interesting comments,
Enrico M.

[1] http://www.openrdf.org/
[2] http://tripletest.sourceforge.net/2005-06-08/index.html
[3] http://www.openrdf.org/doc/sesame/users/ch08.html


_______________________________________________
Dashboard-hackers mailing list
[email protected]
http://mail.gnome.org/mailman/listinfo/dashboard-hackers
