To follow up on the full-text vs. structural index question:

It seems to me, from what you've said, that you have a relatively flat kind of 
metadata here. The fact that the names of the fields involved are RDF 
predicates doesn't necessarily mean that RDF indexing (such as is supplied by 
the Resource Index) is actually the best tool for the job. In my experience, 
RDF indexing is the tool you want to reach for when the metadata in question 
and the queries you expect to do across it are truly structured. From your 
example, that doesn't appear to be the case. If it's not the case (if your 
metadata is basically a flat set of simple-valued fields) a good full-text 
index and queries written to it are going to beat the pants off of most RDF 
indexes with respect to speed. 
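
As a rough illustration of the difference (a sketch only, not how Lucene, GSearch, or the Resource Index are implemented): the Python below contrasts a regex scan over every record, which is roughly what a regex FILTER applied after retrieval amounts to, with a lookup in a small inverted index. The record ids, field names, and function names are all made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical flat metadata: one dict of simple string fields per object.
records = {
    "demo:1": {"title": "Annual budget report", "creator": "Smith"},
    "demo:2": {"title": "Budget meeting minutes", "creator": "Jones"},
    "demo:3": {"title": "Staff directory", "creator": "Smith"},
}


def scan_with_regex(records, field, pattern):
    """Test every record against the pattern, one by one."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sorted(
        pid for pid, fields in records.items()
        if rx.search(fields.get(field, ""))
    )


def build_index(records):
    """Build an inverted index: (field, lowercased token) -> set of ids."""
    index = defaultdict(set)
    for pid, fields in records.items():
        for field, value in fields.items():
            for token in re.findall(r"\w+", value.lower()):
                index[(field, token)].add(pid)
    return index


def lookup(index, field, term):
    """Fetch matching ids directly; records without the term are never visited."""
    return sorted(index.get((field, term.lower()), set()))


index = build_index(records)
print(scan_with_regex(records, "title", "budget"))
print(lookup(index, "title", "budget"))
```

Note that the index matches whole tokens while the regex matches substrings, so the two are not interchangeable for every query; the point is only that the lookup never touches records that lack the term.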

Do you have examples of structured queries you expect to perform across this 
metadata? 

---
A. Soroka
Online Library Environment
the University of Virginia Library




On Nov 23, 2011, at 1:56 PM, Stephen Bayliss wrote:

> A full text index would help I think also.
> 
> Worth noting that FILTER will (as far as I know) take place *after* the
> results have been retrieved.
> 
> Steve
> 
>> -----Original Message-----
>> From: aj...@virginia.edu [mailto:aj...@virginia.edu] 
>> Sent: 23 November 2011 16:52
>> To: fedora-commons-developers@lists.sourceforge.net Developers
>> Subject: Re: [fcrepo-dev] Non Dublin Core data in DB
>> 
>> 
>> Are you using the default Mulgara triplestore configuration?
>> 
>> If the multiple objects in your SPARQL query are, as I 
>> believe you wrote, not actually resources but instead simple 
>> strings, have you considered using a full-text index for this 
>> kind of search? It would seem to be a good fit for Lucene's 
>> faceting abilities or a similar functionality.
>> 
>> ---
>> A. Soroka
>> Online Library Environment
>> the University of Virginia Library
>> 
>> 
>> 
>> 
>> On Nov 23, 2011, at 11:47 AM, J.T.P. wrote:
>> 
>>> The reason for my investigation is performance. I am using SPARQL to
>>> retrieve 20 objects (string values, 20 triples in my WHERE clause)
>>> with about 1000 Fedora objects in the datastore. Retrieval takes
>>> about 18 seconds. My SPARQL query has the format:
>>> 
>>> select * where {
>>>   ?subject <namespace:object> ?object .
>>>   ?subject <namespace:object_1> ?object_1 .
>>>   .
>>>   .
>>>   .
>>>   ?subject <namespace:object_20> ?object_20 .
>>>   FILTER(REGEX(?object, "stringValue", "i"))
>>> }
>>> 
>>> Any info would be most helpful.
>>> 
>>> Very Respectfully,
>>> J.Pitts
>>> 
>>> 
>>> **********************************************************************
>>> "Inveniam viam aut faciam" -- "I will find a way or make one."
>>> **********************************************************************
>>> 
>>> From: Alexis Miara <alexis.mi...@licef.ca>
>>> To: pittsj...@yahoo.com; 
>>> fedora-commons-developers@lists.sourceforge.net
>>> Sent: Wednesday, November 23, 2011 9:04 AM
>>> Subject: RE: [fcrepo-dev] Non Dublin Core data in DB
>>> 
>>> Hi
>>> 
>>> When you use RELS-EXT, relationships are stored inside the associated
>>> triple store (by default Mulgara). With RISearch, you can make SPARQL
>>> queries on it.
>>> 
>>> Alexis Miara
>>> LICEF
>>> Québec
>>> 
>>> -----Original Message-----
>>> From: JTP [mailto:pittsj...@yahoo.com]
>>> Sent: November-22-11 9:30 PM
>>> To: fedora-commons-developers@lists.sourceforge.net
>>> Subject: Re: [fcrepo-dev] Non Dublin Core data in DB
>>> 
>>> I am storing RDF in RELS-EXT, under the namespace
>>> xmlns:myns="http://www.nsdl.org/ontologies/relationships#", as text
>>> values (no images, documents, etc.). Since I do not see these values in
>>> the database, besides the Dublin Core datastream, I was curious where
>>> the RELS-EXT datastream is stored.
>>> 
>>> -----Original Message-----
>>> From: aj...@virginia.edu [mailto:aj...@virginia.edu]
>>> Sent: Tuesday, November 22, 2011 5:19 PM
>>> To: fedora-commons-developers@lists.sourceforge.net
>>> Subject: Re: [fcrepo-dev] Non Dublin Core data in DB
>>> 
>>> In particular, if you'd like to use full-text indexing with your 
>>> metadata, you'll want to check out GSearch, a JMS-driven indexing 
>>> service for Fedora.
>>> 
>>> If you're storing RDF somewhere other than RELS-EXT or RELS-INT,
>>> perhaps there's a way to map it into those datastreams, which will
>>> allow you to use Fedora's built-in indexing, as described by Mr. Della
>>> Bitta. Perhaps you can tell us a little more about what you're doing?
>>> 
>>> ---
>>> A. Soroka
>>> Online Library Environment
>>> the University of Virginia Library
>>> 
>>> 
>>> 
>>> 
>>> On Nov 22, 2011, at 4:04 PM, Michael Della Bitta wrote:
>>> 
>>>> If your RDF is in one of the two built-in RDF datastreams, RELS-EXT
>>>> and RELS-INT, it's not indexed by default, but can be if you turn on
>>>> the Resource Index. If you're storing RDF elsewhere in another
>>>> datastream, it would take some hacking to get it indexed.
>>>> 
>>>> Michael Della Bitta
>>>> 
>>>> Senior Applications Developer
>>>> Information Technology Group
>>>> The New York Public Library
>>>> 40 West 20th Street, 5th Floor
>>>> New York, NY 10011-4211
>>>> (212) 621-0609
>>>> 
>>>> 
>>>> 
>>>> On Tue, Nov 22, 2011 at 3:57 PM, J.T.P. <pittsj...@yahoo.com> wrote:
>>>>> Other metadata that is custom to my app (RDF data). Where are
>>>>> these values stored? Thanx....
>>>>> 
>>>>> ________________________________
>>>>> From: "aj...@virginia.edu" <aj...@virginia.edu>
>>>>> To: "fedora-commons-developers@lists.sourceforge.net Developers" 
>>>>> <fedora-commons-developers@lists.sourceforge.net>
>>>>> Sent: Tuesday, November 22, 2011 3:21 PM
>>>>> Subject: Re: [fcrepo-dev] Non Dublin Core data in DB
>>>>> 
>>>>> Data in datastreams other than DC aren't normally persisted into
>>>>> the SQL store. Are you thinking of object properties like "owner"
>>>>> or "set", or some other kind of metadata?
>>>>> 
>>>>> ---
>>>>> A. Soroka
>>>>> Online Library Environment
>>>>> the University of Virginia Library
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Nov 22, 2011, at 3:17 PM, J.T.P. wrote:
>>>>> 
>>>>>> Hello FC'ers. I have a probably silly question. I recently migrated
>>>>>> from Derby to Sybase. The application works fine but is a little slow
>>>>>> on some queries. I can only see the Dublin Core data in the doFields
>>>>>> table. Where does the data in non-DC namespaces reside? I want to put
>>>>>> indexes on some fields to see if I can improve the performance. Any
>>>>>> info would be most helpful.
>>>>>> Respectfully, J. Pitts
>>>>>> 


------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure 
contains a definitive record of customers, application performance, 
security threats, fraudulent activity, and more. Splunk takes this 
data and makes sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-novd2d
_______________________________________________
Fedora-commons-developers mailing list
Fedora-commons-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
