To follow up on the full-text vs. structural index question: It seems to me, from what you've said, that you have a relatively flat kind of metadata here. The fact that the names of the fields involved are RDF predicates doesn't necessarily mean that RDF indexing (such as is supplied by the Resource Index) is actually the best tool for the job. In my experience, RDF indexing is the tool you want to reach for when the metadata in question and the queries you expect to run across it are truly structured. From your example, that doesn't appear to be the case. If it isn't (that is, if your metadata is basically a flat set of simple-valued fields), a good full-text index and queries written against it are going to beat the pants off of most RDF indexes with respect to speed.
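To make the speed point a little more concrete, here is a toy sketch in Python. It is not anything Fedora, GSearch, or Lucene actually ships: the record contents, the `demo:` identifiers, and the function names are all made up for illustration. It contrasts a lookup in a small inverted index (the basic idea behind a full-text index) with a regex pass over every stored value, which is roughly the work a regex FILTER has to do if it runs after the results have been retrieved.

```python
import re
from collections import defaultdict

# Hypothetical flat metadata: one dict of simple string fields per object.
records = {
    "demo:1": {"title": "Annual report", "creator": "Smith"},
    "demo:2": {"title": "Field notes", "creator": "Jones"},
    "demo:3": {"title": "Report on field work", "creator": "Smith"},
}


def build_index(records):
    """Map each lowercased token to the set of object ids whose fields contain it."""
    index = defaultdict(set)
    for pid, fields in records.items():
        for value in fields.values():
            for token in re.findall(r"\w+", value.lower()):
                index[token].add(pid)
    return index


def search_index(index, term):
    """One dictionary lookup instead of a pass over every record."""
    return sorted(index.get(term.lower(), set()))


def search_scan(records, term):
    """Case-insensitive regex over every value of every record."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return sorted(
        pid
        for pid, fields in records.items()
        if any(pattern.search(value) for value in fields.values())
    )


index = build_index(records)
print(search_index(index, "report"))
print(search_scan(records, "report"))
```

Both calls find the same objects here. The difference is that the indexed lookup never touches records that cannot match, while the scan does work proportional to everything stored. Note that the toy index matches whole tokens only, whereas the regex also matches substrings, so the two are not interchangeable in general.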
Do you have examples of structured queries you expect to perform across this metadata?

---
A. Soroka
Online Library Environment
the University of Virginia Library

On Nov 23, 2011, at 1:56 PM, Stephen Bayliss wrote:

> A full text index would help I think also.
>
> Worth noting that FILTER will (as far as I know) take place *after* the
> results have been retrieved.
>
> Steve
>
>> -----Original Message-----
>> From: aj...@virginia.edu [mailto:aj...@virginia.edu]
>> Sent: 23 November 2011 16:52
>> To: fedora-commons-developers@lists.sourceforge.net Developers
>> Subject: Re: [fcrepo-dev] Non Dublin Core data in DB
>>
>> Are you using the default Mulgara triplestore configuration?
>>
>> If the multiple objects in your SPARQL query are, as I
>> believe you wrote, not actually resources but instead simple
>> strings, have you considered using a full-text index for this
>> kind of search? It would seem to be a good fit for Lucene's
>> faceting abilities or a similar functionality.
>>
>> ---
>> A. Soroka
>> Online Library Environment
>> the University of Virginia Library
>>
>> On Nov 23, 2011, at 11:47 AM, J.T.P. wrote:
>>
>>> Reason for my investigation is for performance issues. I am using
>>> SPARQL retrieving 20 objects (string values, 20 triples in my where
>>> clause ) with about 1000 fedora objects in the datastore. It take
>>> about 18 seconds for retrieval. My sparql query is in the format of
>>>
>>> select * where{
>>> ?subject <namespace:object> ?object
>>> ?subject <namespace:object_1> ?object_1
>>> .
>>> .
>>> .
>>> ?subject <namespace:object_20> ?object_20 FILTER(REGEX(?object,
>>> "stringValue","i") }
>>> Any info would be most conducive.
>>>
>>> Very Respectfully,
>>> J.Pitts
>>>
>>> **********************************************************************
>>> "Inveniam viam aut faciam -- “I will find a way or make one.”
>>> **********************************************************************
>>>
>>> From: Alexis Miara <alexis.mi...@licef.ca>
>>> To: pittsj...@yahoo.com;
>>> fedora-commons-developers@lists.sourceforge.net
>>> Sent: Wednesday, November 23, 2011 9:04 AM
>>> Subject: RE: [fcrepo-dev] Non Dublin Core data in DB
>>>
>>> Hi
>>>
>>> When you use RELS-EXT, relationships are stored inside the associated
>>> triple store (by default Mulgara). With RISearch, you can make SPARQL
>>> queries on it.
>>>
>>> Alexis Miara
>>> LICEF
>>> Québec
>>>
>>> -----Original Message-----
>>> From: JTP [mailto:pittsj...@yahoo.com]
>>> Sent: November-22-11 9:30 PM
>>> To: fedora-commons-developers@lists.sourceforge.net
>>> Subject: Re: [fcrepo-dev] Non Dublin Core data in DB
>>>
>>> I am storing rdf in RELS-EXT,
>>> xmlns:myns="http://www.nsdl.org/ontologies/relationships#">,
>>> namespace, text values (no images,document ..etc). Since I do not see
>>> these values in the database, beside the Dublic Core datastream, I was
>>> curious to where the RELS-EXT datastream is stored.
>>>
>>> **********************************************************************
>>> "Inveniam viam aut faciam -- "I will find a way or make one."
>>> **********************************************************************
>>>
>>> -----Original Message-----
>>> From: aj...@virginia.edu [mailto:aj...@virginia.edu]
>>> Sent: Tuesday, November 22, 2011 5:19 PM
>>> To: fedora-commons-developers@lists.sourceforge.net
>>> Subject: Re: [fcrepo-dev] Non Dublin Core data in DB
>>>
>>> In particular, if you'd like to use full-text indexing with your
>>> metadata, you'll want to check out GSearch, a JMS-driven indexing
>>> service for Fedora.
>>>
>>> If you're storing RDF somewhere other than RELS-EXT or RELS-INT,
>>> perhaps there's a way to map it into those datastreams, which will
>>> allow you to use Fedora's built-in indexing, as described by Mr. Della
>>> Bitta. Perhaps you can tell us a little more about what you're doing?
>>>
>>> ---
>>> A. Soroka
>>> Online Library Environment
>>> the University of Virginia Library
>>>
>>> On Nov 22, 2011, at 4:04 PM, Michael Della Bitta wrote:
>>>
>>>> If your RDF is in one of the two built-in RDF datastreams, RELS-EXT
>>>> and RELS-INT, it's not indexed by default, but can be if you turn on
>>>> the Resource Index. If you're storing RDF elsewhere in another
>>>> datastream, it would take some hacking to get it indexed.
>>>>
>>>> Michael Della Bitta
>>>>
>>>> Senior Applications Developer
>>>> Information Technology Group
>>>> The New York Public Library
>>>> 40 West 20th Street, 5th Floor
>>>> New York, NY 10011-4211
>>>> (212) 621-0609
>>>>
>>>> On Tue, Nov 22, 2011 at 3:57 PM, J.T.P. <pittsj...@yahoo.com> wrote:
>>>>> Other meta-data that is custom to my app (rdf data) . Where are
>>>>> these values stored ? Thanx....
>>>>>
>>>>> **********************************************************************
>>>>> "Inveniam viam aut faciam -- "I will find a way or make one."
>>>>> **********************************************************************
>>>>> ________________________________
>>>>> From: "aj...@virginia.edu" <aj...@virginia.edu>
>>>>> To: "fedora-commons-developers@lists.sourceforge.net Developers"
>>>>> <fedora-commons-developers@lists.sourceforge.net>
>>>>> Sent: Tuesday, November 22, 2011 3:21 PM
>>>>> Subject: Re: [fcrepo-dev] Non Dublin Core data in DB
>>>>>
>>>>> Data in datastreams other than DC aren't normally persisted into
>>>>> the SQL store.
>>>>> Are you thinking of object properties like "owner"
>>>>> or "set", or some other kind of metadata?
>>>>>
>>>>> ---
>>>>> A. Soroka
>>>>> Online Library Environment
>>>>> the University of Virginia Library
>>>>>
>>>>> On Nov 22, 2011, at 3:17 PM, J.T.P. wrote:
>>>>>
>>>>>> Hello FC'ers. Have a probably silly question. I recently migrated
>>>>>> from Derby to Sybase. Applications works fine but a little slow on
>>>>>> some queries. I can only see the Dublin Core data in the doFields
>>>>>> table. Where does the data in non-DC namespaces reside ? I want to
>>>>>> put indexes on some fields to see if I can improve the performance.
>>>>>> Any info would be most conducive.
>>>>>> Respectfully, J. Pitts
>>>>>>
>>>>>> **********************************************************************
>>>>>> "Inveniam viam aut faciam -- "I will find a way or make one."
>>>>>> **********************************************************************
>>>>>>
>>>>>> ----------------------------------------------------------------------
>>>>>> All the data continuously generated in your IT infrastructure
>>>>>> contains a definitive record of customers, application
>>>>>> performance, security threats, fraudulent activity, and more.
>>>>>> Splunk takes this data and makes sense of it. IT sense. And common
>>>>>> sense.
>>>>>> http://p.sf.net/sfu/splunk-novd2d
>>>>>> _______________________________________________
>>>>>> Fedora-commons-developers mailing list
>>>>>> Fedora-commons-developers@lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers