I don't think these questions or suggestions are naive at all. In fact, some of them relate to Jira issues that have been around for some time, which lends the questions force:
https://jira.duraspace.org/browse/FCREPO-656
https://jira.duraspace.org/browse/FCREPO-653

I believe that part of the reason that RELS-* are "privileged" datastreams is the historic centrality of the Fedora object model to the architecture. I'm not sure that RELS-* were ever meant to be the only places to put RDF; rather, they mark the importance of a particular kind of RDF, the "structural" kind that generates the resource graph in the repository.

There are a lot of obstacles at the level of workflow to the more expansive vision of RDF usage you describe (or the vision of those Jira issues), especially relating to updating and purging. That doesn't mean it can't be done, just that the tight coupling between the RI and the core datastores introduces difficulties. In fact, some of us have questioned whether the Resource Index really should be a core service in the framework, or whether, like full-text indexing, it ought to be supported in some less tightly coupled way. This question is not unrelated.

To the specifics of the recent exchange:

"This is an interesting viewpoint that structural data only be stored in RDF while other metadata may have more appropriate separate formats."

That is not my view. My view is that RDF _indexing_ is best used over "structural" metadata, whereas flat metadata will be best served by full-text _indexing_. In fact, I did not suggest that the original correspondent use some other format; I was making a point about the inefficiency of using RDF indexing (which is typically designed to support queries over a graph, e.g. SPARQL) to do what amounts to speedy string-matching. The question was about the slowness of results from a query. Ideally, the indexing engines of triplestores like Mulgara would re-use engines like Lucene so efficiently that the difference in speed would disappear, but that hasn't yet happened.

---
A. Soroka
Online Library Environment
the University of Virginia Library

On Nov 23, 2011, at 3:00 PM, Mark Diggory wrote:

> Apologies if this seems naive. I am not fully versed in the RI's current
> capabilities... this said...
>
> This is an interesting viewpoint that structural data only be stored in RDF
> while other metadata may have more appropriate separate formats. My
> challenge is that if you look at RDF, it's meant to be more expressive than
> just structure and it is used as such in the LoD / Semantic web world. So if
> the tool using fedora happens to want to store RDF, is it really not going to
> always be in the position of placing this structural metadata into "RELS-EXT"
> or "RELS-INT" datastreams from a managerial standpoint?
>
> It's not always going to be tractable for the application to be separating out
> what stuff goes into RELS-EXT, RELS-INT, or otherwise. TBH, attempting to do
> so really increases the amount of mapping rules/behavior/logic that needs to
> be maintained in the application. Meaning, increased complexity for those
> developing applications. Thus this classification of "RELS-EXT/INT/..."
> seems like a persistence detail that should somehow avoid being exposed in
> the business/logical interface of Fedora.
>
> Seems like it would have been much more beneficial to support
> mapping/decision making about what RDF goes into the RI and how it is
> partitioned into separate graphs/models at the point of indexing. Why not
> apply some rules to the RI indexing to allow partitioning of the store into
> separate models/graphs based on system needs, "System", "Structural",
> "Descriptive", "...".
> This would do away with the need to store structure so
> artificially in a specific system-defined datastream and allow it to be
> distributed across any datastream. It would also allow optimized graphs to be
> constructed and used for specific system activities while allowing other
> graphs to be expressed for larger or more subject-specific customized
> requirements.
>
> Being an RDF datastream (or disseminatable as RDF) should have been the first
> rule for datastreams getting indexed, then various filtering rules that could
> be customized would allow for tuning the indexing accordingly. Perhaps this
> is a detail that could go into the model for the FO that could be
> customizable by the enduser application.
>
> Again, apologies if this is naive, references to such a configuration
> capability are welcome if it already exists.
>
> Mark
>
> On Wed, Nov 23, 2011 at 11:02 AM, <aj...@virginia.edu> wrote:
> To follow on the full-text vs. structural index question:
>
> It seems to me, from what you've said, that you have a relatively flat kind
> of metadata here. The fact that the names of the fields involved are RDF
> predicates doesn't necessarily mean that RDF indexing (such as is supplied by
> the Resource Index) is actually the best tool for the job. In my experience,
> RDF indexing is the tool you want to reach for when the metadata in question
> and the queries you expect to do across it are truly structured. From your
> example, that doesn't appear to be the case. If it's not the case (if your
> metadata is basically a flat set of simple-valued fields) a good full-text
> index and queries written to it are going to beat the pants off of most RDF
> indexes with respect to speed.
>
> Do you have examples of structured queries you expect to perform across this
> metadata?
>
> ---
> A. Soroka
> Online Library Environment
> the University of Virginia Library
>
>
> On Nov 23, 2011, at 1:56 PM, Stephen Bayliss wrote:
>
> > A full text index would help I think also.
> >
> > Worth noting that FILTER will (as far as I know) take place *after* the
> > results have been retrieved.
> >
> > Steve
> >
> >> -----Original Message-----
> >> From: aj...@virginia.edu [mailto:aj...@virginia.edu]
> >> Sent: 23 November 2011 16:52
> >> To: fedora-commons-developers@lists.sourceforge.net Developers
> >> Subject: Re: [fcrepo-dev] Non Dublin Core data in DB
> >>
> >>
> >> Are you using the default Mulgara triplestore configuration?
> >>
> >> If the multiple objects in your SPARQL query are, as I
> >> believe you wrote, not actually resources but instead simple
> >> strings, have you considered using a full-text index for this
> >> kind of search? It would seem to be a good fit for Lucene's
> >> faceting abilities or a similar functionality.
> >>
> >> ---
> >> A. Soroka
> >> Online Library Environment
> >> the University of Virginia Library
> >>
> >>
> >> On Nov 23, 2011, at 11:47 AM, J.T.P. wrote:
> >>
> >>> The reason for my investigation is performance issues. I am using
> >>> SPARQL, retrieving 20 objects (string values, 20 triples in my where
> >>> clause) with about 1000 fedora objects in the datastore. It takes
> >>> about 18 seconds for retrieval. My sparql query is in the format of
> >>>
> >>> select * where {
> >>>   ?subject <namespace:object> ?object .
> >>>   ?subject <namespace:object_1> ?object_1 .
> >>>   .
> >>>   .
> >>>   .
> >>>   ?subject <namespace:object_20> ?object_20 .
> >>>   FILTER(REGEX(?object, "stringValue", "i"))
> >>> }
> >>> Any info would be most conducive.
> >>>
> >>> Very Respectfully,
> >>> J.Pitts
> >>>
> >>> **********************************************************************
> >>> "Inveniam viam aut faciam" -- "I will find a way or make one."
> >>> **********************************************************************
> >>>
> >>> From: Alexis Miara <alexis.mi...@licef.ca>
> >>> To: pittsj...@yahoo.com;
> >>> fedora-commons-developers@lists.sourceforge.net
> >>> Sent: Wednesday, November 23, 2011 9:04 AM
> >>> Subject: RE: [fcrepo-dev] Non Dublin Core data in DB
> >>>
> >>> Hi
> >>>
> >>> When you use RELS-EXT, relationships are stored inside the associated
> >>> triple store (by default Mulgara). With RISearch, you can make SPARQL
> >>> queries on it.
> >>>
> >>> Alexis Miara
> >>> LICEF
> >>> Québec
> >>>
> >>> -----Original Message-----
> >>> From: JTP [mailto:pittsj...@yahoo.com]
> >>> Sent: November-22-11 9:30 PM
> >>> To: fedora-commons-developers@lists.sourceforge.net
> >>> Subject: Re: [fcrepo-dev] Non Dublin Core data in DB
> >>>
> >>> I am storing rdf in RELS-EXT,
> >>> xmlns:myns="http://www.nsdl.org/ontologies/relationships#">,
> >>> namespace, text values (no images, document ..etc). Since I do not see
> >>> these values in the database, beside the Dublin Core datastream, I was
> >>> curious as to where the RELS-EXT datastream is stored.
> >>>
> >>> **********************************************************************
> >>> "Inveniam viam aut faciam" -- "I will find a way or make one."
> >>> **********************************************************************
> >>>
> >>> -----Original Message-----
> >>> From: aj...@virginia.edu [mailto:aj...@virginia.edu]
> >>> Sent: Tuesday, November 22, 2011 5:19 PM
> >>> To: fedora-commons-developers@lists.sourceforge.net
> >>> Subject: Re: [fcrepo-dev] Non Dublin Core data in DB
> >>>
> >>> In particular, if you'd like to use full-text indexing with your
> >>> metadata, you'll want to check out GSearch, a JMS-driven indexing
> >>> service for Fedora.
> >>>
> >>> If you're storing RDF somewhere other than RELS-EXT or RELS-INT,
> >>> perhaps there's a way to map it into those datastreams, which will
> >>> allow you to use Fedora's built-in indexing, as described by Mr. Della
> >>> Bitta. Perhaps you can tell us a little more about what you're doing?
> >>>
> >>> ---
> >>> A. Soroka
> >>> Online Library Environment
> >>> the University of Virginia Library
> >>>
> >>>
> >>> On Nov 22, 2011, at 4:04 PM, Michael Della Bitta wrote:
> >>>
> >>>> If your RDF is in one of the two built-in RDF datastreams, RELS-EXT
> >>>> and RELS-INT, it's not indexed by default, but can be if you turn on
> >>>> the Resource Index. If you're storing RDF elsewhere in another
> >>>> datastream, it would take some hacking to get it indexed.
> >>>>
> >>>> Michael Della Bitta
> >>>>
> >>>> Senior Applications Developer
> >>>> Information Technology Group
> >>>> The New York Public Library
> >>>> 40 West 20th Street, 5th Floor
> >>>> New York, NY 10011-4211
> >>>> (212) 621-0609
> >>>>
> >>>>
> >>>> On Tue, Nov 22, 2011 at 3:57 PM, J.T.P. <pittsj...@yahoo.com> wrote:
> >>>>> Other meta-data that is custom to my app (rdf data). Where are
> >>>>> these values stored? Thanx....
> >>>>>
> >>>>> **********************************************************************
> >>>>> "Inveniam viam aut faciam" -- "I will find a way or make one."
> >>>>> **********************************************************************
> >>>>> ________________________________
> >>>>> From: "aj...@virginia.edu" <aj...@virginia.edu>
> >>>>> To: "fedora-commons-developers@lists.sourceforge.net Developers"
> >>>>> <fedora-commons-developers@lists.sourceforge.net>
> >>>>> Sent: Tuesday, November 22, 2011 3:21 PM
> >>>>> Subject: Re: [fcrepo-dev] Non Dublin Core data in DB
> >>>>>
> >>>>> Data in datastreams other than DC aren't normally persisted into
> >>>>> the SQL store. Are you thinking of object properties like "owner"
> >>>>> or "set", or some other kind of metadata?
> >>>>>
> >>>>> ---
> >>>>> A. Soroka
> >>>>> Online Library Environment
> >>>>> the University of Virginia Library
> >>>>>
> >>>>>
> >>>>> On Nov 22, 2011, at 3:17 PM, J.T.P. wrote:
> >>>>>
> >>>>>> Hello FC'ers. Have a probably silly question. I recently migrated
> >>>>>> from Derby to Sybase. Application works fine but a little slow on
> >>>>>> some queries. I can only see the Dublin Core data in the doFields
> >>>>>> table. Where does the data in non-DC namespaces reside? I want to
> >>>>>> put indexes on some fields to see if I can improve the
> >>>>>> performance. Any info would be most conducive.
> >>>>>> Respectfully, J. Pitts
> >>>>>>
> >>>>>> **********************************************************************
> >>>>>> "Inveniam viam aut faciam" -- "I will find a way or make one."
> >>>>>> **********************************************************************
> >>>>>>
> >>>>>> ------------------------------------------------------------------------------
> >>>>>> All the data continuously generated in your IT infrastructure
> >>>>>> contains a definitive record of customers, application
> >>>>>> performance, security threats, fraudulent activity, and more.
> >>>>>> Splunk takes this data and makes sense of it. IT sense. And common
> >>>>>> sense.
> >>>>>> http://p.sf.net/sfu/splunk-novd2d
> >>>>>> _______________________________________________
> >>>>>> Fedora-commons-developers mailing list
> >>>>>> Fedora-commons-developers@lists.sourceforge.net
> >>>>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-developers
>
> --
>
> Mark Diggory
> 2888 Loker Avenue East, Suite 305, Carlsbad, CA. 92010
> Esperantolaan 4, Heverlee 3001, Belgium
> http://www.atmire.com
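
As a rough illustration of the full-text versus RDF-index point argued at the top of this message: the SPARQL query quoted in the thread retrieves candidate bindings and then applies a case-insensitive REGEX FILTER to each one, whereas a full-text index looks terms up directly. Below is a minimal Python sketch of that contrast over flat, simple-valued fields. It is not Fedora, Mulgara, Lucene, or GSearch code; the record layout, the PIDs, the field names, and the functions regex_scan, build_inverted_index, and lookup are all hypothetical, and the tokenizer is deliberately naive.

```python
import re
from collections import defaultdict

# Hypothetical flat metadata: object PID -> {field name: string value}.
RECORDS = {
    "demo:1": {"title": "Annual Report", "owner": "library"},
    "demo:2": {"title": "Field notes on report writing", "owner": "archive"},
    "demo:3": {"title": "Photograph album", "owner": "library"},
}


def regex_scan(records, field, pattern):
    """Filter-after-retrieval: test every record's value against a regex."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sorted(pid for pid, fields in records.items()
                  if rx.search(fields.get(field, "")))


def tokenize(text):
    """Naive tokenizer (an assumption): lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(records):
    """Map (field, token) -> set of PIDs, built once at indexing time."""
    index = defaultdict(set)
    for pid, fields in records.items():
        for field, value in fields.items():
            for token in tokenize(value):
                index[(field, token)].add(pid)
    return index


def lookup(index, field, term):
    """Answer a whole-term query with a single dictionary lookup."""
    return sorted(index.get((field, term.lower()), set()))


INDEX = build_inverted_index(RECORDS)
print(regex_scan(RECORDS, "title", "report"))
print(lookup(INDEX, "title", "report"))
```

The scan has to touch every stored value on every query, while the lookup does its work once at indexing time and then answers from the index. The trade-off in this sketch is that the index matches whole tokens only, so a substring such as "repo" is found by the scan but not by the lookup.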