Re: [fcrepo-user] Collection Structure Design Issues

Gert Schmeltz Pedersen Tue, 10 May 2011 01:24:49 -0700

Hi Agustina,

I can supply a point related to Solr and the generic search.


When you index your Fedora objects, you create one index document per Fedora 
object, each identified by the unique PID. When you query that index, you get 
one hit per index document that qualifies.

The implication is that your second approach gets more complicated: each of 
your "logical" objects is represented by many partial objects (which you have 
to link by relationships), each with its own unique Fedora PID. Your Solr 
query results will therefore include the PIDs of the partial objects, which 
may or may not give you enough information for your use case.
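
As a rough illustration of that point, here is a toy model in plain Python. 
It is not GSearch or Solr code; the PIDs, the field names and the search 
function are all made up for the example. It only shows that a query returns 
the PIDs of whichever index documents match, which in the second approach are 
the partial objects.

```python
# Toy index: one document per Fedora object, keyed by PID.
# Every PID and field name here is invented for illustration.
index = {
    "demo:doc1": {"title": "Field notebook", "text": ""},
    "demo:data1": {"title": "", "text": "rainfall measurements"},
    "demo:data2": {"title": "", "text": "rainfall summary"},
}


def search(index, term):
    """Return the sorted PIDs of all index documents that contain the term."""
    term = term.lower()
    return sorted(
        pid
        for pid, doc in index.items()
        if any(term in value.lower() for value in doc.values())
    )


# The hits are the partial (data) objects, not the document object.
hits = search(index, "rainfall")
print(hits)
```

Nothing in the hits says which "logical" object they belong to, unless the 
index documents carry that information in a field of their own.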

You may (or rather should) also have Fedora objects representing the 
"logical" objects. You can index these for the cases where you want query 
results to contain the PIDs of the "logical" objects. It all depends on what 
you write in your indexing stylesheet in GSearch.
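
One way to picture this, again as a plain Python sketch and not as an actual 
GSearch stylesheet (those are written in XSLT): assume each partial object 
records the PID of its "logical" object, and build one index document per 
"logical" object from the text of its parts. The `part_of` field, the PIDs 
and both functions are hypothetical.

```python
# Hypothetical partial objects, each pointing at its "logical" object.
partial_objects = [
    {"pid": "demo:data1", "part_of": "demo:doc1", "text": "rainfall measurements"},
    {"pid": "demo:data2", "part_of": "demo:doc1", "text": "temperature readings"},
    {"pid": "demo:data3", "part_of": "demo:doc2", "text": "rainfall summary"},
]


def build_logical_index(parts):
    """Collect the text of all parts into one index document per logical PID."""
    index = {}
    for part in parts:
        doc = index.setdefault(part["part_of"], {"text": []})
        doc["text"].append(part["text"])
    return index


def search(index, term):
    """Return the sorted logical PIDs whose collected text contains the term."""
    term = term.lower()
    return sorted(
        pid
        for pid, doc in index.items()
        if any(term in text.lower() for text in doc["text"])
    )


logical_index = build_logical_index(partial_objects)
hits = search(logical_index, "rainfall")
print(hits)
```

Here the query results contain the PIDs of the "logical" objects, at the cost 
of no longer telling you which part matched.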

Let me know if you want more details.

Best,
Gert

On 06/05/2011, at 16.57, Martinez Garcia, Agustina wrote:

> Hi all,
> 
> I am currently developing a collections model that uses Fedora and I am 
> facing some design issues. I would like to ask for advice on how two 
> different designs I have in mind affect searching and indexing. Neither 
> approach is complex in terms of collection structure and object types. I am 
> using both the Mulgara RI and the Solr engine, combined with Fedora generic 
> search for automatic index updates.
> 
> The first approach includes the following layout:
> 
> - Collection level object, which is the container of the collection and only 
> holds metadata datastreams and one content datastream
> - Document objects, which are related to the collection container and could 
> potentially have a large number of internal datastreams (inline XML 
> datastreams). They use RELS-EXT to link to the collection container object 
> and a RELS-INT datastream to specify the relationships among the contained 
> inline datastreams.
> 
> From the point of view of indexing the datastreams in the search engine, 
> this approach has a very simple structure, since everything is included 
> within one object. On the other hand, it raises performance issues, since 
> the number of inline datastreams could be very high (although each one is 
> very small).
> 
> The second approach involves having multiple types of objects:
> - Collection level object (the same as in the first approach)
> - Document object, which now only contains metadata, RELS-EXT and one managed 
> datastream
> - Data objects, which only contain a metadata datastream, an inline XML 
> datastream and RELS-EXT to relate them to the document objects and to each 
> other. These objects are the equivalent of the inline XML datastreams that 
> were included within the document objects in the first approach.
> 
> This solution does not add complexity or affect searches when using the RI 
> and Mulgara triplestore as a search interface, but seems to be more complex 
> when using the Solr search engine.
> 
> Has anyone dealt with similar design issues? I would really appreciate any 
> advice on this.
> 
> Thanks in advance,
> Agustina
> 
> _______________________________________________
> Fedora-commons-users mailing list
> Fedora-commons-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users

