I am using GSearch 2.4 and the "usual" approach to extract a set of MODS elements, limited to those used in my repository. In my case, Tika is used only to get the datastreams’ full text and embedded metadata.
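For illustration only, here is a minimal sketch of that kind of extraction: picking a fixed subset of elements out of a hierarchical MODS record and emitting them as flat Solr fields. It is written in Python rather than the XSLT used in the GSearch indexing process, and it is not my actual stylesheet. The field names (`mods_title`, `mods_name`, `mods_topic`), the `PID` value, and the sample record are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical mapping: flat Solr field name -> path into the MODS hierarchy.
# Only the elements actually used in a repository would be listed here.
FIELD_PATHS = {
    "mods_title": "mods:titleInfo/mods:title",
    "mods_name": "mods:name/mods:namePart",
    "mods_topic": "mods:subject/mods:topic",
}

# Made-up sample record, assumed to have <mods> as its root element.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Sample title</title></titleInfo>
  <name><namePart>Doe, Jane</namePart></name>
  <subject><topic>Indexing</topic><topic>Metadata</topic></subject>
</mods>"""


def flatten_mods(mods_xml):
    """Return {solr_field: [values]} for the selected MODS elements."""
    root = ET.fromstring(mods_xml)
    doc = {}
    for field, path in FIELD_PATHS.items():
        values = [
            el.text.strip()
            for el in root.findall(path, MODS_NS)
            if el.text and el.text.strip()
        ]
        if values:
            doc[field] = values
    return doc


def to_solr_add(doc, pid):
    """Serialize the flat document as a Solr XML <add> message."""
    add = ET.Element("add")
    solr_doc = ET.SubElement(add, "doc")
    ET.SubElement(solr_doc, "field", name="PID").text = pid
    for field, values in doc.items():
        # Repeated MODS elements become repeated (multi-valued) Solr fields.
        for value in values:
            ET.SubElement(solr_doc, "field", name=field).text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(to_solr_add(flatten_mods(SAMPLE_MODS), "demo:1"))
```

An XSLT stylesheet does the same job with one template per selected element; the point of the sketch is only that each nested MODS path is mapped to a single flat field name.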
MODS is "hierarchical", and I was able to do some custom XSLT tweaking to convert MODS records to a "flat" Solr document the way I needed. DC is easier in this sense for generic processing because it is already "flat". Anyway, it might be a good idea to create parsers for MODS and MIX.

Serhiy Polyakov
Texas Center for Digital Knowledge
University of North Texas

On Tue, Jan 17, 2012 at 6:47 PM, Conal Tuohy <conal.tu...@versi.edu.au> wrote:
> Hi Matteo
>
> For MODS, MIX, and other XML-based metadata schemas, I'd suggest XSLT is
> probably a more appropriate language than Java.
>
> Conal
>
>
> On 18/01/12 01:12, Matteo Bertazzo wrote:
>> Hi all,
>> we are currently analyzing the "usual" MD indexing process using XSLT
>> transformations to create SOLR documents.
>> Considering the new integration achieved between GSearch 2.4 and Tika we're
>> wondering about the opportunity to streamline the indexing process and move
>> the MD indexing process on Tika.
>> Tika already support DC documents through a DcXMLParser class and we're
>> evaluating the opportunity to implement (Java) a set of custom parsers in
>> order to support other MD schema (MODS, MIX, etc).
>> What do you think about this approach?
>> Is there anyone who has already thought about or started a similar
>> development?
>>
>> All the best,
>> Matteo
>
> --
> Conal Tuohy
> eResearch Business Analyst
> Victorian eResearch Strategic Initiative
> +61-466324297

_______________________________________________
Fedora-commons-users mailing list
Fedora-commons-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users