I am using GSearch 2.4 and the "usual" approach to extract a set of MODS
elements, limited to those used in my repository. In my case, Tika is
used only to get the datastreams’ full text and embedded metadata.

MODS is "hierarchical", and I was able to use custom XSLT tweaks to
convert MODS records into "flat" Solr documents the way I needed. DC is
easier in this sense for generic processing because it is already "flat".
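
To illustrate what "flattening" means here, below is a minimal sketch in
Python. It is not the XSLT used in my setup. The field names (mods_title,
mods_name, etc.), the FIELD_PATHS mapping, and the sample record are all
hypothetical; only the MODS v3 namespace URI is taken from the standard.
Each flat field is simply tied to one path in the MODS hierarchy:

```python
import xml.etree.ElementTree as ET

# MODS v3 namespace; the "mods" prefix is only a local alias for the paths below.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical mapping: flat Solr field name -> path into the MODS hierarchy.
# A real repository would list only the elements it actually uses.
FIELD_PATHS = {
    "mods_title": "mods:titleInfo/mods:title",
    "mods_name": "mods:name/mods:namePart",
    "mods_date_issued": "mods:originInfo/mods:dateIssued",
    "mods_topic": "mods:subject/mods:topic",
}


def flatten_mods(mods_xml):
    """Return a flat dict (field name -> list of values) for one MODS record."""
    root = ET.fromstring(mods_xml)
    doc = {}
    for field, path in FIELD_PATHS.items():
        # Collect the non-empty text of every element matching the path.
        values = [
            el.text.strip()
            for el in root.findall(path, MODS_NS)
            if el.text and el.text.strip()
        ]
        if values:
            doc[field] = values
    return doc


# Made-up sample record, for illustration only.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Sample title</title></titleInfo>
  <name><namePart>Doe, Jane</namePart></name>
  <originInfo><dateIssued>2012</dateIssued></originInfo>
  <subject><topic>Indexing</topic><topic>Metadata</topic></subject>
</mods>"""

if __name__ == "__main__":
    for field, values in flatten_mods(SAMPLE).items():
        print(field, values)
```

Repeated elements (two topics above) end up as multiple values of one flat
field, which is roughly what a multi-valued Solr field would hold. DC needs
no such mapping step because its elements are not nested.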

Anyway, it might be a good idea to create a parser for MODS and MIX.

Serhiy Polyakov
Texas Center for Digital Knowledge
University of North Texas




On Tue, Jan 17, 2012 at 6:47 PM, Conal Tuohy <conal.tu...@versi.edu.au> wrote:
> Hi Matteo
>
> For MODS, MIX, and other XML-based metadata schemas, I'd suggest XSLT is
> probably a more appropriate language than Java.
>
> Conal
>
>
> On 18/01/12 01:12, Matteo Bertazzo wrote:
>> Hi all,
>>    we are currently analyzing the "usual" MD indexing process, which uses
>> XSLT transformations to create Solr documents.
>> Considering the new integration achieved between GSearch 2.4 and Tika, we
>> are wondering whether we could streamline the indexing process by moving
>> the MD indexing to Tika.
>> Tika already supports DC documents through a DcXMLParser class, and we are
>> evaluating whether to implement a set of custom parsers in Java in
>> order to support other MD schemas (MODS, MIX, etc.).
>> What do you think about this approach?
>> Has anyone already thought about or started a similar
>> development?
>>
>> All the best,
>> Matteo
>
> --
> Conal Tuohy
> eResearch Business Analyst
> Victorian eResearch Strategic Initiative
> +61-466324297
>
>
> _______________________________________________
> Fedora-commons-users mailing list
> Fedora-commons-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
