Hi Judie,
These specific patterns can be captured fairly easily with plain regular
expressions, but make sure that your sample data below is realistic.
Parsing more complex, article-style text can be difficult, so make sure your
input is as consistent as your sample. If it is not, it may require more
sophisticated techniques to handle properly.
You can extract the info from your given sample with something like this:
xquery version "1.0-ml";

let $data :=
  <data>
    <text>7. Good</text>
    <text>(a) better</text>
    <text>12. Bad</text>
    <text>(g) worse</text>
  </data>
for $text in $data/text
return
  (: numbered items such as "7. Good" :)
  if (matches($text, "^\d+\.\s")) then
    <info value="{replace($text, "^(\d+)\.\s.*$", "$1")}">
      <id>{replace($text, "^(\d+\.)\s.*$", "$1")}</id>
      <text>{replace($text, "^\d+\.\s+(.*)$", "$1")}</text>
    </info>
  (: lettered items such as "(a) better" :)
  else if (matches($text, "^\([a-z]\)\s")) then
    <info value="{replace($text, "^\(([a-z])\)\s.*$", "$1")}">
      <id>{replace($text, "^(\([a-z]\))\s.*$", "$1")}</id>
      <text>{replace($text, "^\([a-z]\)\s+(.*)$", "$1")}</text>
    </info>
  (: anything else is passed through unchanged :)
  else
    $text
Note that the above does not account for child elements within the text
element.
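If the text elements can contain child elements, one possible approach is to
strip the label from the first text node only and copy the other child nodes
as they are. The following is just a minimal sketch: local:strip-label is a
hypothetical helper name, and it assumes that the label always sits at the
start of the first text node (so not inside a child element).

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: removes a leading "7. " or "(a) " label from the
   first text node and copies all other child nodes unchanged.
   Assumption: the label is at the start of the first text node. :)
declare function local:strip-label($text as element(text)) as node()*
{
  let $first := ($text/text())[1]
  for $node in $text/node()
  return
    if ($node is $first) then
      text { replace($node, "^(\d+\.|\([a-z]\))\s+", "") }
    else
      $node
};

<text>{local:strip-label(<text>7. Good <b>stuff</b> here</text>)}</text>
```

The id and value can still be taken from the string value of the element with
the same replace calls as above, provided the remaining text stays on one line.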
Kind regards,
Geert
Drs. G.P.H. Josten
Consultant
http://www.daidalos.nl/
Daidalos BV
Source of Innovation
Hoekeindsehof 1-4
2665 JZ Bleiswijk
Tel.: +31 (0) 10 850 1200
Fax: +31 (0) 10 850 1199
KvK 27164984
> From: [email protected]
> [mailto:[email protected]] On Behalf Of
> judie pearline
> Sent: Monday, 2 November 2009 17:34
> To: [email protected]
> Subject: [MarkLogic Dev General] Reg: Enriching the content
>
> Hi Team,
> We have the following input xml data.
>
> <data>
> <text>7. Good</text>
> <text>(a) better</text>
> <text>12. Bad</text>
> <text>(g) worse</text>
> </data>
>
> Please let us know how to achieve the output below from the above data:
>
> <data>
> <info value="7">
> <id>7.</id>
> <text>Good</text>
> </info>
> <info value="a">
> <id>(a)</id>
> <text>better</text>
> </info>
> <info value="12">
> <id>12.</id>
> <text>Bad</text>
> </info>
> <info value="g">
> <id>(g)</id>
> <text>worse</text>
> </info>
> </data>
>
>
> Thanks in Advance
>
> Regards,
> Judie
>